Michael Peter Christen
a5d7da68a0
refactoring: removed dependency from switchboard in Balancer/CrawlQueues
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
8429967ea7
no more SVN
13 years ago
Michael Peter Christen
0466bb0ddf
no more SVN..
13 years ago
Michael Peter Christen
4844e124b1
one more warning in case that crawling is paused because of low disk
space
13 years ago
Michael Peter Christen
0ec2713af8
'download'
13 years ago
Michael Peter Christen
2be327b5ab
update location update
13 years ago
Michael Peter Christen
f30c577fdb
add hint to speed up search results
13 years ago
Michael Peter Christen
6b133de3e9
add hint for consulting support
13 years ago
Michael Peter Christen
4d5da75814
fix for a parser problem if an <a> tag is 'within' html tags with
unclosed tags. That prevented the <a> tags from being recognized. This
is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
13 years ago
Michael Peter Christen
eb2c8ffa62
display is not used any more
13 years ago
Michael Peter Christen
91a86f0b06
fixed to network graph testing
13 years ago
Michael Peter Christen
f31ad84d98
automatic generation of blacklist pattern, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2685&p=25305#p25305
13 years ago
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
13 years ago
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
13 years ago
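Note on 046f3a7e8d: one way to detect whether a downloaded .tar.gz file has already been decompressed in transit is to look for the gzip magic bytes (0x1f 0x8b) at the start of the file. The following is only a sketch of that idea; the class and method names are hypothetical and this is not taken from the YaCy sources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not YaCy's actual implementation.
final class ReleaseFileCheck {

    /** Returns true if the file starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean isGzipped(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int b0 = in.read(); // -1 if the file is empty
            int b1 = in.read();
            return b0 == 0x1f && b1 == 0x8b;
        }
    }

    /**
     * If a file named *.tar.gz is no longer gzip-compressed, rename it to *.tar.
     * Returns the path the file has afterwards.
     */
    static Path fixExtension(Path file) throws IOException {
        String name = file.getFileName().toString();
        if (!name.endsWith(".tar.gz") || isGzipped(file)) {
            return file; // name and content already agree
        }
        // strip the trailing ".gz"
        Path target = file.resolveSibling(name.substring(0, name.length() - 3));
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Checking the content instead of trusting the file name avoids guessing whether the HTTP client applied transparent decompression.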
reger
06951ef751
remove heuristic scroogle from search option help text in index.html
13 years ago
Michael Peter Christen
e377092198
fix to xml output format
13 years ago
Michael Christen
41be98dc9d
extended webstructure api to show together with incoming links also
outgoing links
13 years ago
Michael Christen
02e4dedff2
fix to url citation collection
13 years ago
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
  attribute set, which shall replace the old metadata database once it
  is finished
- migration helper class stubs to use old and new metadata databases
  simultaneously
13 years ago
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method. (ranking coefficient is fixed, needs to be made configurable)
13 years ago
Michael Christen
8f89c8ef07
added information about inbound, outbound and citation links into
yacydoc api servlet
13 years ago
Michael Christen
71649a1296
added an api to retrieve the new citation.index with the
webstructure.xml api. This api will respond with details about a single
URL if requested with 'webstructure.xml?about=[url|urlhash|host]'.
13 years ago
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
13 years ago
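Note on 8fc86fe397: a reverse link index maps each target URL to the set of pages that link to it, in the same way a word index maps a word to the documents containing it. The sketch below shows the idea with a plain in-memory map. That is an assumption made for illustration; the commit stores the index in blob files, and none of the names here come from the YaCy code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory illustration of a reverse link (citation) index.
final class CitationIndexSketch {

    // target URL -> URLs of the pages that link to it
    private final Map<String, Set<String>> incoming = new HashMap<>();

    /** Records all outgoing links of one parsed page. */
    void addLinks(String sourceUrl, Iterable<String> targetUrls) {
        for (String target : targetUrls) {
            incoming.computeIfAbsent(target, k -> new TreeSet<>()).add(sourceUrl);
        }
    }

    /** Returns the pages that cite the given URL (empty if none are known). */
    Set<String> citedBy(String targetUrl) {
        return incoming.getOrDefault(targetUrl, Collections.<String>emptySet());
    }

    /** The number of citing pages, usable as a simple ranking signal. */
    int citationCount(String targetUrl) {
        return citedBy(targetUrl).size();
    }
}
```

Indexing by target instead of by source makes the "who links here" lookup a single map access, which is the kind of lookup a citation-based ranking would need.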
Michael Christen
22f05c83ff
fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
13 years ago
Lotus
3e61287326
some better feedback on properties change
13 years ago
Lotus
96ac95cff9
added hint how to change integration options
13 years ago
Thomas
4f61b8fd82
Fixes for compare-search
13 years ago
Thomas
e0680de7b3
Remove Scroogle from compare-search, Scroogle is dead
13 years ago
Lotus
78f0d8f046
no focus on preview frames for search integration
fixes bug http://bugs.yacy.net/view.php?id=161
13 years ago
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
13 years ago
Lotus
e14eb9de82
checkalive.sh: try to fetch only once (default: 20)
13 years ago
Lotus
7792ac6406
fix links & bug #163
13 years ago
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
13 years ago
Michael Peter Christen
aba9b1bfa0
better names for elements of a linked graph
13 years ago
Michael Peter Christen
0cc0290978
bugfix for a must-not-match pattern check. This bug did not make the
check semantically wrong, but a trick that prevented an IP lookup in
case that the filter was not used did not work. With this bugfix,
crawling gets a huge speed boost for noload urls!
13 years ago
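Note on 0cc0290978: the trick described there amounts to skipping the (slow) IP lookup whenever the must-not-match filter is left at its "unused" default. The sketch below is only a guess at the shape of such a check. The sentinel value, the class name and the use of a Supplier for the lookup are all assumptions, not YaCy code.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical illustration of a filter check that avoids needless lookups.
final class IpFilterSketch {

    // Assumed sentinel meaning "filter not used"; the real constant may differ.
    static final String MATCH_NEVER = "";

    private final String patternSource;
    private final Pattern mustNotMatch;

    IpFilterSketch(String patternSource) {
        this.patternSource = patternSource;
        this.mustNotMatch = Pattern.compile(patternSource);
    }

    /**
     * Returns true if the IP delivered by ipLookup matches the must-not-match
     * pattern. The lookup is only performed when the filter is actually in use.
     */
    boolean isRejected(Supplier<String> ipLookup) {
        if (MATCH_NEVER.equals(patternSource)) {
            return false; // filter unused: skip the expensive lookup entirely
        }
        return mustNotMatch.matcher(ipLookup.get()).matches();
    }
}
```

Passing the lookup as a Supplier keeps the DNS resolution lazy, so the cheap string comparison always runs first.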
Michael Peter Christen
2fc8ecee36
ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // adds c elements, calling size() before each add; returns elapsed milliseconds
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // c doubles every round; 31 rounds keep it within the int range.
        // Interrupt the program once the trend is clear.
        for (int i = 0; i < 31; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
13 years ago
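Note on 2fc8ecee36: the commit message does not say which replacement was chosen. One common way to keep a ConcurrentLinkedQueue and still get a cheap size is to maintain a separate atomic counter next to it. The wrapper below is a minimal sketch of that approach with a hypothetical class name; it is not the change made in this commit.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: constant-time size() on top of ConcurrentLinkedQueue.
final class CountedQueue<E> {

    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    /** Appends an element and increments the counter. */
    void add(E e) {
        queue.add(e);
        count.incrementAndGet();
    }

    /** Removes the head, or returns null if the queue is empty. */
    E poll() {
        E e = queue.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    /** Reads the counter instead of traversing the linked nodes. */
    int size() {
        return count.get();
    }
}
```

Under concurrent access the counter can briefly lag behind the queue content, so the value is best treated as an estimate, which is usually enough for load display or throttling decisions.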
Michael Peter Christen
8aba045ba1
if a new pop-up page is set in config portal, then this page applies
also to the default page configuration for the httpd if no path is
given.
13 years ago
Michael Peter Christen
fa7b3481b3
better navigation in file search: fewer results on the first try, but
much faster. after the first search is done, buttons appear to get more
results for the same search
13 years ago
reger
5fd2c30318
adjust Netbeans project class path settings to updated httpclient and commons jars
13 years ago
reger
aae75def69
fix: prevent logging of Solr doc content
with an attached Solr server, transferred content was written to the log
despite log level = off
fixed naming of httpclient logger
13 years ago
Michael Peter Christen
8c06925984
animation of the web structure picture
13 years ago
Michael Peter Christen
898fa7c3f3
use tld heuristic to check if a domain is local or global
13 years ago
Michael Peter Christen
213c8d97f2
use fewer processes in process pool
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen
9c51db4243
Release_1.02
13 years ago
Michael Peter Christen
36e4d82b27
changed ranking
13 years ago
Michael Peter Christen
99c74699de
removed scroogle (scroogle is dead)
13 years ago
Michael Peter Christen
f7ed050771
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
096c17e7cd
added test code
13 years ago