118 Commits (5b3acc12cd4b4343c4e7d7f0a20a1da8ea8d5f6a)

Author SHA1 Message Date
Michael Peter Christen 70505107ca enhanced crawler/balancer: better remaining waiting-time guessing
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen a5d7da68a0 refactoring: removed dependency from switchboard in Balancer/CrawlQueues
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen 2fa037ae1d enhanced crawler
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
Michael Peter Christen 1f4f60654a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
orbiter 1b86d06d1e fix for http://bugs.yacy.net/view.php?id=62
13 years ago
orbiter c61e4cfd78 - fix for incomplete clear() in balancer
14 years ago
orbiter dad5b586a4 added a concurrent warm-up of Table data structures. that should speed up the start-up process but may also cause stronger CPU load at that time.
14 years ago
orbiter 1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
14 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 19fd13d3bc Added federated index storage to solr.
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
orbiter 2b5f8585bf performance hack for Balancer and ip address parsing
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
low012 ce012e11aa *) deleted LogStatistics since the page did not work anymore and it seemed to be obsolete, tell me if you miss it and I will add it again
14 years ago
low012 c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter fffb91447a fixed crawl queue delete function
14 years ago
orbiter 93c535d111 fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
14 years ago
orbiter e3e3b49d52 - enhanced main release recognition
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
15 years ago
orbiter 7fdb17bb96 redirect uncaught exceptions to logging + small other changes
15 years ago
orbiter 87b1684211 additional double-check in balancer
15 years ago
orbiter a82a93f2fc - better url double check in crawler
15 years ago
orbiter 5924a0d851 - enhanced concurrency in database index access for multicore
15 years ago
orbiter a83772c71b fixes and enhancements for balancer:
15 years ago
orbiter 9cde05418f fixed url crawl list display
15 years ago
orbiter 30b337fa9f fixes to balancer when crawling filesystem (problem was: host == null)
15 years ago
orbiter 844853243a fixed balancer time guessing
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter 40a8d132d9 tried to fix 100% CPU when calling Balancer.top()
15 years ago
orbiter 90c3e5d6f6 - cleanup, removed unused imports
15 years ago
orbiter 8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
15 years ago
orbiter 8b8107b2a3 reduced IO-load and synchronization/blocking
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 48b9371735 changed balancer re-load counter. causes less blocking here doing intranet indexing.
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter ba51d140e1 added more info in assert in balancer
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 46c4f8b68a better look-ahead into the crawl queue: show more on crawl monitor
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago