Commit Graph

35 Commits (f7c4abfdd713659975550ca60ba91af79b2b702d)

Author SHA1 Message Date
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
orbiter 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
14 years ago
orbiter b1a8d0c020 enhancements to web cache and less strict caching rules
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
14 years ago
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
14 years ago
orbiter 73f03e05ee fixed a bug in snippet fetch strategy: cache only does not help if resource can only be found in web
15 years ago
orbiter 2a8f70f0ca - fix for caching of OSM tiles. if you want that this fix applies to your peer, please delete the crawl profiles
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter 7ab207d93a better presentation of search result metadata and fixes to htcache loading
15 years ago
low012 b97ad0f380 *) some minor changes for better code readability
15 years ago
orbiter d77a8f3b3e added some modifications recommended by PMD for better performance
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
15 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
15 years ago
orbiter 0575f12838 fix for deadlock
16 years ago
orbiter c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 024744245c small refactoring to prepare for new queues
16 years ago
orbiter 995da28c73 all stack/heap files that had been stored in DATA/PLASMA are now stored in the network-specific QUEUES path
16 years ago
orbiter 409538e17a code cleanup and code simplification
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter 95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago