yacy_search_server/source/de/anomic/crawler
orbiter 49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is now the default), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a Solr index, provided Solr is also enabled.
13 years ago
retrieval added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is now the default), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a Solr index, provided Solr is also enabled. 13 years ago
Balancer.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 14 years ago
CrawlProfile.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 14 years ago
CrawlQueues.java - not doing merge-jobs while short on Memory 13 years ago
CrawlStacker.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
CrawlSwitchboard.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 14 years ago
ImporterException.java added final where possible 17 years ago
Latency.java - refactoring of robots 14 years ago
NoticedURL.java added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer 14 years ago
RSSLoader.java stop loading via http at a defined maximum number of bytes, even if the size is unknown before loading 13 years ago
ResourceObserver.java Implementation of strategies for controlling memory resources. 13 years ago
ResultImages.java - fixed a bug in crawl start with file name (npe in new url) 14 years ago
ResultURLs.java refactoring: moved all score-related classes to new ranking package 13 years ago
RobotsTxt.java - enhanced ybr ranking computation 14 years ago
RobotsTxtEntry.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
RobotsTxtParser.java - refactoring of robots 14 years ago
SitemapImporter.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
ZURL.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 14 years ago