yacy_search_server/source/de/anomic/crawler
orbiter 610b01e1c3
- added an 'add every media object linked in a html document as a new document' feature to the html parser. As a result, every image, app, video or audio file that is linked in a html file is added as a document. In effect, parsing a single html document may insert a number of documents into the search index.
14 years ago
..
retrieval - added an 'add every media object linked in a html document as a new document' feature to the html parser. As a result, every image, app, video or audio file that is linked in a html file is added as a document. In effect, parsing a single html document may insert a number of documents into the search index. 14 years ago
Balancer.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 14 years ago
CrawlProfile.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 14 years ago
CrawlQueues.java - not doing merge-jobs while short on memory 14 years ago
CrawlStacker.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
CrawlSwitchboard.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 14 years ago
ImporterException.java added final where possible 17 years ago
Latency.java - refactoring of robots 14 years ago
NoticedURL.java added handling of yacy bot entries that appear in robots.txt if such an entry addresses the yacy peer 14 years ago
RSSLoader.java stop loading via http at a defined maximum number of bytes, even if the size is unknown before loading 14 years ago
ResourceObserver.java Implementation of strategies for controlling memory resources. 14 years ago
ResultImages.java - fixed a bug in crawl start with file name (npe in new url) 14 years ago
ResultURLs.java refactoring: moved all score-related classes to new ranking package 14 years ago
RobotsTxt.java - enhanced ybr ranking computation 14 years ago
RobotsTxtEntry.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
RobotsTxtParser.java - refactoring of robots 14 years ago
SitemapImporter.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
ZURL.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 14 years ago