- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
the segments had been there to create a tenant-infrastructure but were
never be used since that was all much too complex. There will be a
replacement using a solr navigation using a segment field in the search
index.
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Testet
this with a pdf document.
- removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method)
- increased speed of CrawlResults page (no dns lookup any more)
- increased speed of favicon display (removed dns lookup)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happend, but will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a memory limitation in the zip parser and the pdf parser
- added a search throttling: if there are too many search queries are still to be computed, then new requests are not accepted for some time. if after a one second still no space is there to perform another search, the search terminates with no results. this case should only happen in case of DoS-like situations and in case of strong load on a peer like if it is integrated in metager.
- added a search cache deletion process that removes search requests in case that throttling happens
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542