- moved storage of robots.txt entries to WorkTables, so it is now possible to browse the robots entries with the table browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6710 6c8d7289-2bf4-0310-a012-ef5d649a1542
added a recording of all operations on YaCy in a database; this should make it possible
1) to re-create a setting on fresh peers
2) to transmit a setting from one peer to another
3) to re-create crawl starts after a complete deletion of the index
This functionality will also support
4) scheduled re-crawls (new implementation)
To implement this, a new database structure has been created that stores maps into blob heaps. To encode the maps, the b-encoding technique is used; this is the same encoding that torrent files use (see the encoder sketch after this list)
- added a b-encoder
- enhanced the b-decoder
- added a b-encoded map heap data structure
- added a table organisation based on b-encoded heaps
- added a servlet to maintain such tables (see Tables_p.html)
- integrated the servlet into the Advanced Settings menu
- added an API recording based on the new tables
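For reference, b-encoding represents strings as `<length>:<bytes>`, integers as `i<number>e` and maps as `d<key><value>...e` with keys in sorted order. The following is a minimal encoder sketch of that technique; the class and method names are illustrative, not the project's actual encoder, and byte counting is simplified to ASCII for brevity:

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal b-encoder sketch: strings become "<length>:<bytes>",
// integers "i<number>e", and maps "d<key><value>...e" with sorted keys.
// Uses String.length() as the byte count, which is only valid for ASCII.
public class BEncodeSketch {

    static String encodeString(String s) {
        return s.length() + ":" + s;
    }

    static String encodeInteger(long i) {
        return "i" + i + "e";
    }

    static String encodeMap(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("d");
        // b-encoding requires dictionary keys in sorted order
        for (Map.Entry<String, String> e : new TreeMap<>(map).entrySet()) {
            sb.append(encodeString(e.getKey()));
            sb.append(encodeString(e.getValue()));
        }
        return sb.append('e').toString();
    }

    public static void main(String[] args) {
        Map<String, String> apiCall = new TreeMap<>();
        apiCall.put("type", "crawler");
        apiCall.put("comment", "crawl start for example.org");
        // prints: d7:comment27:crawl start for example.org4:type7:crawlere
        System.out.println(encodeMap(apiCall));
    }
}
```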
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6606 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a BEncodedHeap class that b-encodes data structures and stores them in a heap (a matching decoder sketch follows below)
- refactoring of MapView, which is now named MapHeap to fit the naming scheme of BEncodedHeap
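To complement the encoder sketch above, here is a minimal decoder for the same flat string/map subset, roughly the work a heap read has to undo; again, the names are illustrative and not YaCy's actual BDecoder:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal b-decoder sketch for the flat string/dictionary subset used
// in the encoder sketch above; names are illustrative only.
public class BDecodeSketch {
    private final String in;
    private int pos = 0;

    BDecodeSketch(String in) { this.in = in; }

    // parse "<length>:<bytes>"
    String parseString() {
        int colon = in.indexOf(':', pos);
        int len = Integer.parseInt(in.substring(pos, colon));
        pos = colon + 1 + len;
        return in.substring(colon + 1, pos);
    }

    // parse "d<key><value>...e" where keys and values are strings
    Map<String, String> parseMap() {
        pos++; // skip 'd'
        Map<String, String> map = new LinkedHashMap<>();
        while (in.charAt(pos) != 'e') {
            map.put(parseString(), parseString());
        }
        pos++; // skip 'e'
        return map;
    }

    public static void main(String[] args) {
        String b = "d7:comment27:crawl start for example.org4:type7:crawlere";
        // prints: {comment=crawl start for example.org, type=crawler}
        System.out.println(new BDecodeSketch(b).parseMap());
    }
}
```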
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6579 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added 'yacybot' as a key to recognize robots.txt entries for YaCy (see the matching sketch below)
- removed an unused method to get robots.txt from the database
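A sketch of the kind of user-agent matching this key enables; the helper below is hypothetical rather than the actual RobotsTxt code, and only illustrates testing a User-agent line against the 'yacybot' token:

```java
// Illustrative sketch: decide whether a robots.txt "User-agent:" line
// addresses YaCy's crawler, whose agent token is "yacybot".
public class RobotsAgentMatch {
    static final String YACY_AGENT = "yacybot";

    static boolean appliesToYacy(String userAgentLine) {
        // agent tokens are matched case-insensitively;
        // "*" addresses every crawler
        String token = userAgentLine
                .replaceFirst("(?i)^user-agent:", "")
                .trim()
                .toLowerCase();
        return token.equals("*") || token.contains(YACY_AGENT);
    }

    public static void main(String[] args) {
        System.out.println(appliesToYacy("User-agent: yacybot"));   // true
        System.out.println(appliesToYacy("User-agent: *"));         // true
        System.out.println(appliesToYacy("User-agent: Googlebot")); // false
    }
}
```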
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6565 6c8d7289-2bf4-0310-a012-ef5d649a1542
- moved all index generation servlets to their own main menu item, including proxy indexing
- removed the external index import because this operation is no longer recommended. Joining an index can simply be done by moving the index files from one peer to the other; they will be merged automatically
- fix to prevent endless loops when disconnecting http sessions
- fix to prevent the application of bad blacklist entries that can cause a 'Dangling meta character' exception (see the validation sketch below)
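The 'Dangling meta character' message is thrown by java.util.regex when a pattern contains a quantifier such as '*' with nothing to repeat, e.g. a blacklist path of '*.gif'. A minimal sketch of rejecting such entries up front, with hypothetical names rather than the actual blacklist code:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch: reject blacklist entries that are not valid regex patterns,
// so a bad entry fails at configuration time instead of at match time.
public class BlacklistEntryCheck {

    static boolean isValidPattern(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // bad entry: refuse to add it to the blacklist
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidPattern(".*\\.gif")); // true
        System.out.println(isValidPattern("*.gif"));    // false: dangling '*'
    }
}
```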
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6558 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a memory.acceptDHT setting, measured in kbytes
- not yet pre-enabled; the value is cleared on every startup
- please review, since this could break DHT in freeworld
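Assuming the setting is a free-memory threshold below which incoming DHT transfers are refused (only the setting name is taken from this commit; the semantics and all names below are an assumption), a minimal sketch could look like this:

```java
// Sketch under the ASSUMPTION that memory.acceptDHT is a free-memory
// threshold in kbytes below which incoming DHT transfers are refused.
public class AcceptDHTCheck {

    // value would come from the configuration; 0 keeps the check disabled
    static long acceptDHTKbytes = 0;

    static boolean acceptDHTTransfer() {
        if (acceptDHTKbytes <= 0) return true; // not pre-enabled
        Runtime rt = Runtime.getRuntime();
        // memory still obtainable: unallocated heap plus free part of heap
        long availableKbytes =
                (rt.maxMemory() - rt.totalMemory() + rt.freeMemory()) / 1024;
        return availableKbytes >= acceptDHTKbytes;
    }

    public static void main(String[] args) {
        acceptDHTKbytes = 64 * 1024; // require 64 MB free
        System.out.println("accept DHT: " + acceptDHTTransfer());
    }
}
```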
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6459 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added many stack trace outputs for exceptions in the crawl profile handler to find the 'missing profile handle' bug
- caught one more timeout exception in the httpd file loader (see the sketch below)
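For illustration, the usual shape of such a fix is to catch SocketTimeoutException around the read and treat it as a failed load; this sketch is hypothetical and not the actual httpd file loader code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.net.URLConnection;

// Illustrative sketch: catch a read timeout in an HTTP file loader
// instead of letting it escape and abort the calling thread.
public class TimeoutSafeLoader {

    static byte[] load(String url, int timeoutMillis) {
        try {
            URLConnection con = new URL(url).openConnection();
            con.setConnectTimeout(timeoutMillis);
            con.setReadTimeout(timeoutMillis);
            try (InputStream in = con.getInputStream()) {
                return in.readAllBytes();
            }
        } catch (SocketTimeoutException e) {
            // the added catch: log and treat as a failed load
            System.err.println("timeout loading " + url + ": " + e.getMessage());
            return null;
        } catch (IOException e) {
            System.err.println("error loading " + url + ": " + e.getMessage());
            return null;
        }
    }
}
```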
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6457 6c8d7289-2bf4-0310-a012-ef5d649a1542
- extended the OAI-PMH test applet and integrated it into the menu. It does not yet import OAI-PMH records, but it shows that it is able to read and parse this data
- some redesign of the ZURL storage: refactoring of access methods, better concurrency, less synchronization
- added a limit of 20 million entries to the LURL metadata database table cache: until now this cache was unbounded, limited only by the available RAM, which may have caused memory-leak-like behavior (see the eviction sketch below)
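A common way to impose such a bound is an access-ordered map that evicts its eldest entry once the limit is reached; whether the actual cache evicts this way is an assumption here, and the 20 million limit is shrunk for the demo:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic sketch of a size-bounded cache: once the limit is reached, the
// least recently used entry is evicted instead of the cache growing
// without bound until the heap is exhausted.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order, for LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        // the real cache limit is 20 million; use 3 for demonstration
        BoundedCache<String, String> cache = new BoundedCache<>(3);
        cache.put("a", "1"); cache.put("b", "2");
        cache.put("c", "3"); cache.put("d", "4");
        System.out.println(cache.keySet()); // [b, c, d]: "a" was evicted
    }
}
```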
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fix for image parser
- temporary integration of images as document types in the crawler and indexer, for testing the image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542