yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	0f0b4aec75	better index cell merge logic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5754 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	832fef670f	migration of urls-files into subdirectory METADATA git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5753 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	37f892b988	added new concurrent merger class for IndexCell RWI data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b3f75e48fa	- enhanced balancer: auto-solving of waiting-deadlocks - removed deprecated cache-init size value - more debug lines for IndexCell cache dump merge git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9a90ea05e0	added a merge operation for IndexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5727 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a9cea419ef	Integration of the new index data structure IndexCell. This is the start of a testing phase for the IndexCell data structure, which will replace the collections and caching strategy. IndexCell creation and maintenance is fast, has no caching overhead and very low IO load, and is the basis for the next data structure, index segments. IndexCell files are stored at DATA/<network>/TEXT/RICELL. With this commit the old data structures are still used until a flag in yacy.conf is set. To switch to the new data structure, set useCell = true in yacy.conf. You will then no longer have access to TEXT/RICACHE and TEXT/RICOLLECTION. This code is still bleeding-edge development. Please do not use the new data structure for production now. Future versions may have changed data types or other storage locations. The next main release will have a migration feature for old data structures. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	83792d9233	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	474aac65af	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5719 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	209f25f5f5	refactoring to integrate indexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dff1cba62	removed option to use different primary keys in kelondro tables this option was never used and there is also no use to set other columns but the first as the primary key. as a result, access methods to the key do not need to compute key positions, and they work faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7f67238f8b	refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	14a1c33823	refactoring of wordIndex class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e2e7949feb	replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	9f7e62e900	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	100247bdda	added also an export and delete-feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the url database, start the following: java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump The export-feature is optional, the purpose of that function is to provide a back-up function for URLs to be deleted. The export function can also be used to create html files with embedded links and simple text-files. Simply replace the 'xml' word with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted. This causes the complete URL database to be exported. This is an alternative to the Web-Interface based export function. The delete-feature is the only destructive method of the four presented here. Please use it with care. It is better to make a back-up of the url database files before starting the deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	60078cf322	added next tool for url analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS. To use this, you must use the -incollection command before (see SVN 5687) and you need a used.dump file that has been produced with that process. Now you can use that file to do a URL-hash compare with the urls in the URL-DB. To do that, execute java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump or use different names for the dump files or more memory. As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections. The file has the format {hash-12}* that means: 12 byte long hashes are listed without any separation. The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b1ddc4a83f	do not merge collections if ram == false git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5691 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b80db04667	- refactoring of IntegerHandleIndex and LongHandleIndex (better method names) - fix for problem in httpdFileHandler: missing close of open Files if template cache was disabled - more memory for DHT selection required - stub for URL reference hash statistics in index collections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	efcd95dc37	simplification of (internal) query process / refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	aa44d9bad9	more refactoring of kelondro.text / deleted de.anomic.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6ffc6e3389	more refactoring of indexer and kelondro classes; - integrating the indexer into kelondro as package 'text' - renaming of classes in kelondro.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	404bc21da9	simplification of (internal) query process / refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5662 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	76ef5f0f14	refactoring of index package: better names for the classes (to be continued) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2df57b1fd1	refactoring of index collection class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	39a177649b	* added upnp listener for devices that do not respond to discovery but advertise themselves * moved package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c12bb8a6d0	- refactoring of the http client - added a protection against memory leaks for the access tracker git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	62505bb3cb	more bugfixes as recommended by findbugs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4db80065ac	select more git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5617 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	94c42691d8	- reject fewer transmissions as transmission receiver - do not flag too many receivers when something goes wrong during transmission as sender git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5616 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	59427064fb	first part of 'doubles' fix (not fully ready yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5612 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	26978b2a25	- better memory protection in kelondro caches: computation of needed memory for cache grow - removed excessive gc calls - step to 16 vertical DHT partitions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5611 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	e9e2fff47a	better scaling on performance graph git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5610 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	4aad461100	added UPnP support YaCy can now automatically forward ports on home routers off by default git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	be0c492ae5	fix for memory leak bug in new dht transmissions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5606 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	40d9849aa4	- better control of chunk size in dht selection - more restrict values in selection - step to 4 vertical partitions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5603 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	411f2212f2	more memory leak fixing hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5599 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	985d421f91	found and fixed some memory leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5596 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	333489420b	- fix for NPE when loading the cytag image - some hacks for less memory usage: -- less usage of buffer and cache memory in EcoFS -- buffer allocation on-demand in BufferedIOChunks -- removed largest ybr idx git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5595 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6a32193916	- refactoring of cache naming in web index cache (no more dht semantics there) - activating a feature in the thread dump that cuts off dumping of a trace of inside-java-core events git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5593 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6c627dbdff	update to the server core git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5591 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5393f356aa	fix for termination problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5589 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6a876ecb88	first fixes to the DHT transmission process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5588 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c25c334b75	replaced old DHT transmission method with new method. Many things have changed! some of them: - after an index selection is made, the index is split into its vertical components - from different index selections the split components can be accumulated before they are placed into the transmission queue - each split chunk gets its own transmission thread - multiple transmission threads are started concurrently - the process can be monitored with the blocking queue servlet To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed. The new index distribution model using a vertical DHT was implemented. An abstraction of this model is implemented in the new dht package as interface. The freeworld network has now a configuration of two vertical partitions; sixteen partitions are planned and will be configured if the process is bug-free. This modification has three main targets: - enhance the DHT transmission speed - with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times. - the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before. With two partitions, the fraction will be one half. With sixteen partitions, 1/16 of the previous number of URLs. BE CAREFUL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	01b97ef3f8	added new cybertag-tracking feature that was inspired by itgrl from the forum discussion in http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612 The feature will provide two basic entities: - you can integrate image links which point to your yacy installation anywhere in the web. the image can be loaded with <img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>"> This will place an invisible 1-pixel image. If you change the icon=invisible to icon=redpill, you will see a red pill. Use this to track your activity in the web. - you can view your tracks at http://localhost:8080/Tracks.html - There is a public api to your tracks at http://localhost:8080/api/tracks_p.json which needs authentication git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b57c9da1f8	- fixes to doc, ppt, xls parser: better title - fixes to httpd server response header generation - fixes to a server date computation bug - new Button in indexControl to view content of url in ViewFile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5576 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9d282d2c16	- renamed interactivesearch to yacyinteractive - added a configuration option to set the pop up page in Config Appearance - added a minimized header option to yacyinteractive - fixed a bug in yacysearch: default values when no query is done git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5569 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d3e33fd6c1	removed strange retry logic from DHT transfer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5564 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ef82cced01	removed default line 'P2P WEB SEARCH' if no line is given git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5553 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	94110df85a	moved logging partially to kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	024da2916b	refactoring of logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	83ce65707a	(almost) completed partition of classes in kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7ee494fde5	more refactoring of kelondro: - separated BLOB from table classes - renamed 'coding' package to 'order' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	d4281b78da	dynamic memory scale git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5541 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bf93767ec6	refactoring of kelondro database classes (to be continued) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fc27bf8c4c	refactoring of kelondro classes: kelondro shall become independent from other packages. moved bytebuffer, date and memory to kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	419469ac27	added more methods to control the vertical DHT (not yet active .. ) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5514 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dedfc7df7f	removed distinction between DHT-in and DHT-out. This is necessary to make room for the new cell data structure, which cannot use this distinction in the first place, but will enable the same meaning with different mechanisms (segments, later) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5511 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b74159feb8	preparations to integrate the new 'cell' index data structure (this commit is just to move development files to my other computer, no functionality change so far) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5509 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d1bace5e4d	enhanced cleanup function git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5488 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ff41da613e	removed exception printout during load of snippets git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5484 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bed38a5f8c	fix for uncaught exception in RSSReader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5482 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a6b29cf72c	reverted change of search event processing in SVN 5460. The new code did not work properly, it gave remote search requests too little time git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5479 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9ef77d57f5	added an access control to the search interface using white/blacklists: in the network configuration, you can configure a whitelist and a blacklist - blacklisted clients cannot search - whitelisted clients never get any search restrictions - for all other clients: apply DoS search restrictions Please see the example configuration in yacy.network.freeworld.unit. By default, all clients from localhost get whitelisted. If you have your own YaCy network, please put all the IPs of your peers into the whitelist git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5475 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	efe801173c	better dht-in cache flush. see also: http://forum.yacy-websuche.de/viewtopic.php?p=11936#p11936 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5472 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e948df68ac	longer timeout for queues during shutdown git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5469 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b2a8c653ee	small fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5464 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4f45605f04	small update for timing in search result processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5460 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b2b7edae18	fixed interactive search - added dummy servlet class, because otherwise the template engine is not triggered. That is because the yacy httpd works much faster as a normal file server without a scan of the served pages. Therefore each page with templates must now have a class file associated to it. - fixed json output format of yacysearch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5449 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	2be119f0df	adjusted big peer to 28M links git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5448 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c6880ce28b	removed the permanent cache flush and replaced it with a periodic cache flush The cache is now flushed only for one second every ten seconds. During a crawl the cache fills up completely, and is only flushed if space is needed for more documents. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6c7e83909b	- refactoring of data access methods to be prepared for new cell data structure - removed a memory overhead in collections, which prevents OOM Exceptions in low memory configurations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5443 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c4c4c223b9	fixed a problem with attribute flags on RWI entries that prevented proper selection of index-of constraint git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5437 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6072831235	no cr transmission for robinson peers see also: http://forum.yacy-websuche.de/viewtopic.php?p=10290#p10290 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5436 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	be4c458951	refactoring (implemented Iterable in kelondroRowCollection) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5432 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b6bba18c37	replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps. This is a migration step to support a new method to store the web index, which will also be based on the same data structure. Made also a lot of refactoring for a better structuring of the BLOBHeap class. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	025094675f	* remove empty directory * add necessary dependency for pdfParser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5424 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e004da48d3	- added fast fingerprint computation for files (any). Will be used in new index dump method - refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	963da8c3f9	* updated tm-extractors to new version 1.0 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5405 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e34ac22fbd	- added new monitoring servlet at http://localhost:8080/PerformanceConcurrency_p.html - used the new monitoring to do some fine-tuning of the indexing queue git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5402 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d376d81fc4	replaced busy thread control of crawl stacker by blocking threads git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5400 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8cb7170b75	- set status of kelondroTree, kelondroBLOBTree and kelondroFlexTable to deprecated - removed initialization and/or usage of kelondroFlexTable (should meanwhile not be used any more) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5396 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7535fd7447	- refactoring of CrawlEntry and CrawlStacker - introduced blocking queues in CrawlStacker to make it ready for concurrency - added a second busy thread for the CrawlStacker The CrawlStacker is multithreaded. It shall be transformed into a BlockingThread in another step. The concurrency of the stacker will hopefully solve some problems with cases where DNS blocks. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5395 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2802138787	- refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck) - fix of shallBeOwnWord target computation heuristic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5392 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1779c3c507	- added a read cache to the RAFile interface to RandomAccessFile - added a write buffer to BLOBHeap - modified the BLOBBuffer (is now only to buffer non-compressed content) - added content compression to the HTCache The new read cache will decrease the start/initialization time of BLOB files, like the HTCache, RobotsTxt and other BLOBHeap structures. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4a2dac659e	more speed hacks: - modified and activated write buffer - increased cache flush factor - fixed a problem with deadlocking of indexing process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	47292e696a	more performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	1951d30a62	addendum to last commit handle words with length < 3 correctly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	325ba7bfb8	only query words with length > 2 this is not complete, yet git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	5af8923f37	* distribute forgotten jar-file in parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5355 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b0f2003792	fast database initialization and fast start-up of yacy: - applied knowledge about concurrent file stream reading and index processing from the wikimedia reader to the EcoTable initialization process: the file reader is now concurrent to the index generation - changed also some initialization processes to avoid some pauses during initialization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	867d0f2f56	removed some unnecessary pause delays git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8c96bc2ac1	do not use proxy caching rules for crawling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dba7ef5144	extended crawling constraints: - removed never-used secondary crawl depth - added a must-not-match filter that can be used to exclude urls from a crawl - added stub for crawl tags which will be used to identify search results that had been produced from specific crawls. Please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'. Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty string (match-never) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	96174b2b56	more debugging / better result status logging for parser/caching errors git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5341 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	90e78b2cf6	* improve encoding detection of http service git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5337 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ef66438662	- more space in error db to store larger error messages - added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	674ad2d55b	different handling of error cases that occur during loading files with http or ftp: methods throw exception instead of returning an error string git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5328 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	7e1fe05e3c	* added utf8-encoding to many getBytes-calls * utf8 should work now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5323 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	fad044fb54	update to snippet marker: - do not display indexed html (solves xss issues) the single words are analyzed for already marked parts. this is needed to avoid false encoding of the marker (<b>) tags. - improved speed for existing routine: heavily used regex patterns are precompiled now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5322 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3f746be5d4	- consolidation and refactoring of many DHT target - computing methods - implemented vertical DHT acceptance ("my own DHT") to accept new targets - added new target computation for global search: addresses vertical targets also - enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries) - better performance value computations for PPM selection in network configuration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d014b2728a	Design-check, Extension and Refactoring of DHT target position computation: - two different computations (but mathematical equivalent) of the DHT distance had been consolidated - moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets - added fast Long - to - hash computation - high-precision target computation of gaps for new peers - added new target computation for horizontal and vertical DHT targets (not yet in use) - old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	22989d0d8a	added property index.storeCommons to switch commons storage on or off with index.storeCommons=false all currently stored commons are deleted! Default is now 'true', but in future full releases it will be switched to 'false' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	340ecd919d	* include non ascii characters in visible characters git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5312 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	00e27e5050	) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist ) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this) ) further minor fixes ) to be continued... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5301 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b098522977	some very small advances to index utf-8 (not working yet), inserted also debugging code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5298 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2f49666908	integrated the character decoding into the parser, removed old code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5297 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0edec2b760	FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html. The old process used a not really efficient way to detect html encoding strings in texts. All calling methods had been adapted to call the new class in an enhanced way with fewer parameters. Many classes in interfaces used an XML encoding only (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be controlled again since it points out possible XSS leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	2e53cbc66a	should compile now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5292 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	f3bf2e379e	should compile again git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5291 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	dd8441f102	fix bug: data from plasmaParser is already converted to UTF-8. After removing the restrictions in the code, YaCy should be able to index Unicode characters! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5290 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6941bf42b1	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9b0c4b1063	redesign of parts of the new BLOB buffer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5287 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1778fb420d	- added some performance tweaks to the new BLOB buffer - removed the now superfluous HT storage thread - reduced number of file decompression by shifting the compression moment to the future git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9663e61449	added another class to handle BLOB writings to the new HTCACHE data storage: - entries are buffered and written as stream with many entries at once (saves many IO accesses) - entries are compressed with gzip: increases capacity of cache - concurrency for stream-writing and compression: all writings to the cache are non-blocking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5284 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	826ca79735	refactoring and new architecture to store the files of the web cache: - files are not stored any more as individual files - a new database structure using BLOBHeap files stores many cache entries in common files - all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ffed5fc415	fixed problem with lost peers in database migrated seedDB from BLOBTree to BLOBHeap git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2d65887723	- fix for bug in new profile handling - added a new feature in ymageChart (cannot be seen yet, just wait... will be used in profiling chart) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5261 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ff68f394dd	fix for problem with balancer and lost crawl profiles: if crawl profile is lost, no robots.txt is loaded any more git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5258 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	fb8d9850ea	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1462 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5248 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9ac16f565b	- fixed several bugs in database management functions - fixed a display bug for the performance graph - fixed deadlock when initialization of awt happens simultaneously - removed some debugging output git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	820a03f9d6	- removed some warnings - used fix in SVN 5233 for ysearch.java and search.java git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5237 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c8bdd965ec	- larger update time for status page - balancer writes cause of robots.txt in log file for crawl delay - removed log output for forced GC - smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ce4715e305	removed indexing of anchor links and tagging such words as part of urls (that was wrong) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5219 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ce57de6cb3	- fixed re-setting of DHT Send/Receive settings - small change to network graphics: smaller circles / more URLs necessary for full radius; more PPM necessary for full crawling circles - fixed exclusion search ('-' did not work any more) - fixed NPE bug when FTP loader wrote to the error-db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5218 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	7afa084207	* add native java trayicon, using reflection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5209 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6e7d113eac	fix for wrong index initialization after network switch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5203 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7b35d54c6c	fixed some problems with network switching (was not completely 'clean') git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5200 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f0b42e5a98	fixed NPE git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5199 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	8e0de7f180	update to language statistic evaluation: - the condenser does not abandon too small words any more before feeding the statistics - for text indexing no more urls are used to feed the index (this was wrong, but in contrast the indexing of urls for media search is necessary) - urls are not used any more to feed the statistics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5197 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1198eeecc7	added language selection to search query: - the language can be selected using a LANGUAGE:<language> element in the query line, e.g.: java LANGUAGE:en - the language can be selected with a post element in google-style syntax with the 'lr' element: ?lr=lang_en&query=java git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	00c1535f84	added ranking and evaluation of language type in a search the wanted language is taken from the browser user-agent string git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bfcf9b7aa3	- added language detection using metadata from documents: html and odt documents provide this information - metadata and results from statistical analysis are compared and result is printed out as debug lines - added ranking profile for wanted language - added class with ISO 639 table, a list of all valid language codes that will be used for the language identification git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e1f67262f7	- added and removed some debugging output - fixed a bug with merge method - patched wrong output of language identification (not fixed, only patched!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ce2a7ed116	integrated language detection classes into condenser environment git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5180 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2b13705839	fixed a mistake in indexing queue processing: documents had been parsed before it was checked if they should be indexed or not. parsing was not necessary for this check, so the check was moved in the queue in front of the document parsing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5179 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1bbf362cef	update to the crawl balancer: better organization and better crawl delay prediction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5176 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0cd0fee546	fixed bug with wrong proxy result enqueueing. See: http://forum.yacy-websuche.de/viewtopic.php?p=8130#p8130 - removed the online status property. This influenced the proxy behavior and created some complexity that was not needed because the online status was never used as it was created for (offline browsing) - checked all proxy identification procedures during crawling and enhanced transparency and error checking - fixed a proxy identification routine that caused the wrong selection of the proxy result queue git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5173 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	670244849d	fix for http://forum.yacy-websuche.de/viewtopic.php?p=9835#p9835 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5164 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	5fbccfd75e	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1366&p=9348#p9348 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5155 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1fb1665e71	increased dht interval to avoid peer selection failure (maybe too few peers available to fill the big gaps) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5143 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1eb813bd43	shifted index deletion-on-exit rule to the class where the errors are produced git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5141 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	3ded1efe84	kelondroExceptionCounter didn't work git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5138 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	383d89481e	count errors before deleting collection.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5136 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	0bb4fbc403	delete corrupted collection.index on exit for rebuild on next start see http://forum.yacy-websuche.de/viewtopic.php?p=9725#p9725 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5135 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	b68d06a6e8	performance settings based on network's remote crawl speed removed some _pro values from config git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5134 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bb5c898441	enhancements to localsearch behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5131 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	3c6e8d2015	set default ppm when network is switched git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5127 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	3288c19c1a	reduce remote crawl PPM for fresh peers in freeworld to 6 PPM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5124 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	77ee0765a4	- added domain statistic generation to IndexControlURLs_p.html servlet - added 'delete all' button to all results of such a domain statistic output which causes that all urls to this domain are deleted - extended stack cleaner to clean also the statistics: they are not completely destroyed, only the smallest counting domains are removed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4fbee21cea	- added fetch-ahead again (had been removed in last commit) - reverted default query mode to verify=false git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5111 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	fc03b0437a	fixed a error case where a second search after a first search with a different search word failed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5109 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ead39064c5	fixed problem with wrong result number calculation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5105 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	05dbba4bab	added logging conditions to all fine and finest log line calls this will prevent an overhead for the generation of the log lines in case that they then are not printed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d3d41e2ee4	- fixed problem with searching with quotes (still not complete, but not as bad as before) - fixed parsing of crawl-delay statements when seconds were given with float numbers - enhanced performance of profiling (not too many loggings; not more than one per second) - removed some debug output - fixed wrong return type in logging - added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!) - fixed wrong word distance computation in RWI management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	df4ff423c4	added additional properties to query id's to distinguish search events better git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5093 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	9ff4fc11da	partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5084 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	d9d9c522a1	addendum to last commit moved recrawl times for standard profiles to constants calculate new specific dates in cleanup job git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5082 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	536e77e8b7	modifications towards a single database operation to read/write http header and cached file at once: - removed distinction between header file types for http and ftp; ftp is simulated by using http properties - removed all old resourceInfo classes that handled this distinction - introduced a new distinction between http request and http response objects - unified new response objects with two other object types that had been introduced elsewhere - changed all servlet call methods to use the new http request header object type - divided static object keys for http header properties into request and response types - refactoring here and there (a large number of type changes and many methods merged/moved) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	3c68905540	remove redundant null checks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5065 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	753a1ae430	- changed default browser from netscape to firefox - fixed "Inefficient use of keySet iterator instead of entrySet iterator" [WMI_WRONG_MAP_ITERATOR, FindBugs] - fixed some possible null pointer accesses git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5063 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7989335ed6	Preparations to replace the HTCache with a new storage data structure: - refactoring of the HTCache (separation of cache entry) - added new storage class for BLOBs. (not used yet, this is half-way to a new structure) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	be28af50f5	- fixed "yacy2yacy no proxy"-problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5058 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	f99c307eff	* correct debian build dependencies * add huge mem page detection in general initscript * disable logging completely in jmimemagic-library git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5056 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bdae051d9a	- extended new performance graph (better timing) - added paths for new libraries in classpath for eclipse - refactoring to remove compiler warnings (static access to finals variables) - removed some unused import git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	a087090bbb	fixed starting crawl results in "No parser available to parse mimetype 'application/octet-stream'" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5047 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	8422ee5ec4	- fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined - serverFileUtils.copy* use now Charset instead of String - added some warnings for ignored exceptions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5043 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
hermens	cff4393f0c	Fix HTCache so oldest Files get deleted first git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5041 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	621b473b18	* removed some warnings of findbugs (http://findbugs.sf.net ) - removed unnecessary code (unused variables, String.toString) - corrected some calculations (cast int to double or long ;) - improved little performance (using Integer.valueOf() instead of new Integer) - log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions - finalized some (more) fields - finally close some streams - made inner classes static if not using environment - generalized some equals (from specificClass to Object) - fixed some potential nullpointer accesses git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ebb40d324b	enhanced memory chart: shows now also the size of the word cache as third vector. The PPM is now shown without a scale, but with a new annotation at the chart entry. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5032 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	17b7845eb5	* refactoring - moved constants from plasmaSwitchboard to own class (all 232 ;) - moved remoteProxy-Methods to httpRemoteProxyConfig, better names - removed some unnecessary code (else-statements) * formatting (correct indentation) * minor bugfixes (due to findbugs.sf.net) * hopefully fixed "missing quote" (announcing StringParts as UTF-8) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	3bb870bfcd	added final where possible git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	0b2f67577e	Index Transfer: - fix for chunk size calculation - fix: if chunk size was 1, an infinite selection loop ran because no entries were found. if chunk size fails <=3 it will be set back to 500 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5023 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	5f77f55ed7	possible fix for negative speed values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5019 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	50ef5c406f	- refactoring of robots parser (removed opaque Objects[] result vector) - added Allow-component to robots result object git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c3d461d191	- removed superfluous copyright statement - updated my email address git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	62afea0c9f	some improvements for yacyTray git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5008 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	fa695c2d9f	tray is now only shown on Windows and doesn't block on linux git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4997 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	d77ed28e2f	temporarily disabled tray because of flaws on only-shell-linux git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4996 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	f8a1e3175e	new yacyTray this will make a YaCy icon in the tray area on supported platforms enabled by default the search page will open on double click used JDIC 0.9.4 from https://jdic.dev.java.net/ git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4992 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7b1c9e6aee	discovered and removed a (possibly large) memory leak: many classes used the kelondroMapDataMining (was: kelondroMapObjects) which adds statistical functions to the kelondroMap (was: kelondroObjects), but these functions were not used by these classes. Especially the HTCACHE and robots.txt database allocate a very large number of objects for statistical use, but never used them. By replacing the kelondroMapDataMining with the kelondroMap object for these classes now less memory is allocated. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4986 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0f5fe8cc53	refactoring of method calling for objects from kelondroMapDataMining git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4985 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4acf0a61cd	refactoring of kelondroObjects (mainly renaming to kelondroMap) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4982 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	441e9c861e	fix for npe in HTCache cleaning process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4981 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1e6d12f146	Major update to BLOB data structures: - introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM. very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree, which is based on a kelondroTree, a file-AVL-based index data structure. - the HTCACHE header file was replaced by the new blob heap file structure - the robots.txt file was replaced by the new blob heap file structure - the robots parser was enhanced (bugfixing for double-loading of the same robots.txt) - other BLOB-dependent data structures were prepared to use also the new BLOB heap - fixed a bug in the snippet fetch process: the file header was not written to the header index There should now be less IO during snippet fetch and during crawling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b38f467e3c	better SRU compliance git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4976 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7052f2f61f	- added copyright header of ResourceObserver - commented/removed some code to eliminate code warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4974 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1400cdc91e	- refactoring of resourceObserver (moved it to crawler) - partly redesign of diskUsage: little bit more functional behavior, less side effects, better error case handling - the resourceObserver can now show a error message if the diskUsage is 'out of order' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	b6301a54fa	* added class ListDirs to provide generic listing of directories in systemdirectories and jar-files * yacy runs, when classes are in a jar-file (->build-jar ant-target) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4971 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	f2e2d09916	- fix for index transfer - imported a random startpoint function from plasmaDHTChunk in case there was already a gap at the beginning of the index, the transfer process was endless selecting from first startpoint tested & working on my index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4970 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a6719dfd2b	- refactoring of robots parser - no more keep-order parameter in remove (it was not possible to make this strict, and not useful) - some small enhancements in balancer - robots parser without references in switchboard - changes synchronization in robots git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e81be7d4f2	added many missing user-agent declarations for yacy http client connections. the most important fix was the addition of the yacybot user-agent for robots.txt loading, because web masters look for that access to see if the crawler behaves correctly. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	474659a71f	- modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order - added option to set minimum crawl delta for domains in balancer - added default values to crawl deltas in yacy.init - added configuration for these deltas in performance queues - enhanced performance setting computation (more time for indexing queue for a faster flush) - remote crawling is now enabled during local crawling if indexer has space and time for more links - added database stub for new distributed file system - refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d37fd064f9	changed peer selection for search targets: - less dht targets are selected - more other peers are selected: all robinson peers with more than one million urls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4962 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	69aac0d74c	modified the diskUsage class regarding the following two aspects: 1. The usage and dependency of the plasmaSwitchboard was used many times in the past but this was a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over. 2. the class is considered as tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hardware should not support dynamic object allocation, but should be coded as collection of private static methods. Therefore all class objects had been transformed into static private objects. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	0c1dc703e4	- set staticIP at startUp - added setting for reduced menu (simpleMenu) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4959 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b928ae492a	some code-cleanup and possible speed enhancements in different core methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4935 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c998dc6556	- added security functions to flush url and search caches in case that memory is full git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	68c38c2d34	- WatchCrawler shows status without JavaScript - Performance can be scaled + DHT-profile - names for pool-threads - some small refactorings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f5ef7f222e	- fixed a bug in parser (directory paths had not been recognized) - no access check when a search is made only local without snippet fetch - added comment and status message in resourceObserver (this takes very long at startup time!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4911 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	3330181aa0	refactoring: find a better way to store BLOBs; generalize current BLOB data structure (kelondroDyn) and prepare it to replace it with something better. The best candidate is the kelondroHeap, which will become the kelondroBLOBHeap; removed also some never-used classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4902 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago

1866 Commits (43c8defd7932d5c0ce9d9ec137d328409c62d4d7)