yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	2b8cc5832c	fix seek error for records files of size 0 by adding an extra check for file size = 0 in cleanlast() - (http://mantis.tokeek.de/view.php?id=411)	11 years ago
reger	2ba394333f	fix Crawler HostQueue release of stackfile - close stackfile inputstream at end of ChunkIterator. This should solve the startup delay while unfinished crawl jobs exist (and maybe also the too-many-open-files situation)	11 years ago
Michael Peter Christen	501d55cd35	removed superfluous assert	11 years ago
Michael Peter Christen	f0db501630	better handling of ranking parameters and new default values for date navigation which is done using ranking in solr.	11 years ago
Michael Peter Christen	6634b5b737	debug code for index distribution testing	11 years ago
orbiter	97983ba89f	fixed generics warnings for generic array instantiation that appeared after migration to Java 7	11 years ago
orbiter	88f4af90da	removed warnings	11 years ago
orbiter	89f76da24b	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
sixcooler	b8cee9b7d8	remove tables from tabletracker on close to avoid lots of dead entries in /PerformanceMemory_p.html	11 years ago
orbiter	f15c832587	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
reger	ffc5b75c73	optimize and fix lat / lon assignment	11 years ago
reger	9313447de2	reimplement tighter lat/lon calc in URIMetadataNode from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272	11 years ago
orbiter	a3542f29b4	npe fix	11 years ago
orbiter	c48d2a2a02	npe fix	11 years ago
orbiter	12ba890205	removed warnings	11 years ago
reger	727dfb5875	refactor URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex	11 years ago
Michael Peter Christen	1aea01fe5b	fix for Table in case that requested file does not exist and paths also do not exist	11 years ago
Michael Peter Christen	da86f150ab	- added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create tens of thousands of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such an on-demand file reader shall prevent the number of file pointers from exceeding the system limit, which is usually about 10,000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the host name is attached to the port on the host to differentiate between http, https, and ftp services.	11 years ago
Michael Peter Christen	17e0956312	refactoring of SystemLoad calls (only one backend tool)	11 years ago
reger	227c42bc96	eliminate obsolete URIMetaDataRow class by joining it with/into URIMetaDataNode.	11 years ago
Michael Peter Christen	62a36fa584	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	c9f92abddc	fix: application link count (URIMetadataNode)	11 years ago
Michael Peter Christen	5b83887da8	npe fix	11 years ago
Michael Peter Christen	56710ecb26	prevent opening of new files as that could be a cause for the latest too-many-open-files exception. The old file is just truncated if the table is cleaned.	11 years ago
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	11 years ago
Michael Peter Christen	1a764135be	one more Thread Dump fix for new bootstrap css style	11 years ago
Michael Peter Christen	bb21d825f9	fix for thread dump line spacing	11 years ago
Michael Peter Christen	5f4a6892c1	enhanced RowSet re-sort limit for small sets	11 years ago
Michael Peter Christen	6ed9c0164e	attaching names to all Threads to get a better view in profiling tools like VisualVM	11 years ago
Michael Peter Christen	fdaeac374a	- enhanced postprocessing speed and memory footprint (by using HashMaps instead of TreeMaps) - enhanced memory footprint of database indexes (by introduction of optimize calls) - optimize calls shrink the amount of used memory for index sets if they are not changed afterwards any more	11 years ago
Michael Peter Christen	9eb668e951	enhanced the resource observer. The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space which is assigned to YaCy is defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files in case that an autodelete is activated. The autodelete is now BY DEFAULT ON if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	11 years ago
Michael Peter Christen	fbee98c06f	fixed shortcut self-reference bug	11 years ago
Michael Peter Christen	acc8d7faa7	fixed setting of shortMemoryStatus in MemoryControl	11 years ago
Michael Peter Christen	94245ce0a8	fixed "Size in KBytes" calculation in PerformanceQueues_p.html, see http://bugs.yacy.net/view.php?id=362	11 years ago
Michael Peter Christen	ebfaf753b7	- faster initialization of index files - removal of not used space if index files shrink (rare, but possible)	11 years ago
reger	a3e2cca8e9	improve isOlder check to not overwrite node index with metadata on equal load date	11 years ago
orbiter	c351e47a84	fix for bad-formatted lonlat	11 years ago
Michael Peter Christen	c87cdfca2e	do not set a load prerequisite that prevents the start of one-time-jobs	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
Michael Peter Christen	25a6c05008	experimental removal of synchronization. This should work for all cases where the size() and isEmpty() methods are used only for statistics, which happens at many locations in YaCy. If these methods are used for structural reasons (like accessing the last element in an array) then it may fail or cause other problems. As far as visible, this is not the case.	11 years ago
Michael Peter Christen	5695280edd	removed superfluous synchronization	11 years ago
Michael Peter Christen	a1977b7a75	removed debug code	11 years ago
Michael Peter Christen	ec10ed45bd	better logging in logger	11 years ago
Michael Peter Christen	c3dcbdc8d5	try to recover from an OOM during citation index reading and fail-over to second solr core in case of unrecoverable OOM.	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to prevent words from being deleted from the sets from outside the QueryGoal class.	11 years ago
Michael Peter Christen	191fd3d7e7	added an optimization option to HandleSet mass data storage structure	11 years ago
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	11 years ago
orbiter	3c3cb78555	- removed a lot of garbage and bloated code from GuiHandler. - transformed log lines to String before they are stored because the storage space is about 1:250 (45kb for one line before transformation, 180 bytes afterwards) - this saves up to 10MB RAM so we can increase the number of lines to 1000 again.	11 years ago
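
Commit da86f150ab above describes per-host queues that are split by crawl depth, with the rule that the url taken next always has the minimal crawl depth. The following is a minimal in-memory sketch of that selection rule only. It is not YaCy's HostBalancer code: the class name `DepthFirstHostQueues` and its `push`/`pop` methods are hypothetical, it keeps everything in memory instead of one file per host and depth, and it does not attempt the fair balancing between hosts or the on-demand file opening that the commit message mentions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "one queue per host and crawl depth,
// always take the smallest depth first". Not taken from YaCy.
public class DepthFirstHostQueues {

    // "host:port" -> (crawl depth -> FIFO of urls).
    // The inner TreeMap keeps depths sorted, so firstKey() is the minimal depth.
    private final Map<String, TreeMap<Integer, Deque<String>>> queues = new TreeMap<>();

    // Stack a url for the given host at the given crawl depth.
    public void push(String hostPort, int depth, String url) {
        queues.computeIfAbsent(hostPort, h -> new TreeMap<>())
              .computeIfAbsent(depth, d -> new ArrayDeque<>())
              .addLast(url);
    }

    // Return a url with the smallest crawl depth over all hosts, or null if empty.
    // Ties between hosts are broken by host name order (an arbitrary choice here).
    public String pop() {
        String bestHost = null;
        int bestDepth = Integer.MAX_VALUE;
        for (Map.Entry<String, TreeMap<Integer, Deque<String>>> e : queues.entrySet()) {
            int depth = e.getValue().firstKey(); // safe: empty maps are removed below
            if (depth < bestDepth) {
                bestDepth = depth;
                bestHost = e.getKey();
            }
        }
        if (bestHost == null) {
            return null;
        }
        TreeMap<Integer, Deque<String>> depths = queues.get(bestHost);
        Deque<String> queue = depths.get(bestDepth);
        String url = queue.pollFirst();
        // Drop empty containers so that firstKey() never sees an empty map.
        if (queue.isEmpty()) {
            depths.remove(bestDepth);
        }
        if (depths.isEmpty()) {
            queues.remove(bestHost);
        }
        return url;
    }
}
```

With urls pushed at depths 2 and 1 for one host and depth 0 for another, `pop()` hands them out in the order depth 0, depth 1, depth 2, regardless of insertion order.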

749 Commits (504327b15c142a88b12016fb8ee75144b822a1f3)