yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	2ba394333f	fix Crawler HostQueue release of stackfile - close stackfile inputstream at end of ChunkIterator This should solve startup delay while unfinished crawl jobs exist (maybe also too many open file situation)	11 years ago
orbiter	97983ba89f	fixed generics warnings for generic array instantiation that appeared after migration to Java 7	11 years ago
sixcooler	b8cee9b7d8	remove tables from tabletracker on close to avoid lots of dead entries in /PerformanceMemory_p.html	11 years ago
Michael Peter Christen	1aea01fe5b	fix for Table in case that requested file does not exist and paths also do not exist	11 years ago
Michael Peter Christen	da86f150ab	- added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create 10.000s of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such on-demand file reader shall prevent that the number of file pointers is over the system limit, which is usually about 10.000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the host name is attached to the port on the host to differentiate between http, https, and ftp services.	11 years ago
Michael Peter Christen	56710ecb26	prevent opening of new files as that could be a cause for the latest too-many-open-files exception. The old file is just truncated if the table is cleaned.	11 years ago
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	11 years ago
Michael Peter Christen	fdaeac374a	- enhanced postprocessing speed and memory footprint (by using HashMaps instead of TreeMaps) - enhanced memory footprint of database indexes (by introduction of optimize calls) - optimize calls shrink the amount of used memory for index sets if they are not changed afterwards any more	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary that a large portion of the parser and link processing classes be adapted to carry a different type of link collection, which carries a property attribute attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	888a985dc6	set a higher limit for table copy usage	12 years ago
Michael Peter Christen	5e182a566f	- added another enumeration method in kelondro data structure to get a more random access to data for the balancer - added random access inside the balancer	12 years ago
orbiter	276dd6452b	removed warnings	12 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	132afaf687	removed inaccessible code	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	0c345d1559	giving threads names so it's easier to see what's happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	2280a7b276	- changed initialization order to prefer allocation of memory for table files first - bugfixes in memory amount calculation	13 years ago
Michael Peter Christen	0746308bc2	only the metadata tables shall be able to use the tail cache	13 years ago
Michael Peter Christen	7ec9bef0c3	fix for OOM	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	cf47d94888	performance hack to parse numbers inside of substrings without actually generating a substring. This avoids the allocation of a String object each time a substring is parsed. Should affect CPU load during RWI transmission.	13 years ago
Roland 'Quix0r' Haeder	fbb946f913	Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	0cf3d36eae	more tolerance in case of corrupted file	13 years ago
Michael Peter Christen	e3bb73c3d6	serialized some database access methods	13 years ago
Michael Peter Christen	49be60a7c8	WorkflowProcess is forced to make small pauses if shortMemoryStatus is reached.	13 years ago
Roland 'Quix0r' Haeder	fa08ed5ae5	Fixed a lot of CHMOD rights (no need for execute flag on .java/.html) and introduced local/remote crawl size ratio based check	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	404758698a	less io operations	13 years ago
orbiter	35a9e8f307	- fixed network graphic - debugged evaluation tables - changed cache settings in template engine - some speed hacks - changed int angles for peer positions in network graphic to double angles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e914a30099	fix for npe git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8032 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e58438c01c	- added a new retry connector for solr (for cases where solr responses are slow) - added a new exist property into the metadataRepository which includes solr entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d8d9735b4f	stability bugfix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8012 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1b86d06d1e	fix for http://bugs.yacy.net/view.php?id=62 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	035ebfbf3b	- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill) - this may have also (good) performance side effects on other parts of YaCy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0c6d95e57b	- more tolerance against failure of table opening - more connections for solrj git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2c4a672fe2	bugfixes and performance hacks for table index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago

110 Commits (504327b15c142a88b12016fb8ee75144b822a1f3)