yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luc	c38d6c1f37	Correction for mantis 535: inurl: parameter doesn't work on URLs with upper-case letters	9 years ago
reger	3f2b8ab5e5	optionally include mime in p2p url exchange string if doctype decodes to ambiguous mime and default conversion is not equal to original	9 years ago
reger	e37a4f0b3d	prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation	9 years ago
Michael Peter Christen	c40c302748	when many crawl queues are generated, this NPE can occur; probably caused as concurrency issue: W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException java.lang.NullPointerException at java.util.TreeMap.rotateRight(TreeMap.java:2239) at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271) at java.util.TreeMap.put(TreeMap.java:582) at net.yacy.kelondro.table.Table.<init>(Table.java:235) at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229) at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204) at net.yacy.crawler.HostQueue.push(HostQueue.java:397) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355) at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)	9 years ago
luccioman	2f0f0180e2	Added a function to list files recursively.	9 years ago
reger	0e4ba0360b	fix NPE on .yacyh result url of disconnected peer (cleanup yacyshare remaining)	9 years ago
Michael Peter Christen	dbbad23e12	removed warnings	9 years ago
Michael Peter Christen	b94bd7f20a	a collection of search query enhancements: - fixed superfluous space in query field list - fixed filter query logic - removed look-ahead query which caused that each new search page submitted two solr queries - fixed random solr result orders in case that the solr score was equal: this was then re-ordered by YaCy using the document hash which came from the solr object and that appeared to be random. Now the hash of the url is used and the score is additionally modified by the url length to prevent that this particular case appears at all.	9 years ago
Michael Peter Christen	34de1e8cbc	gzip compression will perform more efficiently and with a better compression level	10 years ago
Michael Peter Christen	a1a8edfc0a	wrap HeapReader close() in a catch Throwable block so that an exception during close cannot block the whole shutdown process	10 years ago
reger	8b35656007	remove hard throw exception in makeResultEntry; remove unused "share." peername.yacy url rewrite	10 years ago
reger	dd7782bac0	revert deletion of BinSearch (accident)	10 years ago
reger	000dde9511	Eliminate duplication of values for search ResultEntry by instantiation from URIMetadataNode, by eliminating differentiation of ResultEntry/URIMetadataNode. - moved remaining ResultEntry functionality to URIMetadataNode - for 1:1 functionality added a function makeResultEntry() - removed ResultEntry - refactored related code Main difference is after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated. Main difference left is, that	10 years ago
reger	d882991bc5	Implement sharing of ioDispatcher for term & citation index as proposed in ioDispatcher description	10 years ago
reger	c60ccdfbcf	Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump, skip concurrent emergency merge dealing with/see http://mantis.tokeek.de/view.php?id=566	10 years ago
reger	13f013f64a	Limit extra sleep of BusyThread on LowMemCycle	10 years ago
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user doing a search are detected. The time zone of the search request is detected automatically using the browser's time zone offset which is delivered to the search request automatically and invisible to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required the change of the parser java api. A lot of other changes have been made which correct the wrong handling of dates in YaCy which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
Michael Peter Christen	9bf0d7ecb9	added a new collection type 'dht' to all documents from the peer-to-peer interface to distinguish rich and poor document data. This also reverts some changes from commit `796770e070` because the firstSeen database is the wrong method to distinguish these types of data	10 years ago
Michael Peter Christen	ee2490ab98	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	431311df42	fix get fresh_date_dt to allow returned value to be date in future	10 years ago
otter	74c7e8b686	Fixes hanging FlushThread (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5447) by replacing put() method by the more robust add() to add a merge job to the queue.	10 years ago
reger	706f75ddc2	try to fix hang on index blob merge on shutdown http://mantis.tokeek.de/view.php?id=505 It happens but not able to reproduce. This change makes sure terminate signal is caught at end of currently running merge jobs	10 years ago
Michael Peter Christen	fd4e2c809a	Show dates in the content of a document in the search result: - if an eventDate is given in the search result, replace the document date with the event date and prefix it with the string "on ". - the document date is omitted if a date from the content is shown Added also the date as fields in the json and rss result sets.	10 years ago
reger	df83fcc4fc	disable optimistic GC assumption in StandardMemoryStrategy. After several tests found that eom is not prevented. Major reason in testing was assumption future GC will free avg of last 5 GC. Disabling this check improved eom exceptions. Added simplest testcase used for verification	10 years ago
Michael Peter Christen	ac19690d30	refactoring with CommonPattern.COMMA	10 years ago
Michael Peter Christen	3d717b749a	fix for urlmaskfilter	10 years ago
reger	24f68a4eb7	refactor opensearch heuristic introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors, which provide the query() functionality, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector. The manager enforces now a min 15s delay between calls to external systems. Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation. default heuristicopensearch.conf: - openbdb.com removed - seems to no longer deliver results - config via solrconnector to datacite.org added (large technical library archive)	10 years ago
reger	8e751d754a	- add javadoc to busythread with hint about the init parameter usage - remove obsolete 10_httpd config parameter	10 years ago
Michael Peter Christen	3cd7deb3b8	do not flush non-errors to stdout because this is a concurrency issue. the flush-call appeared very often in thread dumps with high load, so this hopefully improves performance	10 years ago
reger	198102304b	refactor size() -> filesize() of URIMetadataNode (harmonize with ResultEntry and to not get confused with Collection.size())	10 years ago
reger	c6f634a4f2	remove redundant caching of urlhash in URIMetadataNode (is already cached in underlying DigestURL .url) upd pom keyword for maven-antrun-plugin	10 years ago
Michael Peter Christen	413eeefed4	added character set detection library from http://www-archive.mozilla.org/projects/intl/chardet.html	10 years ago
Michael Peter Christen	a304058840	added Image Events as another option to generate images with a mac if no Ghostscript is available or does not work...	10 years ago
Michael Peter Christen	321840fde3	Replaced all fixed thread pools with cached thread pools. The cached thread pools will flush their cached (dead) threads after 60 seconds. As a result, YaCy now runs constantly with about 50 threads, about 100 at peak times. Previously, about 400 threads had been cached and kept in a hibernation state, so the numproc counter in /proc/user_beancounters (exists only in VM-hosted linux) was as high as the cached number of threads. This caused VM supervisors to terminate whole VM sessions if a limit was reached. Many VM providers have limits of numproc=96 which made it virtually impossible to run YaCy on such machines. With this change, it will be possible to run many YaCy instances even on VM hosts.	10 years ago
Michael Peter Christen	7bfab5eb9d	set Busy- and Blocking-Threads to daemon mode (they will now not prevent YaCy from termination if still running)	10 years ago
Michael Peter Christen	ad0da5f246	added new web page snapshot infrastructure which will lead to the ability to have web page previews in the search results. (This is a stub, no function available with this yet...)	10 years ago
Michael Peter Christen	4920ab7b76	optimize usage of size() cache	10 years ago
Michael Peter Christen	2beb6abeb6	disabled crazy sleep loop	10 years ago
Michael Peter Christen	8aee7f940e	added missing class for latest changes	10 years ago
Michael Peter Christen	97039049e4	fix in key enumeration methods for cases where the enumeration is done in reverse order.	10 years ago
Michael Peter Christen	421ee64f33	another fix to ordering of table indexes; fixes also network stats graphics	10 years ago
Michael Peter Christen	1db476c67e	fix for bad table iteration	10 years ago
orbiter	0fcd8097a3	removed unused options from BusyThreads	10 years ago
sixcooler	72561926aa	do not overwrite yacy.conf in case of an exception may be a fix for http://mantis.tokeek.de/view.php?id=180	10 years ago
Michael Peter Christen	bc275dca07	added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer.	10 years ago
Michael Peter Christen	ee27be3399	misc bugfixes (concurrency, memory protection)	10 years ago
Michael Peter Christen	7817fc50c9	added a high cpu cycle monitor to PerformanceQueues	10 years ago
orbiter	3ac31614a3	added option to reverse-sort YaCy tables (internal API change only)	10 years ago
Michael Peter Christen	ec6082c872	very bad language detection hack fix hack	10 years ago
Michael Peter Christen	a7dd89c4de	changed method to write the citation index: do not catch up references during document parsing; instead use the same references that would also be written into the webgraph. That should cause that the webgraph and the citation index express the exact same semantic.	10 years ago

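Commit 321840fde3 in the list above swaps fixed thread pools for cached ones. As a rough illustration of why that can lower the number of idle threads, the sketch below compares the two pool types created by the standard `java.util.concurrent.Executors` factory. It is not taken from the YaCy sources: the class name `PoolComparison`, its helper methods, and the pool size of 8 are made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not part of YaCy.
class PoolComparison {

    // Number of threads the pool keeps alive even when they are idle.
    // Assumption: the pool came from Executors.newFixedThreadPool or
    // Executors.newCachedThreadPool, which return a ThreadPoolExecutor
    // in OpenJDK-based runtimes.
    static int coreThreads(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getCorePoolSize();
    }

    // Seconds an idle non-core thread may wait before it is terminated.
    static long idleTimeoutSeconds(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getKeepAliveTime(TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(8);
        ExecutorService cached = Executors.newCachedThreadPool();

        // Fixed pool: once started, up to 8 threads stay around indefinitely.
        System.out.println("fixed  core threads: " + coreThreads(fixed));
        // Cached pool: no core threads, idle workers time out.
        System.out.println("cached core threads: " + coreThreads(cached));
        System.out.println("cached idle timeout: " + idleTimeoutSeconds(cached) + "s");

        fixed.shutdown();
        cached.shutdown();
    }
}
```

A cached pool has no upper bound on the number of threads it may create under load, so this trade-off suits workloads with short bursts rather than sustained saturation.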

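Commit 74c7e8b686 in the list above replaces `put()` with `add()` when queueing a merge job. The commit message does not say which queue type is involved; assuming it is a bounded `java.util.concurrent.BlockingQueue`, the difference is that `put()` waits until space is available while `add()` fails immediately with an `IllegalStateException` when the queue is full. The sketch below shows the non-blocking variant; the class `QueueInsertDemo`, the method `tryEnqueue`, and the job names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class, not part of YaCy.
class QueueInsertDemo {

    // Enqueues a job without blocking the caller.
    // Returns false instead of waiting when the bounded queue is full.
    static boolean tryEnqueue(BlockingQueue<String> queue, String job) {
        try {
            return queue.add(job); // throws IllegalStateException if no space is left
        } catch (IllegalStateException full) {
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        System.out.println(tryEnqueue(queue, "merge-1")); // queue has room
        System.out.println(tryEnqueue(queue, "merge-2")); // queue is full, caller is not blocked
    }
}
```

With `put()` the second call would wait until a consumer removes an element, which can hang the producer if no consumer is running.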
802 Commits (5445f38070af55ed56a5c826e20838925cfc2519)