yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	103a8348b3	fix for NPE and small performance enhancement	8 years ago
reger	5aaa057c65	ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read. equalizes behavior with getListString() improves: case where blacklist file contained an undesired empty line, not fixed by blacklist-cleaner.	9 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
Burkhard	9a18e2297b	Merge pull request #51 from JeremyRand/multiple-boost-query Fix multiple boost queries	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
JeremyRand	58824dfa6c	Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
sixcooler	dce1cb65c4	Merge remote-tracking branch 'choose_remote_name/master'	9 years ago
reger	b4b6910d60	fix (todo): correct doc.id of remote search result if no match with newly calculated doc hash if different. Testing showed that in some cases delivered url doesn't match the local calculated hash. In this case replace doc.id (and host_id_s) with calculation from url.	9 years ago
reger	cb83e65f89	drop returning document language "en" if unknown (fix todo) which also harmonizes handling of query.modifier for rwi and solr results (to result must match a given language filter)	9 years ago
luc	70595d05d0	Modified MemoryControl.main() test to properly end for better results displaying.	9 years ago
reger	cdb8f3b10d	make current ranking score value avail. to search interface / api Update the result score result field with the result queue ranking value to reflect the actual calculated/used score, for rwi & solr stack results. (calc. etc. is unchanged, it's just that result entry carries the latest val as api retrieves the number from it)	9 years ago
Michael Peter Christen	d82d311995	Merge branch 'master' of https://github.com/luccioman/yacy_search_server # Conflicts: # .classpath	9 years ago
reger	1160b13172	remove unused md5 from ViewFile servlet params	9 years ago
reger	b2c8bc0ae6	remove md5_s from default index fields it is not assigned a value / not used Due to above also excluded from transfer protocol.	9 years ago
luc	5bbb2e1730	Ensure resource is closed when reading a full file InputStream	9 years ago
reger	7d0d19cb8e	avoid File.deleteOnExit() on temp files JVM registers each file in a list regardless of already deleted and never cleans up the list during runtime. This accumulates to a considerable amount of mem during large crawls and/or long uptime. To tackle this, all temp files are now created in a subdir of java.io.tmpdir and the jvm tmpdir property is set to this subdir, which is deleted by code on shutdown. Additionally let pdfParser use this tmp subdir too.	9 years ago
reger	02e4489a23	set tmpfile.deleteOnExit by default, to make sure files are removed on shutdown.	9 years ago
reger	ca3d26a401	harmonize wordsintitle & CollectionSchema.title_words_val calculation, remove obsolete partial init of wordreference from urimetadata	9 years ago
sixcooler	d3b9349b6f	simplification / speedup of GenerationMemoryStrategy	9 years ago
luc	c38d6c1f37	Correction for mantis 535: inurl: parameter doesn't work on URLs with upper-case letters	9 years ago
reger	3f2b8ab5e5	optionally include mime in p2p url exchange string if doctype decodes to ambiguous mime and default conversion is not equal to original	9 years ago
reger	e37a4f0b3d	prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation	9 years ago
Michael Peter Christen	c40c302748	when many crawl queues are generated, this NPE can occur; probably caused as concurrency issue: W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException java.lang.NullPointerException at java.util.TreeMap.rotateRight(TreeMap.java:2239) at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271) at java.util.TreeMap.put(TreeMap.java:582) at net.yacy.kelondro.table.Table.<init>(Table.java:235) at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229) at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204) at net.yacy.crawler.HostQueue.push(HostQueue.java:397) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355) at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)	9 years ago
luccioman	2f0f0180e2	Added a function to list files recursively.	9 years ago
reger	0e4ba0360b	fix NPE on .yacyh result url of disconnected peer (cleanup yacyshare remaining)	9 years ago
Michael Peter Christen	dbbad23e12	removed warnings	9 years ago
Michael Peter Christen	b94bd7f20a	a collection of search query enhancements: - fixed superfluous space in query field list - fixed filter query logic - removed look-ahead query which caused that each new search page submitted two solr queries - fixed random solr result orders in case that the solr score was equal: this was then re-ordered by YaCy using the document hash which came from the solr object and that appeared to be random. Now the hash of the url is used and the score is additionally modified by the url length to prevent that this particular case appears at all.	9 years ago
Michael Peter Christen	34de1e8cbc	gzip compression will perform more efficiently and with better compression level	10 years ago
Michael Peter Christen	a1a8edfc0a	wrap HeapReader close() in a catch Throwable block to prevent that an exception during close blocks the whole shutdown process	10 years ago
reger	8b35656007	remove hard throw exception in makeResultEntry remove not used "share." peername.yacy url rewrite	10 years ago
reger	dd7782bac0	revert deletion of BinSearch (accident)	10 years ago
reger	000dde9511	Eliminate duplication of values for search ResultEntry by instantiation from URIMetadataNode, by eliminating differentiation of ResultEntry/URIMetadataNode. - moved remaining ResultEntry functionality to URIMetadataNode - for 1:1 functionality added a function makeResultEntry() - removed ResultEntry - refactored related code Main difference is after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated. Main difference left is, that	10 years ago
reger	d882991bc5	Implement sharing of ioDispatcher for term & citation index as proposed in ioDispatcher description	10 years ago
reger	c60ccdfbcf	Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump, skip concurrent emergency merge dealing with/see http://mantis.tokeek.de/view.php?id=566	10 years ago
reger	13f013f64a	Limit extra sleep of BusyThread on LowMemCycle	10 years ago
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user, doing a search, is detected. The time zone of the search request is done automatically using the browser's time zone offset which is delivered to the search request automatically and invisible to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required the change of the parser java api. A lot of other changes had been made which corrects the wrong handling of dates in YaCy which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
Michael Peter Christen	9bf0d7ecb9	added a new collection type 'dht' to all documents from the peer-to-peer interface to distinguish rich and poor document data. This also reverts some changes from commit `796770e070` because the firstSeen database is the wrong method to distinguish these types of data	10 years ago
Michael Peter Christen	ee2490ab98	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	431311df42	fix get fresh_date_dt to allow returned value to be date in future	10 years ago
otter	74c7e8b686	Fixes hanging FlushThread (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5447) by replacing put() method by the more robust add() to add a merge job to the queue.	10 years ago
reger	706f75ddc2	try to fix hang on index blob merge on shutdown http://mantis.tokeek.de/view.php?id=505 It happens but not able to reproduce. This change makes sure terminate signal is caught at end of currently running merge jobs	10 years ago
Michael Peter Christen	fd4e2c809a	Show dates in the content of a document in the search result: - if an eventDate is given in the search result, replace the document date with the event date and prefix it with the string "on ". - the document date is omitted if a date from the content is shown Added also the date as fields in the json and rss result sets.	10 years ago
reger	df83fcc4fc	disable optimistic GC assumption in StandardMemoryStrategy After several tests found that eom is not prevented. Major reason in testing was assumption future GC will free avg of last 5 GC. Disabling this check improved eom exceptions. Added simplest testcase used for verification	10 years ago
Michael Peter Christen	ac19690d30	refactoring with CommonPattern.COMMA	10 years ago
Michael Peter Christen	3d717b749a	fix for urlmaskfilter	10 years ago
reger	24f68a4eb7	refactor opensearch heuristic introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors, which provide the query() functionality, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector. The manager enforces now a min 15s delay between calls to external systems. Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation. default heuristicopensearch.conf: - openbdb.com removed - seems to no longer deliver results - config via solrconnector to datacite.org added (large technical library archive)	10 years ago
reger	8e751d754a	- add javadoc to busythread with hint about the init parameter usage - remove obsolete 10_httpd config parameter	10 years ago
Michael Peter Christen	3cd7deb3b8	do not flush non-errors to stdout because this is a concurrency issue. the flush-call appeared very often in thread dumps with high load, so this hopefully gives some performances	10 years ago
reger	198102304b	refactor size() -> filesize() of URIMetadataNode (harmonize with ResultEntry and to not get confused with Collection.size())	10 years ago

822 Commits (de663be48b1fcd722c51490b1836d41c7235ec5c)