yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	a5019bc470	make Vocabulary Navigator tags a hard result entry filter by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query) TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.	11 years ago
reger	a67a4b7d86	improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org)	11 years ago
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	11 years ago
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	11 years ago
reger	b355dd52c6	Index Administration - Field Re-Indexing: exclude internal Solr _version_ field from obsolete field check	11 years ago
sixcooler	8a96140f92	fix / workaround for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750 + Seed.hash should be final	11 years ago
Michael Peter Christen	2857499467	fix to collection schema; bug appeared for _txt fields with empty String as content	11 years ago
Michael Peter Christen	dbfa865700	added a stub of a class for crawler redesign	11 years ago
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
orbiter	252c525709	fixed feed api servlet and and enhanced RSSReader class	11 years ago
orbiter	d38c3c14d8	fix for CGI test	11 years ago
Michael Peter Christen	31902f54df	fix for NPE which happens within solr code at MultiMapSolrParams.java, line 52 in case that the array arr.length == 0	11 years ago
Michael Peter Christen	f13df9dbb6	migration to solr 4.4.0	11 years ago
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago
sixcooler	7d53ac86a3	fix for Blacklist (-Administration)	11 years ago
reger	f2d99053ed	Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception) (occured during testing while working on q=store:[* TO *])	11 years ago
reger	92d3f71b16	htmlParser: closes input stream -> changed it to leave it open for a reset (used by AugmentParser - even if this is practically not used), note: stream.close is done by caller (Textparser.parseSource) - removed unnecessary reset in AugmentParser - added stream.mark in tdfatripleimpl. to make stream.reset work here	11 years ago
orbiter	87cfeaa4f3	fix for npe	11 years ago
orbiter	268a36aaff	emergency fix for crawler: this will otherwise cause loss of complete crawl queue if latency of remote system is too low	11 years ago
orbiter	d05e0c5368	wait a bit longer before doing the first peer ping	11 years ago
orbiter	b8f57f7703	don't be noisy when doing background tasks that may be allowed to fail	11 years ago
Roland Haeder	0343f0668c	Fix for NPE: E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'net.yacy.search.Switchboard.cleanupJob': null; target exception: null java.lang.NullPointerException at net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116) at net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897) at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165) Conflicts: source/net/yacy/search/schema/CollectionConfiguration.java	11 years ago
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	11 years ago
Roland Haeder	7263bb82fb	Fix for NPE on shutdown: java.lang.NullPointerException at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732) at net.yacy.search.Switchboard.access00(Switchboard.java:207) at net.yacy.search.Switchboard.run(Switchboard.java:3049)	11 years ago
Roland Haeder	13433d41a1	Log this exception better Conflicts: source/net/yacy/kelondro/blob/Tables.java	11 years ago
orbiter	080d80c9de	do not write an empty failreason in case that there is no fail. Because of the lazy instantiation rule this value was not actually written, but if lazy instantiation is switched on, then this causes that all crawl starts delete all crawl-start-hosts completely because this looks for filled error reasons.	11 years ago
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	11 years ago
Michael Peter Christen	61e015268b	fix in forced deletion: forced commit needed	11 years ago
Michael Peter Christen	83e2921b39	new test case for http://bugs.yacy.net/view.php?id=141	11 years ago
Michael Peter Christen	304aacb2cc	fix for http://bugs.yacy.net/view.php?id=267	11 years ago
Michael Peter Christen	c3b2301b2f	fix for http://bugs.yacy.net/view.php?id=268	11 years ago
reger	aa1a1f1d2c	- small adjustment to make sure genericParser is tried last -- for some documents genericParser grabs document instead of specific available parser due to unordered pick of 1st to try parser (like .ps .rdf files and other) - remove redundant file extension registration	11 years ago
orbiter	3e901dcb06	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	f50b596e0b	do not run dht ditribution if system load is over 2.5	11 years ago
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	11 years ago
sixcooler	af740f3058	changed optimization to a segment-size of index-size/5.000.000 + one if not idle + one (and force) if postprocessing	11 years ago
Michael Peter Christen	336f86394c	replaced StringBuffer with StringBuilder	11 years ago
Michael Peter Christen	aeac2fb763	replaced more containsKey() -> get() usages by a simple get(), followed by a test for NULL. This should increase the application speed and reduces the lookup time for the affected methods by 50%	11 years ago
orbiter	5364c4dcc9	delayed first peer-ping to send the first ping out after the http got up; if the ping comes before the http is up, it cannot be recognized as senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266	11 years ago
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	11 years ago
orbiter	c124037f19	removed forced non-soft commits to prevent index fragmentation	11 years ago
Michael Peter Christen	31483c47e1	fixed problem with remote luke requests	11 years ago
Michael Peter Christen	c15aa758dc	removed failreason_t removal patch because that causes too much confusion using an external solr. to clean up the index after a schema change, use the index cleaner function from the online servlet	11 years ago
reger	2b7a38640a	extend content type detection on file extension for .tif .tiff .htm	11 years ago
Michael Peter Christen	ac1aad5064	added a getSegmentCount method and use it to disable optimize if wanted current segment count is below optimization level	11 years ago
Michael Peter Christen	36035e0a0a	- used reger's LukeRequest to generalize the index info in SolrServerConnector - used the LukeRequest in SolrServerConnector to replace the index size method by a getNumDocs request to a LukeRequest result	11 years ago
Michael Peter Christen	39fceb5ccf	fix for NPE & bug #264	11 years ago
Michael Peter Christen	735a66eff3	enhancements to crawler	11 years ago
Roland Haeder	be0ff6018f	Removed trailing spaces + some more final	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Felix Ableitner	03044589dd	Fixed (?i) appearing in entries, fixed multiple equal lines in file.	11 years ago
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	11 years ago
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	11 years ago
orbiter	d0dc86cf3d	logging of deadlocks (if any) during cleanup process	11 years ago
Michael Peter Christen	c6a6f159e8	fix for crawl stack domain counter	12 years ago
Michael Peter Christen	93d1bac140	do a more frequent optimization, reduces IO after optimization	12 years ago
orbiter	b71d13a014	added load and deadlock detector in Memory util	12 years ago
orbiter	290e24564b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	5533fc8e01	fix for bug 260	12 years ago
Michael Peter Christen	b79471ee67	grr	12 years ago
Michael Peter Christen	a79f288ac1	automatically running optimize on solr if user/search is idle for some time	12 years ago
orbiter	a9c8046c87	do a light optimization at the end of a crawl postprocessing	12 years ago
orbiter	a548354c71	replaced type of solr schema object sku of text_en_splitting_tight by string	12 years ago
orbiter	2f1ec8d4a2	npe fix	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	0d0b3a30f5	activate api actions after postprocessing of crawls	12 years ago
orbiter	3978c5ca5d	fix for http://bugs.yacy.net/view.php?id=255	12 years ago
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Michael Peter Christen	9a29ab469e	another patch to prevent CLOSE_WAIT status on solr connections	12 years ago
Michael Peter Christen	5091d627bc	fixed parsing of peer flags	12 years ago
Michael Peter Christen	87e9052081	added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof)	12 years ago
Michael Peter Christen	5c6946dd5f	replaced usage of log4j by ConcurrentLog where possible	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	f4f6551c66	better handling of time-out at solrj in case that a commit is done in a fail-over case during add	12 years ago
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
Michael Peter Christen	a34e137e27	fix for citation index generation in case that entry.referrerhash() is null. This is especially the case if ftp sites are crawled	12 years ago
Michael Peter Christen	a2c8116a8f	accept (but ignore) a '+' sign in front of search words	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
sixcooler	308d73f855	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	69906b1d2e	Revert "do not use remote proxy if not switched on - regardless of the proto" This reverts commit `20f452d228`.	12 years ago
sixcooler	20f452d228	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	9551720d5c	re-enable saved setting for proxy-crawl-profile	12 years ago
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	12 years ago
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
Michael Peter Christen	5a5d411ec0	new robots_i attribute fields	12 years ago
Michael Peter Christen	fa08bd9d5a	hack to prevent long waiting times in crawler	12 years ago
Michael Peter Christen	f1c5338210	prepartion for greedy crawl profiles and refactoring	12 years ago
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	12 years ago
reger	a6bf44212e	bugfix: location (lat/lon) meta data retrival (Double.NaN check)	12 years ago
Michael Peter Christen	203921006a	redesign of citation index storage	12 years ago
reger	83763ee4a4	jpeg parser: extract GPS location from meta data	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
sixcooler	e5abccdfe4	added optimize-option	12 years ago
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	12 years ago
Michael Peter Christen	8caaf6203a	fixed false multiple-generation of remote facet search which caused high cpu usage on remote side.	12 years ago
Michael Peter Christen	823ae4d6a7	added url_protocol_s to error documents	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	c4538d8d91	added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib	12 years ago
reger	3760e2616b	bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments	12 years ago
Michael Peter Christen	9a6fcdf597	npe fix	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
reger	8d1c4c423d	make imageparser fileextension detection case insensitive (extensions are often upper case)	12 years ago
Michael Peter Christen	f9d859f5dc	now writing image alt texts and (camelcase-)parsed urls into a text search field for a better image retrieval	12 years ago
Michael Peter Christen	e441a9d4c8	to avoid confusion, the gsa api is available at /search? and /searchresult?	12 years ago
orbiter	8792e6c6e9	stub for better image indexing	12 years ago
orbiter	97f2ac9091	added hint to gsa response writer that the result comes from a yacy peer	12 years ago
Michael Peter Christen	14186e815e	npe fix	12 years ago
Michael Peter Christen	bdf306e0a7	increased time-out for loading of seed-lists	12 years ago
Michael Peter Christen	374d2e2a52	removed warning message during crawling	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	fc3ff92c69	npe fix	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	3e1e358fdc	calling pdf cache flush on class initialization because calling of the methods during runtime can conflict with dynamic solr class loader and cause a deadlock (seriously!)	12 years ago
Michael Peter Christen	291912ee52	removed misleading http accessGranted message (this is only for debugging)	12 years ago
Michael Peter Christen	2fd7bbb450	reduced load on solr; no seed update in Status and no exists-check in HTTPLoader in case of redirects, that can be done using the htcache.	12 years ago
Michael Peter Christen	2648b42b27	added fixed clear method as public method	12 years ago
Michael Peter Christen	ffc570f95f	removed forced soft commit since this may be the cause for a performance problem	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanismn which will cause that a 'fresh' yacy will load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get faster a personalized search index.	12 years ago
Michael Peter Christen	f24574b3da	use s greeting line which does not sound so beta	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
Michael Peter Christen	8e965ffd16	fix for host compare in case that the host is null. This happens when doing a search in the intranet for file resources (they don't have a host).	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (instead one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
Michael Peter Christen	f7a4377812	usage of the new normalized link polularity CRn as default ranking function. This replaces the previous formula, which was bad. Before you update to this version, please check if you changed the ranking function yourself before, since it will be overwritten.	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, also a backlink-structure can be discovered and written to the index as well. The host browser has been extended to show such backlinks to each presented links. The host browser therefore can now show an information where an document is linked. The new citation reference is computed as likelyhood for a random click path with recursive usage of previously computed likelyhood. This process is repeated until the likelyhood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
Michael Peter Christen	e20450e798	patch in HTCache and CitationIndex loading in case that a file is broken: do not crash; instead ignore the file and delete it.	12 years ago
reger	d367b1f4d9	add null pointer check to stopword fix	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
Michael Peter Christen	9fc0c4df98	fix for bad exists 'enhancement'; see bug: http://bugs.yacy.net/view.php?id=245	12 years ago
reger	9ef1fd9bac	fix: enable use of solrcore.properties for property substitution of solrconfig.xml	12 years ago
reger	8a7fcb391d	enable use of solrcore.properties for property substitution of solrconfig.xml - move setting of system property solr.directoryFactory=solr.MMapDirectoryFactory to solrcore.properties - add check of os.arch for 64bit system, if it fails use default/solrcore.x86.properties (if exists) as solrcore.properties reason: on 32bit MMapDirectoryFactory may fail with..... Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849) at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)	12 years ago
Michael Peter Christen	f7e887bf49	added missing class	12 years ago
Michael Peter Christen	5f92c68f1f	removed block rank ranking and all YBR files in /ranking	12 years ago
Michael Peter Christen	164603b946	cleanup	12 years ago
Michael Peter Christen	ba793a32c0	added timeout for remote searches of 10 seconds	12 years ago
Michael Peter Christen	1c4c1c0345	try to commit in case of failure which hopefully frees up some RAM	12 years ago
Michael Peter Christen	409d6edf53	Store node/solr search threads to be able to send them an interrupt signal in case that a cleanup process wants to remove the search process. Added also a new cleanup process which can reduce the number of stored searches to a specific number which can be higher or lower according to the remaining RAM. The cleanup process is called every time a search ist started.	12 years ago
Michael Peter Christen	2a8b99ea82	remove text_t in search result after snippet has been computed to save space in search result cache	12 years ago
Michael Peter Christen	a1644ca0fd	new workflow processor in Segment to enqueue indexing documents to solr	12 years ago
Michael Peter Christen	a8dc4346e8	default configuration of MMapDirectoryFactory for solr, increased lock timeout, less documents from remote searches (too many results had easily blocked a peer)	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	5344a1c5f7	getting the trash out	12 years ago

6572 Commits (561ea135af864574f1b462868779dc3a59678682)