yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	11 years ago
Michael Peter Christen	69b8d61c47	fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.)	11 years ago
reger	7b17cdf6dd	add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reasons) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type addresses also possible url without or deviating extension.	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	78e7aadb26	removed unused initialization method	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not only the unique links! This made it necessary to adapt a large portion of the parser and link processing classes to carry a different type of link collection, which carries the property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	85456f46b2	added two new fields, exact_signature_copycount_i and fuzzy_signature_copycount_i, which count the number of copies of non-unique documents and assign this to each document. Thus, each document is assigned a number which shows how many copies of this document exist. These fields are disabled by default.	11 years ago
Michael Peter Christen	a2511b5600	turned images_alt_txt back to images_alt_sxt because it is not necessary to index the alt text. Indexed image Text is in images_text_t	11 years ago
Michael Peter Christen	85b1922244	activated image type navigation for image search	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	ab1201fdfd	fixed wrong facet count	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	a8c5bfcf58	avoid creating unnecessary objects	11 years ago
Michael Peter Christen	dc179bd61f	fix for catchall query goal for image search	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	169ef8963d	one more fix for image search	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	11 years ago
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	11 years ago
orbiter	f106345eef	link strings should not be tokenized	11 years ago
reger	a5019bc470	make Vocabulary Navigator tags a hard result entry filter by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query) TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.	11 years ago
reger	a67a4b7d86	improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org)	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Michael Peter Christen	a2c8116a8f	accept (but ignore) a '+' sign in front of search words	12 years ago
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	8caaf6203a	fixed false multiple-generation of remote facet search which caused high cpu usage on remote side.	12 years ago
reger	d367b1f4d9	add null pointer check to stopword fix	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
Michael Peter Christen	409d6edf53	Store node/solr search threads to be able to send them an interrupt signal in case that a cleanup process wants to remove the search process. Added also a new cleanup process which can reduce the number of stored searches to a specific number which can be higher or lower according to the remaining RAM. The cleanup process is called every time a search is started.	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
orbiter	da621e827e	prevent NPE in case RWI is disabled	12 years ago
Michael Peter Christen	c2b1075dcf	activating pollImmediately in case that DHT receive is off. This will cause a much faster search result when running in public robinson mode.	12 years ago
Michael Peter Christen	06d3063dc9	- no downcase when using collection modifier - removed warnings	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performance when testing the existence of many urls. The effect should cause much less IO during index transmission, both on sender and receiver side.	12 years ago
Michael Peter Christen	4058369288	fixed query expressions for collection selection (added quotes)	12 years ago
Michael Peter Christen	cca19d94d4	re-declared some fields to be of type string rather than text which makes them more efficient and less large	12 years ago
Michael Peter Christen	3841854c97	abstraction of catchall term	12 years ago
Michael Peter Christen	bb4bf3d8fd	infinity timeout bug protection patch	12 years ago
Michael Peter Christen	c091000165	added collection attribute also to the rss feed reader	12 years ago
orbiter	f7571386a3	added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time).	12 years ago
Michael Peter Christen	97775fbebc	fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/	12 years ago
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if there was not given a GSA-style sort attribute - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not scenario-adapted (site, date) because that should be configurable in the web interface before it is used actually for ranking.	12 years ago
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	12 years ago
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	12 years ago
Michael Peter Christen	cf0acd2cb4	upgrade to solr 4.2.1	12 years ago
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	12 years ago
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	12 years ago
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	12 years ago
Michael Peter Christen	4af0839be2	use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used	12 years ago
Michael Peter Christen	addba047e2	changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking	12 years ago
Michael Peter Christen	25300913fa	fixes to search debugging after testing with the different search debugging options	12 years ago
Michael Peter Christen	81380ae5c8	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	c2fde018b5	concurrent snippet fetching from solr results which do not have snippets	12 years ago
orbiter	b1140e3d82	added debug switches for detailed search testing	12 years ago
orbiter	cdbfddf091	added filter queries for better image, audio and video results	12 years ago
Michael Peter Christen	587ef83eab	added missing cleanup statements for short memory cases during search	12 years ago
Michael Peter Christen	ae734b3f8d	enhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again)	12 years ago
Michael Peter Christen	221ed7d764	- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters	12 years ago
orbiter	0f7ea7ad9f	- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process renovation) - reduced the number of remote results in settings file because the processing of such mass document adds is too CPU-intensive (in Solr)	12 years ago
orbiter	9c09fd7d0b	better/less requests to local solr; the request is made in chunks which are exactly at only that size which is needed to present the current search result page. This will also cause that next solr request are made automatically during switching to next pages.	12 years ago
orbiter	d74472f562	corrected result counter	12 years ago
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	12 years ago
Michael Peter Christen	35fa718b77	testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
Michael Peter Christen	91a0401d59	introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding on top of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema	12 years ago
Michael Peter Christen	b6de1f42dc	Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, each for every solr core. This caused that the IndexFederated servlet had to be split into two parts, the new Servlet for the Schema editor is now in the IndexSchema Servlet.	12 years ago
Michael Peter Christen	d3508fa8ff	fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html	12 years ago
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries.	12 years ago
Michael Peter Christen	e8f7b85b98	fixes to internal RWI usage if RWI is switched off (NPE etc)	12 years ago
Michael Peter Christen	3834829b37	bugfixes and more logging for solr connector	12 years ago
Michael Peter Christen	592adf7ccb	fix for domain navigation	12 years ago
Michael Peter Christen	8651ec35fe	turned author_s into the multi-valued field author_sxt	12 years ago
Michael Peter Christen	0fe7b6fd3b	migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. removed superfluous methods and servlets.	12 years ago
Michael Peter Christen	4735bd47f4	- changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy has now an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore the commit strategy when doing a search from the web interface was changed (it's done every time before a search is done). The softcommit feature was implemented because it was needed for the following changes (customer demands), which are also included in this git commit: - added a feature to identify all documents which have unique titles and/or unique descriptions. These unique flags are disabled by default. - added also a feature to set a flag when the url from a canonical tag is equal to the document url. This is also disabled by default. To support the new softcommit strategy, the commitWithinMs option was set to -1 to disable automatic commit based on document insert times. If documents are inserted permanently then also a commit would happen permanently whenever the commitWithinMs time is reached. This would conflict with the regular autocommit of 10 minutes and the new softcommit strategy.	12 years ago
Michael Peter Christen	cba038f97b	one more NPE fix	12 years ago
Michael Peter Christen	c3d50d91f8	relaxing site operator for www prefix: - when using a site operator search for a domain where the domain has a www prefix, also the domain without the www is included - when using a site operator search for a domain where the domain has no www prefix, also the domain with the www is included - in the host navigator, all domains with and without a www prefix are accumulated. That means that the host navigator does never show a host with a www prefix. This should prevent usage mistakes of the site operator.	12 years ago
Michael Peter Christen	db49e91724	fixed a NPE which may appear for freeworld peers without any rwi index data. The NPE looked like: Caused by: java.lang.NullPointerException at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:279) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:155) at search.respond(search.java:314) ... 12 more	12 years ago
Michael Peter Christen	4faa07c214	added a timeout for topic computation (solr is here much slower than the old metadata-db)	12 years ago
Michael Peter Christen	d2d5be032d	added a 'inlink' search option according to the suggestion in the YaCy forum at http://forum.yacy-websuche.de/viewtopic.php?f=18&t=4572#p27410 The feature was not called 'haslink' but called 'inlink' to have an analogous naming to 'inurl'. This causes now that you can search for words in links of the document, like: * inlink:yacy searches all documents which link to pages which have a 'yacy' in the url.	12 years ago
reger	3897bb4409	added (manual) urldb migration (link on: Index Administration -> Federated Solr Index) - migrates all entries in old urldb Metadata coordinate (lat / lon) NumberFormatException still relatively often (see excerpt below), - added try/catch for URIMetadataRow (seems not to be needed in URIMetaDataNode, as Solr internally checks for number format) - removed possible type conversion for lat() / lon() comparison with 0.0f, changed to 0.0 (leaving it to the compiler/optimizer to choose number format) current log excerpt for NumberFormatException: W 2013/01/14 00:10:07 StackTrace For input string: "-" java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152) ... Caused by: java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152)	12 years ago
reger	f143804382	fix configuration for search page navigators - added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page) - currently redundant setting with part of ConfigPortal page - added missing config for filetype and protocol navigator - adjusted init of SearchEvent to check navigation config setting - renamed RankingProcess.getTopicNavigator to getTopics (to distinguish between added SearchEvent.getTopicNavigator)	12 years ago
orbiter	fe50702eb0	added a filterscannerfail attribute to QueryParams which causes that a check to the network scanner fail/success status can be used/suppressed for search results. This is a feature that comes with the port scanner.	12 years ago
Michael Peter Christen	eb90d38cd7	added missing extension 'mkv' for navigation	12 years ago
Michael Peter Christen	4a9182ae16	use the search configuration to default the cacheStrategy to the value as given in the search configuration	12 years ago
Michael Peter Christen	e1f89efd0d	- made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish	12 years ago
Michael Peter Christen	433143ba40	removed protocol, tld, ext from the urlmask and created specific navigation field for these	12 years ago
Michael Peter Christen	84f82541e8	search process enhancements	12 years ago
Michael Peter Christen	02020b590b	- removed all extension types from extension navigation which are not proper/known - automatically show the protocol navigation if there is more than http and https - automatically show the extension navigation if there is some media content	12 years ago
Michael Peter Christen	01200f06cc	using the author field as solr-native facet. this makes it necessary to introduce a copy-field for the author field to be copied to a string field. This field is then used to generate facets. Without this field, the facet would consist only of the words of the author names, not of the full author string.	12 years ago
Michael Peter Christen	bab573361f	- using a filter query for facet restriction - calculating the whole search result in at most two sub-queries from solr	12 years ago
Michael Peter Christen	1052263af3	- added a new solr field references_i which stores the number of INCOMING links to the corresponding web page. This information is taken from the reverse link index (a 'little sister' of the RWI index). - this field can be of use to enhance the ranking because a web page with more incoming links can be more important than others. But this is not true for typical link pages like menus. Therefore the number of outgoing links is needed. - added a new solr attribute 'bf' to solr queries which is a boost function extension. this field can contain a formula which computes the boost according to given field values. After some experiments the following formula is now default: div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 This takes the number of references and the inbound links. Further experiments are needed to enhance that formula.	12 years ago
Michael Peter Christen	34f8786508	removed dependency of vocabulary navigation from Jena and its triplestore; the vocabulary search is now done using generic solr fields which are created on-the-fly during runtime.	12 years ago
Michael Peter Christen	9319b90d8a	- fixes for host navigation - fixes for filetype navigation - removed unused code	12 years ago
Michael Peter Christen	cb5cbec14d	distinguishing modified query string and original query string	12 years ago
Michael Peter Christen	8aa08261a7	update to Solr Boost handling	12 years ago
Michael Peter Christen	72f165d58b	added a Boost class which stores solr query boost values. The class can be configured using the yacy.init file. The boost information is taken from the configuration each time when a query to solr is done.	12 years ago
Michael Peter Christen	8fc3679c66	using more pre-compile pattern for split methods	12 years ago
Michael Peter Christen	d48e9788d2	enhanced search result processing behavior - query less at one time; query more often - in between the small queries, evaluate results - remove fields from search results which are not needed	12 years ago
reger	469efcdb9d	fix: display and calculate authors and namespace search navigator if configured (otherwise skip overhead) (leave hosts, topics and not in ConfigPortal included filetype, protocol navigator untouched)	12 years ago
orbiter	ee612e8b93	start the local search only if this peer is doing a remote search or when it is doing a local search and the peer is old	12 years ago
Michael Peter Christen	4eab3aae60	removed overhead by preventing generation of full search results when only the url is requested	12 years ago
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally for each time when a signature is written, it is checked if the signature exists already in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because here the first time a long number is used as value in the Solr store, also the value naming in the YaCySchema had to be adapted and normalized. This caused that many files had to be changed.	12 years ago
Michael Peter Christen	46be4af5b9	Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'	12 years ago
Michael Peter Christen	952e143580	FINALLY YaCy can now search for full strings using double- or singlequoted strings in the search query line!!!	12 years ago
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	12 years ago
Michael Peter Christen	d64445c3cb	because we have the inurl:<term> - searchmodifier, we don't actually need regular expressions as search attributes. They had now been removed from the advanced search page while they are still created internally. The filter is then expressed against solr as regular expression filter query. If the expression points out a selection of a specific protocol, host or filetype this is then translated into a faceted query.	12 years ago
cominch	d2a94cc55e	refactor package	12 years ago
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	12 years ago
Michael Peter Christen	842faf96a2	fixed media search	12 years ago
Michael Peter Christen	93001586a0	removed warnings, removed too-fast pausing of crawls	12 years ago
Michael Peter Christen	8041742e48	added matching of path to query pattern	12 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	12 years ago
Michael Peter Christen	158732af37	automatically delete entries from the crawl profile list if crawl is terminated.	12 years ago
Michael Peter Christen	2371ef031c	added solr faceted search support to YaCy search results added solr highlighting / YaCy snippets to YaCy search results - facets are now much more complete - facets are computed and searched much faster - snippet computation is done by solr if solr knows the snippet	12 years ago
Michael Peter Christen	619bf7e875	fixed filetype modifier for media types in text search	12 years ago
Michael Peter Christen	8fb370d9f8	renovated the way how search results are counted. should be correct now...	12 years ago
Michael Peter Christen	b764de424a	code cleanup	12 years ago
Michael Peter Christen	1168d09de8	more refactoring - integrated the code of SnippetProcess into SearchEvent	12 years ago
Michael Peter Christen	6629e37685	tried to clean up the search process mess	12 years ago
Michael Peter Christen	c5f67a5d6d	fixed a problem with local search from solr results: now all results from solr are shown (again)	12 years ago
orbiter	276dd6452b	removed warnings	12 years ago
Michael Peter Christen	ce0e5b1e17	- more refactoring / private methods - fix for usage of custom solr field names	12 years ago
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document as the core of YaCy documents which would cause that many conversions can be removed. On the way to this target the Equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should make memory usage during indexing lower.	12 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which had been then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	12 years ago
Michael Peter Christen	5d16c23a1f	specified more URIMetadata as URIMetadataNode	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	36c13ed15b	less solr prefetch	12 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	12 years ago
orbiter	4fed4a86d8	another fix to location search	12 years ago
orbiter	0f7a54452d	fix for location search query encoding	12 years ago
Michael Peter Christen	f8a3ab2d82	added the usage of synonyms to the GSA search interface	12 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	12 years ago
Michael Peter Christen	5ac61591f3	better abstraction for solr query params	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	12 years ago
Michael Peter Christen	872f83ebe0	refactoring	12 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	12 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	12 years ago
Michael Peter Christen	4d29f59a27	removed warnings	12 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	12 years ago
Michael Peter Christen	75d5e3475d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
cominch	dc468dad01	add content control features for custom filter lists	12 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	12 years ago
Michael Peter Christen	4521d63c92	added boosts to solr search queries	12 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	12 years ago
Michael Peter Christen	48a82bc705	log queries anonymous from gsa+solr requests	12 years ago
Michael Peter Christen	ab6ec4ec52	added snippet computation to solr/rss and gsa result writer	12 years ago
Michael Peter Christen	653645c1cf	corrected solr query syntax	12 years ago
Michael Peter Christen	a049761e0c	fixed double-check	12 years ago
Michael Peter Christen	f42a57cd7d	gsa format update	12 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is fed into the same data structure that prepares the rwi search result - the same remote search that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	12 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	12 years ago
Michael Peter Christen	f00733186b	code simplifications	12 years ago
Michael Peter Christen	db0d438709	fix for http://bugs.yacy.net/view.php?id=206	12 years ago
orbiter	404b0aab09	refactoring in remote search and stub for remote node peer selection	12 years ago
orbiter	99ef57f103	reduced sleep times	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	40c0856489	refactoring	12 years ago
Michael Peter Christen	06a78eecb7	code simplification	12 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	12 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	136fcb1ad9	refactoring	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	ef488a15f7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
sixcooler	76b037a20a	check content domain fix: search image/media should not show pages containing image/media search text should show all/text but image/media	12 years ago
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple.	12 years ago
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	12 years ago
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	c00a3cf74d	less usage of generic logger to avoid logger generation overhead	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	c7afa8bc48	using SwitchboardConstants for solr attributes	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads name so it's easier to see what's happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / an embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	64c0268b2b	show triplestore metadata in yacydoc and viewfile	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf triples must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also an Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	5fc6524ca8	- moved triple store to net.yacy.cora.lod (should be generalized there later) - added abstract add, delete, get methods in the triplestore - added generation of triples after auto-annotation - migrated all MultiProtocolURI objects to DigestURI in the parser since the url hash is needed as subject value in the triples in the triple store	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
cominch	65c5826d93	bugfix Conflicts: source/net/yacy/document/parser/augment/AugmentParser.java	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	ab7107b34b	fixed RWIProcess queue limits: now discovering hidden results for mass result retrieval	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	461a0ce052	removed warnings	13 years ago
Michael Peter Christen	407fdf6968	more bug fixes and performance hacks for search process	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	2fe207f813	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	5e562dcdb7	adapted vocabulary usage within annotation/navigation feature of search to new SimpleVocabulary class	13 years ago
Michael Peter Christen	240045cf7c	fix for bad distance computation	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	834dc6b263	store more data from interface access	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	7bf421b9dd	- fixed image search page navigation - removed some deadlocks and ConcurrentModificationExceptions during DidYouMean collection	13 years ago
Michael Peter Christen	c6558cba08	more classification bugs	13 years ago
Michael Peter Christen	082831b9d6	search contentdom was checked in wrong way - fixed	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 tried to make a bit less 'noise' to dns server also included: less processes in snippet fetch to reduce load during search on small computers	13 years ago
Michael Peter Christen	3e1bc9477f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	52d307c735	prevent that the snippet fetch process removes catchall entries	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
reger	b2175ea4ef	Add possibility to set custom Solr field names for the YaCy default Solr attributes. - Changing the format of YaCy's solr.key.list while maintaining backward compatibility Federated index config screens adjusted accordingly - modified the Solr update request to use a 3 min Solr autocommit interval	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, need to be made configurable)	13 years ago
Michael Peter Christen	e2f8f263e8	changed storage of search words: keep order	13 years ago
Michael Peter Christen	2ea585d616	fix for host navigator	13 years ago
Michael Peter Christen	41536eb4a2	performance hack	13 years ago
Michael Peter Christen	f91487fc50	added delete-button for host navigation	13 years ago
Michael Peter Christen	e8d24fd802	author navigator can be switched off	13 years ago
Michael Peter Christen	558ab7bd4e	made the protocol navigator reversible	13 years ago
Michael Peter Christen	96cb75f1d4	made the filetype navigator be able to deselect the search constraint	13 years ago
Michael Peter Christen	4eff0e26f1	npe bugfix	13 years ago
Michael Peter Christen	1a0b6b3913	get more navigation details to search results	13 years ago
Michael Peter Christen	83009d86f7	added the vocabulary navigator. It can be very simply tested by switching on the locale dictionaries.	13 years ago
Michael Peter Christen	c602eaaf46	enhanced search process	13 years ago
Michael Christen	eff966f396	fix for search process (it was aborted too early during remote search)	13 years ago
Michael Christen	585a8f3c44	fixed a bug in search sequence (caused empty results)	13 years ago
Michael Christen	52184a1170	fix for search process	13 years ago
Michael Christen	0797b0de99	new handling of remote search processes: looking for seeds will now not block the whole search process any more. A deadlock with a DHT selection process may have been the cause for interface lockings in the past.	13 years ago
Michael Christen	9e5894c784	Removed handling of components objects for URIMetadataRows. This is a preparation to replace this rows with nodes from the node store.	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	e9dc99fe15	added rules to set specific RWIs as private RWIs which are not transmitted to remote peers. This will be used for private index copies and phonetic indexes.	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. the search will always exclude the given list of hosts unless they are requested directly using the host navigation	13 years ago
Michael Christen	3eccdca63c	protection against too long running snippet fetch processes	13 years ago
Michael Christen	86b3385847	fixed a deadlock during secondary remote search	13 years ago

468 Commits (5db97a892856265737f67978a5175c36c46e5cee)