yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if there was not given a GSA-style sort attribute - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not scenario-adopted (site, date) because that should be configurable in the web interface before it is used actually for ranking.	12 years ago
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	12 years ago
orbiter	2c3b024196	if the crawl was paused (automatically), show the reason for pausing in the Crawler_p servlet.	12 years ago
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	12 years ago
reger	40b3f2c5fe	comment out dead menue link	12 years ago
reger	bf1e1ddca1	fix typo in prev commit	12 years ago
reger	d4d93be779	uncomment "used time" calculation for remote search log	12 years ago
reger	36202f27b0	improve remote search log, set "Returned Results" to transmitcount (instead of no value)	12 years ago
reger	254074b11d	Merge branch 'master' of git://gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	870aedf3c6	fixes for better search interface integration in yaml templates	12 years ago
Michael Peter Christen	735eb70525	better search timing; prevents '0 results' for very large local indexes >> 10 mio documents	12 years ago
Michael Peter Christen	342ba1049b	- callback fix - memory allocation problem in RowCollection: if memory is too low, do not to try to increase by 1 because this leads to very long execution time and at the end to the same OOM as if we allocate the memory at the moment we need it even if the resource observer states that this memory is not there. To compensate this, the increase size is reduced.	12 years ago
reger	31d16f20d7	fix invisible icon not found	12 years ago
orbiter	243b66ae6d	Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy	12 years ago
Frank	7763f2554f	add the new PPMbar in Crawler_p for a better style and better use.	12 years ago
orbiter	e4d26d1cb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	12 years ago
reger	d57b221921	add: reset Solr schema filed selection to default button in IndexSchema_p	12 years ago
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	12 years ago
Michael Peter Christen	d725782440	turned severe message to warning message about network failure events	12 years ago
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	12 years ago
Michael Peter Christen	2080fc7406	removed unused tag fields	12 years ago
reger	7804c12976	fix error msg in ConfigHeuristics_p	12 years ago
reger	230a12bfe2	adjust Opensearch discover function to new webgraph Solr schema	12 years ago
orbiter	47114910d5	fix for possible memory leaks	12 years ago
Michael Peter Christen	addba047e2	changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configruation - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking	12 years ago
Michael Peter Christen	68e739a90b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	3d9ce9cd04	- added more selection criteria for network seed list - enhanced up script	12 years ago
orbiter	168e8d9b4d	added/fixed missing DOCTYPE line (submitted by Thomas)	12 years ago
Michael Peter Christen	25300913fa	fixes to search debugging after testing with the different search debugging options	12 years ago
Michael Peter Christen	2d472a39f4	DHT-transferred metadata and crawl receipts now also use the delayed search cache to prevent that too much IO load is on the peer during search.	12 years ago
Michael Peter Christen	221ed7d764	- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters	12 years ago
Marc Nause	2714b59f38	*) For some reason this seems to fix a ClassCastException on my system (OpenJDK).	12 years ago
orbiter	0f7ea7ad9f	- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents aus YaCy RWI/DHT concurrent (as it was before the search process removation) - reduced the number of remote results in settings file because the processing of such mass documents add is too CPU-intensive (in Solr)	12 years ago
orbiter	7ff10bdb1b	fix of page navigation for formatted totalcount numbers	12 years ago
orbiter	a734fbc4a5	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d74472f562	corrected result counter	12 years ago
orbiter	aa3c26c62e	added recrawl/reload to CrawlStartSite for a timeout of 3 days	12 years ago
orbiter	c1b7e61882	added option to create empty vocabularies	12 years ago
bubu	e0edad689d	fix link to IndexSchema_p.html	12 years ago
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	12 years ago
Michael Peter Christen	35fa718b77	testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js	12 years ago
Michael Peter Christen	008288719c	fix for schema export to consider also automatically generated coordinate fields	12 years ago
Michael Peter Christen	089dee1770	- generalized SchemaConfiguration into super-class Configuration and adopted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging	12 years ago
Michael Peter Christen	56d5946a59	- added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menues - moved 'lazy initialization' attribut from IndexSchema to IndexFederated (this is a general option) back again.	12 years ago
Michael Peter Christen	14cceb6b17	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/IndexFederated_p.html source/net/yacy/cora/federate/solr/YaCySchema.java source/net/yacy/peers/Protocol.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/Segment.java also moved portalsearch-dev to yacy-portalsearch to be able to fix problems with new attachment to solr of the search widget	12 years ago
Michael Peter Christen	58e1e6fa2b	fixes to schema	12 years ago
reger	d31a109efe	remove obsolete Solr "commit within" input field from IndexFederated see `4111606654`	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr field in the core 'webgraph'. The default schema uses only some of them and the resting search index has now the following properties: - webgraph size will have about 40 times as much entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also avaiable for p2p operation but client access is not yet implemented.	12 years ago
Michael Peter Christen	89ede0fe84	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	91a0401d59	introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of an solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding ontop of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; the must retrieve the actuall connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema	12 years ago
orbiter	594ed63f2a	fixed interactive search which caused an error if pubDate is not present in a search result	12 years ago
Michael Peter Christen	98a4a4aa97	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	b6de1f42dc	Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distuingish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, each for every solr core. This caused that the IndexFederated servlet had to be split into two parts, the new Servlet for the Schema editor is now in the IndexSchema Servlet.	12 years ago
Marc Nause	efb6cf7d21	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	12 years ago
Marc Nause	ce5b7afab2	) removed Skype online indicator (was not working anymore) ) updated ICQ URLs	12 years ago
Michael Peter Christen	4111606654	removed the commitWithin attribute because that is not the way how the index is updated the right way for us. May also be be superfluous with the solr 4.0 softcommit.	12 years ago
Michael Peter Christen	c20fa3640d	fix to unbalanced tag and license for null objects	12 years ago
Michael Peter Christen	3a6097966d	added jsonp option to yjson result writer	12 years ago
Michael Peter Christen	de58043205	Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.	12 years ago
Michael Peter Christen	d3508fa8ff	fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html	12 years ago
Michael Peter Christen	02fa31b5bf	better filesearch layout	12 years ago
Michael Peter Christen	e55ec3071d	reduced number of facets in yacyinteractive (only filetype necessary)	12 years ago
Michael Peter Christen	16d90859b7	reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key	12 years ago
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as a on-top evaluation for Solr queries.	12 years ago
Michael Peter Christen	762b687e47	extended the serverObjects to be able to hold multipel values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests.	12 years ago
Michael Peter Christen	d70d99fab5	added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format.	12 years ago
Michael Peter Christen	51e7ab4f70	moved bookmarks back to more prominent location (even if this does not fit to the 'Search Interfaces' headline)	12 years ago
Michael Peter Christen	dee8b24d3c	better error handling for bookmarks	12 years ago
Marc Nause	27894d2c1a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	12 years ago
Marc Nause	75f9568472	) only install files from the RELEASE directory ) minor changes	12 years ago
Michael Peter Christen	eb80405a16	added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer	12 years ago
Michael Peter Christen	1e3d8cc235	show a link for the host in the host browser; see	12 years ago
Michael Peter Christen	7de502f43d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Marc Nause	3bc5ee6e3d	*) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore)	12 years ago
Michael Peter Christen	3834829b37	bugfixes and more logging for solr connector	12 years ago
Michael Peter Christen	d1cb4cbc84	enhanced network scanner, is faster and more flexible now - start more processes - remove superfluous host name resolution - better/more flexible subnet ip range calculation - prefer ipv4 makes better usable ip pre-settings in servlet - extended servlet by new subnet /20 - option - redesign of scanner start process in servlet (generalization)	12 years ago
Michael Peter Christen	7dfcc92b71	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	0b6566a389	optimizations when starting large crawl requests with many start urls in one request: - allow larger match-fields in html interface - delete all host hashes at once from zurl - when deleting by host, do not count size of deleted entries since that was the reason it took so long	12 years ago
orbiter	a2160054d7	ability to create vocabularies also without any objectspace: this iterates over all urls in the index do create terms	12 years ago
Michael Peter Christen	be27567b53	allow more links when starting a crawl by file	12 years ago
reger	3777b338c7	bugfix: location url for migrate urldb button onclick	12 years ago
reger	8447814a31	correct headermenue in migrateurldb_p.html - update NetBeans project path	12 years ago
Michael Peter Christen	99185d7048	one more fix for author_sxt	12 years ago
Michael Peter Christen	b6ae6262f6	- add the copyField author_sxt only if author exists - set the solr default search field according to existing fields	12 years ago
Michael Peter Christen	088373b4ea	catch exception if solr connection change fails	12 years ago
Michael Peter Christen	e23a596c1d	added a copyField for author_sxt for automated schema generation	12 years ago
Michael Peter Christen	f1a4feda3e	security fix for suggest (don't let users ask for too much)	12 years ago
Michael Peter Christen	244b157299	fix for external solr schema definition	12 years ago
Michael Peter Christen	0fe7b6fd3b	migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. removed superfluous methods and servlets.	12 years ago
Michael Peter Christen	8eebeea533	fix for search result link in ViewFile	12 years ago
Michael Peter Christen	31e854bef6	Merge remote-tracking branch 'copro/master'	12 years ago
Michael Peter Christen	4735bd47f4	- changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy has now an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore the commit strategy in when doing a search from the web interface was changed (it's done every time before a search is done). The softcommit feature was implemented because it was needed for the following changes (customer demands), which is also included in this git commit: - added a feature to identify all documents which have unique titles and/or unique descriptions. These unique flags are disabled by default. - added also a feature to set a flag when the url from a canonical tag is equal to the document url. This is also disabled by default. To support the new softcommit strategy, the commitWithinMs option was set to -1 do disable automatic commit based on document insert times. If documents are inserted permanently then also a commit would happen permanently whenever the commitWithinMs time is reached. This would conflict with the regular autocommit of 10 minutes and the new softcommit strategy.	12 years ago
Copro	0025983993	Fix typo embedd -> embed	12 years ago
Copro	3ea8380959	Adding Vimeo tag to wiki commands to embedd Video video with id	12 years ago
Copro	ee9d7fd93d	Added feature to embedd Youtube videos to wiki commands for usage in Wiki, Blog or other servlets	12 years ago
Michael Peter Christen	9ccdd21d76	Merge remote-tracking branch 'aleksejs/fixtrans' Conflicts: locales/ru.lng Tried to merge this but I had to made this 'blind'. Sorry if I deleted something that was right.	12 years ago
Michael Peter Christen	aa067da86b	set the 'all' option as option at end of the list because the all option currently select also lists which cannot be exported in xml correctly	12 years ago
Michael Peter Christen	edbc86d2b0	integrated search term into opensearch result title. this makes better bookmark names when subscribing multiple search results from the same peer	12 years ago
Michael Peter Christen	4faa07c214	added a timeout for topic computation (solr is here much slower than the old metadata-db)	12 years ago

1 2 3 4 5 ...

4316 Commits (179d0321818e090ebe81d0330f6d0433722887e8)