yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Marc Nause	ac478384d3	*) did some long overdue refactoring	12 years ago
Michael Peter Christen	ada3f27de7	added three new fields for a better ranking: references_internal_i, references_external_i and references_exthosts_i. These can be used to count and evaluate the number of external links to every web page. An experimental ranking function can be, e.g.: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1))	12 years ago
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if no GSA-style sort attribute was given - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not the scenario-adapted ones (site, date) because that should be configurable in the web interface before it is actually used for ranking.	12 years ago
Michael Peter Christen	a20941c067	resume paused crawls on startup; user expects that restarts 'heal' everything	12 years ago
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	12 years ago
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	12 years ago
Michael Peter Christen	cf0acd2cb4	upgrade to solr 4.2.1	12 years ago
reger	e89491271f	- fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down)	12 years ago
reger	6a9d0b60a3	make sure configured port is reported on recreated mySeed.txt	12 years ago
Michael Peter Christen	870aedf3c6	fixes for better search interface integration in yaml templates	12 years ago
Michael Peter Christen	5512be6673	fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates this confusion.	12 years ago
Michael Peter Christen	342ba1049b	- callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1 because this leads to very long execution time and in the end to the same OOM as if we allocate the memory at the moment we need it, even if the resource observer states that this memory is not there. To compensate for this, the increase size is reduced.	12 years ago
orbiter	65d73e5652	renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery	12 years ago
orbiter	17ae51e741	increased number of links limitation from 1000 to 10000 for rss feeds and html documents	12 years ago
orbiter	e4d26d1cb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	12 years ago
reger	d57b221921	add: reset Solr schema field selection to default button in IndexSchema_p	12 years ago
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	12 years ago
Michael Peter Christen	16e9d4d1dd	added a restart hint	12 years ago
Michael Peter Christen	b3a54d5b1c	fix for wrong class name in log	12 years ago
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	12 years ago
Michael Peter Christen	4af0839be2	use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used	12 years ago
Michael Peter Christen	b8ed66a55d	added all clickdepth computations for source and target paths in webstructure core	12 years ago
Michael Peter Christen	6300730d7f	refactoring of clickdepth computation as preparation for clickdepth computation of webgraph links	12 years ago
Michael Peter Christen	2080fc7406	removed unused tag fields	12 years ago
reger	230a12bfe2	adjust Opensearch discover function to new webgraph Solr schema	12 years ago
orbiter	6b13dd0d3d	added clickdepth field writing for webgraph core (unfinished)	12 years ago
orbiter	47114910d5	fix for possible memory leaks	12 years ago
Michael Peter Christen	addba047e2	changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking	12 years ago
reger	38f46eb33d	set RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries)	12 years ago
reger	2962f2b9e9	Merge branch 'master' of git://gitorious.org/yacy/rc1.git	12 years ago
orbiter	ab74d559fb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	4490133909	removed target_tag_s (superfluous)	12 years ago
orbiter	cd197bb555	fix for NPE if surrogates do not exist	12 years ago
reger	6ae30f9d0f	replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter	12 years ago
Michael Peter Christen	252bb51f98	fix for wrong mime type in noload crawler	12 years ago
Michael Peter Christen	25300913fa	fixes to search debugging after testing with the different search debugging options	12 years ago
Michael Peter Christen	81380ae5c8	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	c2fde018b5	concurrent snippet fetching from solr results which do not have snippets	12 years ago
orbiter	b1140e3d82	added debug switches for detailed search testing	12 years ago
orbiter	cdbfddf091	added filter queries for better image, audio and video results	12 years ago
Michael Peter Christen	587ef83eab	added missing cleanup statements for short memory cases during search	12 years ago
orbiter	2562f052b9	do not put the fulltext field text_t into the search cache because it is not used there and uses a lot of memory	12 years ago
Michael Peter Christen	2b6c79d347	in method exists() also use the new caching-stacks for documents/metadata	12 years ago
Michael Peter Christen	ae734b3f8d	enhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again)	12 years ago
Michael Peter Christen	0d7b4bc891	better protection against OOM during search flush and fixed missing result push	12 years ago
Michael Peter Christen	221ed7d764	- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters	12 years ago
Michael Peter Christen	3b1d9dc884	made index storage from DHT search result concurrently. This prevents blocking by high CPU usage during search. Also: removed query from Solr for DHT search results; results are taken from the pending queue.	12 years ago
orbiter	f13c0b2abd	fix for search	12 years ago
orbiter	0f7ea7ad9f	- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process removation) - reduced the number of remote results in settings file because the processing of such mass document adds is too CPU-intensive (in Solr)	12 years ago
Michael Peter Christen	f327ffedb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	9c09fd7d0b	better/fewer requests to local solr; the request is made in chunks of exactly the size needed to present the current search result page. This also causes the next solr requests to be made automatically when switching to the next pages.	12 years ago
Michael Peter Christen	840fa22135	disabled clickdepth computation during crawling since that is repeated during clean-up phase.	12 years ago
orbiter	d74472f562	corrected result counter	12 years ago
orbiter	2555542f7a	removed the dns prefetch because that was not so useful	12 years ago
Michael Peter Christen	d957739441	removed size request	12 years ago
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	12 years ago
Michael Peter Christen	35fa718b77	testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js	12 years ago
Michael Peter Christen	008288719c	fix for schema export to consider also automatically generated coordinate fields	12 years ago
Michael Peter Christen	089dee1770	- generalized SchemaConfiguration into super-class Configuration and adopted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging	12 years ago
Michael Peter Christen	c16de49f64	fix for webgraph delete query	12 years ago
Michael Peter Christen	56d5946a59	- added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menus - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again.	12 years ago
Michael Peter Christen	14cceb6b17	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/IndexFederated_p.html source/net/yacy/cora/federate/solr/YaCySchema.java source/net/yacy/peers/Protocol.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/Segment.java also moved portalsearch-dev to yacy-portalsearch to be able to fix problems with new attachment to solr of the search widget	12 years ago
Michael Peter Christen	58e1e6fa2b	fixes to schema	12 years ago
reger	f291d60c5f	on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as the default index - the complete index size will increase and may be about double the current size As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
Michael Peter Christen	91a0401d59	introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding on top of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema	12 years ago
Michael Peter Christen	33bc255e85	prevent that crawl starts with very large url lists cause a time-out in the user front-end	12 years ago
Michael Peter Christen	b6de1f42dc	Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, one for every solr core. As a result the IndexFederated servlet had to be split into two parts; the new Servlet for the Schema editor is now in the IndexSchema Servlet.	12 years ago
Michael Peter Christen	4111606654	removed the commitWithin attribute because that is not the right way to update the index for us. May also be superfluous with the solr 4.0 softcommit.	12 years ago
Michael Peter Christen	c20fa3640d	fix to unbalanced tag and license for null objects	12 years ago
Michael Peter Christen	3a6097966d	added jsonp option to yjson result writer	12 years ago
Michael Peter Christen	de58043205	Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.	12 years ago
Michael Peter Christen	d3508fa8ff	fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html	12 years ago
Michael Peter Christen	1db23e9eac	Moved methods from SolrServerConnector to AbstractSolrConnector with the result that most of these methods become superfluous in other classes. This is a generalization step towards multi-indexes in Solr.	12 years ago
Michael Peter Christen	16d90859b7	reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key	12 years ago
Michael Peter Christen	0d888ff69e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such a way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as an on-top evaluation for Solr queries.	12 years ago
reger	c37d718f16	make sure yacy.running is deleted if not running (catch exception) - to prevent following log if YaCy was previously not properly shutdown E ... STARTUP WARNING: the file C:\src\git\yacy-rc1\DATA\yacy.running exists, this usually means that a YaCy instance is still running E ... STARTUP FATAL ERROR: java.util.concurrent.TimeoutException java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException at net.yacy.cora.protocol.TimeoutRequest.call(TimeoutRequest.java:91) at net.yacy.cora.protocol.TimeoutRequest.ping(TimeoutRequest.java:112) at net.yacy.yacy.startup(yacy.java:200) at net.yacy.yacy.main(yacy.java:638) Caused by: java.util.concurrent.TimeoutException - adjust Netbeans path (to solr4.1.jars)	12 years ago
Michael Peter Christen	762b687e47	extended the serverObjects to be able to hold multiple values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests.	12 years ago
Michael Peter Christen	d70d99fab5	added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format.	12 years ago
Michael Peter Christen	6a4878940b	fix in html parser and bookmark generation	12 years ago
Michael Peter Christen	dee8b24d3c	better error handling for bookmarks	12 years ago
Michael Peter Christen	e1da39245a	when searching the network, do not search on robinson peers with the old DHT search interface. Now use the solr interface.	12 years ago
Michael Peter Christen	6f6ddaf7e7	A robinson peer does not need to write RWI data if such peers are only searched using the solr interface. Searching public robinsons will be done with solr only in the future.	12 years ago
Michael Peter Christen	ab4f74c82c	fix for xml blacklist import	12 years ago
Michael Peter Christen	7806680ab8	fixed a problem with re-feeding of already indexed documents with coordinates attached.	12 years ago
Michael Peter Christen	cb38e860cf	After the observation that Windows users simply forget that they started YaCy (YaCy is still running and the user expects that another double-click on the YaCy icon simply opens the search window again), I decided to add a function that meets this expectation: simply open the browser pop-up page again if the user starts YaCy while YaCy is still running.	12 years ago
Marc Nause	27894d2c1a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	12 years ago
Marc Nause	75f9568472	*) only install files from the RELEASE directory *) minor changes	12 years ago
Michael Peter Christen	eb80405a16	added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer	12 years ago
Michael Peter Christen	19c46e4acf	catch more exceptions	12 years ago
Michael Peter Christen	7de502f43d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Marc Nause	3bc5ee6e3d	*) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore)	12 years ago
Michael Peter Christen	4f270d89e2	another NPE	12 years ago
Michael Peter Christen	921091c3a6	use thread-safe http connection manager for authenticated remote solr connections	12 years ago
Michael Peter Christen	e8f7b85b98	fixes to internal RWI usage if RWI is switched off (NPE etc)	12 years ago
Michael Peter Christen	3834829b37	bugfixes and more logging for solr connector	12 years ago
Michael Peter Christen	80fe3d7860	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java	12 years ago
Michael Peter Christen	4323621a76	update to Solr 4.1.0	12 years ago
reger	160ce568b3	move testing SolrServlet.main to test, making include of jetty.jar in distribution and classpath obsolete - move jetty.jar to test library - move SolrServlet.main as is to test, add also a junit test simulating main - add build.xml cleanup for EmbeddedSolrConnectorTest created test/DATA - adjust some test compile errors	12 years ago
orbiter	07a20e8253	removed unused import	12 years ago
Michael Peter Christen	d1cb4cbc84	enhanced network scanner, is faster and more flexible now - start more processes - remove superfluous host name resolution - better/more flexible subnet ip range calculation - prefer ipv4 makes better usable ip pre-settings in servlet - extended servlet by new subnet /20 - option - redesign of scanner start process in servlet (generalization)	12 years ago
Michael Peter Christen	592adf7ccb	fix for domain navigation	12 years ago
Michael Peter Christen	4ca1b76627	less search overhead when first result set is smaller than requested	12 years ago
Michael Peter Christen	f748b0aa7c	NPE fix	12 years ago
Michael Peter Christen	7dfcc92b71	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	0b6566a389	optimizations when starting large crawl requests with many start urls in one request: - allow larger match-fields in html interface - delete all host hashes at once from zurl - when deleting by host, do not count size of deleted entries since that was the reason it took so long	12 years ago
orbiter	a2160054d7	ability to create vocabularies also without any objectspace: this iterates over all urls in the index to create terms	12 years ago
orbiter	ecc10a752c	fixes to index enumeration for vocabulary production	12 years ago
sixcooler	3a13906121	clear some more caches if running out of memory	12 years ago
Michael Peter Christen	8651ec35fe	turned author_s into the multi-valued field author_sxt	12 years ago
Michael Peter Christen	4589afe056	fix NPE when solr does not deliver snippets	12 years ago
Michael Peter Christen	0fe7b6fd3b	migrated the index export methods from the old metadata to solr. Now exports are done using solr queries. removed superfluous methods and servlets.	12 years ago
Michael Peter Christen	1768c82010	removed field selection because that created documents with that field only which was not useful when re-writing the same document	12 years ago
Michael Peter Christen	31e854bef6	Merge remote-tracking branch 'copro/master'	12 years ago
Michael Peter Christen	4735bd47f4	- changed solr commit call and added an optimize option. Since Solr 4.0.0 there is a new softcommit feature which implements a near-real-time (NRT) search option. The softcommit does not do IO and does not cause performance issues. YaCy has now an extension in its solr connectors to use the softcommit feature. The softcommit call now replaces all places where a hard commit was used. Furthermore the commit strategy when doing a search from the web interface was changed (it's done every time before a search is done). The softcommit feature was implemented because it was needed for the following changes (customer demands), which are also included in this git commit: - added a feature to identify all documents which have unique titles and/or unique descriptions. These unique flags are disabled by default. - added also a feature to set a flag when the url from a canonical tag is equal to the document url. This is also disabled by default. To support the new softcommit strategy, the commitWithinMs option was set to -1 to disable automatic commit based on document insert times. If documents are inserted permanently then also a commit would happen permanently whenever the commitWithinMs time is reached. This would conflict with the regular autocommit of 10 minutes and the new softcommit strategy.	12 years ago
Copro	3ea8380959	Adding Vimeo tag to wiki commands to embed Vimeo video with id	12 years ago
Copro	ee9d7fd93d	Added feature to embed Youtube videos to wiki commands for usage in Wiki, Blog or other servlets	12 years ago
Michael Peter Christen	9ccdd21d76	Merge remote-tracking branch 'aleksejs/fixtrans' Conflicts: locales/ru.lng Tried to merge this but I had to do this 'blind'. Sorry if I deleted something that was right.	12 years ago
Michael Peter Christen	db024a4e19	added new solr fields (unused yet; implementation will follow)	12 years ago
Michael Peter Christen	f5fd2aea18	removed archaic migration code	12 years ago
Michael Peter Christen	60f2a69331	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	cba038f97b	one more NPE fix	12 years ago
sixcooler	f3e705c4fe	bump to httpclient / httpcore 4.2.3 (bugfix-release)	12 years ago
Michael Peter Christen	af465cdca5	fix for wrong robots.txt loading for https protocol see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4579	12 years ago
Michael Peter Christen	c3d50d91f8	relaxing site operator for www prefix: - when using a site operator search for a domain where the domain has a www prefix, also the domain without the www is included - when using a site operator search for a domain where the domain has no www prefix, also the domain with the www is included - in the host navigator, all domains with and without a www prefix are accumulated. That means that the host navigator never shows a host with a www prefix. This should prevent usage mistakes of the site operator.	12 years ago
Michael Peter Christen	db49e91724	fixed a NPE which may appear for freeworld peers without any rwi index data. The NPE looked like: Caused by: java.lang.NullPointerException at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:279) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:155) at search.respond(search.java:314) ... 12 more	12 years ago
Michael Peter Christen	4faa07c214	added a timeout for topic computation (solr is here much slower than the old metadata-db)	12 years ago
Michael Peter Christen	d2d5be032d	added an 'inlink' search option according to the suggestion in the YaCy forum at http://forum.yacy-websuche.de/viewtopic.php?f=18&t=4572#p27410 The feature was not called 'haslink' but called 'inlink' to have an analogous naming like 'inurl'. This means that you can now search for words in links of the document, like: * inlink:yacy searches all documents which link to pages which have a 'yacy' in the url.	12 years ago
reger	3897bb4409	added (manual) urldb migration (link on: Index Administration -> Federated Solr Index) - migrates all entries in old urldb Metadata coordinate (lat / lon) NumberFormatException still relatively frequent (see excerpt below), - added try/catch for URIMetadataRow (seems not to be needed in URIMetaDataNode, as Solr internally checks for number format) - removed possible type conversion for lat() / lon() comparison with 0.0f, changed to 0.0 (leaving it to the compiler/optimizer to choose number format) current log excerpt for NumberFormatException: W 2013/01/14 00:10:07 StackTrace For input string: "-" java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152) ... Caused by: java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152)	12 years ago
reger	3b6e08b49f	prevent checking of urldb if empty - disconnect urlIndexFile if empty - add missing lock class in submenuSearchConfiguration	12 years ago
reger	f143804382	fix configuration for search page navigators - added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page) - currently redundant setting with part of ConfigPortal page - added missing config for filetype and protocol navigator - adjusted init of SearchEvent to check navigation config setting - renamed RankingProcess.getTopicNavigator to getTopics (to distinguish between added SearchEvent.getTopicNavigator)	12 years ago
Michael Peter Christen	becd52a984	added also a re-calculation of reference counts during the post-processing of clickcount calculations. This is a really nice thing to have because the reference count affects ranking.	12 years ago
Michael Peter Christen	38d3feae65	added separate delete commands for the local+remote solr index, the old metadata and old rwi and for the citation index. The important advancement is the separation of the citation index deletion because that index is responsible for the linkdepth calculation. Now a search index can be deleted without the citation index and that should cause that less clickdepths must be post-processed.	12 years ago
Michael Peter Christen	6f0baaa309	added the clickdepth post-processing: some links may have 'shortcuts' to already calculated click depths. These are then calculated if the crawl buffer is empty and therefore no new 'shortcuts' can be discovered. The status of the clickdepth stack (to-be-processed) can be seen using a solr search command like this: http://localhost:8090/solr/select?q=process_sxt:[*%20TO%20*]&start=0&rows=30&fl=sku,clickdepth_i,process_sxt	12 years ago
Michael Peter Christen	0f5b6f38c1	enhanced root-url detection	12 years ago
Michael Peter Christen	5c0c56cfe1	Preparations to produce a click depth attribute in the search index. This attribute can be used for ranking and for other purposes (demand by customer) The click depth is computed in two steps: - during indexing the current fill-state of the reverse link index is used to backtrack the current page to the root page. The length of that backtrack is the clickdepth. But this does not discover the shortest click depth. To get this, a second process to check again is needed - added a process tag that can be used to do operations on the existing index after a crawl; e.g. calculating the shortest clickpath. Added a field to control this operation but not a method to operate on this. - added a visualization of the clickpath length in the host browser	12 years ago
Michael Peter Christen	6861af87e2	removed warnings	12 years ago
Michael Peter Christen	295884fd54	- Merge commit '168b1d130d9d67b5e8855a0b50c4ba7ad4a416f8' - fixed conflict in htroot/yacysearch.java - removed nedres check because that causes that the remote server is not called at all in most cases (local index has already results but we want more) - fixed a regex bug (a '=' too much)	12 years ago
reger	276e63401e	small sanitary fixes - exclude unix shell scripts in NSIS windows install archive - replace link to env/grafics/yacy.gif to yacy.png (build.nsi) - remove unused code lines (Blacklist_p, Response, WordReferenceVars) - type & xhtml (RankingSolr_p.html)	12 years ago
reger	f301336adf	fix: no results with configuration citation reference index switched off - urlcitationindex != null check added to ResultEntry.referencesCount - plus other places where conflicting procedure was used (and urlcitationindex not already checked != null)	12 years ago
orbiter	fe50702eb0	added a filterscannerfail attribute to QueryParams which causes that a check to the network scanner fail/success status can be used/suppressed for search results. This is a feature that comes with the port scanner.	12 years ago
reger	168b1d130d	Adding heuristic to get search results from configured systems which support opensearch specification - any system supporting opensearch specification can be configured - search query is only forwarded to remote system if not enough results available on local peer - discover function provided, checking the local Solr index for links to opensearchdescription files, to add to the config - sample config file with some general search engines with opensearch support	12 years ago
Michael Peter Christen	eb90d38cd7	added missing extension 'mkv' for navigation	12 years ago
Michael Peter Christen	95712fdc8b	update to pdf parser	12 years ago
Michael Peter Christen	4a9182ae16	use the search configuration to default the cacheStrategy to the value as given in the search configuration	12 years ago
Michael Peter Christen	98819ec3d9	use solr boost configuration to select search fields. At this time it is possible to enter a negative boost value to switch that value off. This might be different in the future with a better input interface.	12 years ago
Michael Peter Christen	e1f89efd0d	- made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish	12 years ago
Michael Peter Christen	8f3bd0c387	fix for smb crawl situation (lost too many urls)	12 years ago

6327 Commits (b85db72a73da4797d09dc72155c56ca00dd5da0f)