yacy_search_server


Author	SHA1	Message	Date
reger	3897bb4409	added (manual) urldb migration (link on: Index Administration -> Federated Solr Index) - migrates all entries in old urldb - Metadata coordinate (lat / lon) NumberFormatException still occurs relatively often (see excerpt below), - added try/catch for URIMetadataRow (seems not to be needed in URIMetadataNode, as Solr internally checks for number format) - removed possible type conversion for lat() / lon() comparison with 0.0f, changed to 0.0 (leaving it to the compiler/optimizer to choose number format) current log excerpt for NumberFormatException: W 2013/01/14 00:10:07 StackTrace For input string: "-" java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152) ... Caused by: java.lang.NumberFormatException: For input string: "-" at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525) at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279) at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277) at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329) at transferURL.respond(transferURL.java:152)	12 years ago
reger	f143804382	fix configuration for search page navigators - added additional config page (ConfigSearchPage_p) for easy setup of search page layout (so as not to overload the ConfigPortal page) - currently redundant with part of the ConfigPortal page - added missing config for filetype and protocol navigator - adjusted init of SearchEvent to check navigation config setting - renamed RankigProcess.getTopicNavigator to getTopics (to distinguish it from the added SearchEvent.getTopicNavigator)	12 years ago
orbiter	fe50702eb0	added a filterscannerfail attribute to QueryParams which controls whether the network scanner's fail/success status is used to filter search results. This is a feature that comes with the port scanner.	12 years ago
Michael Peter Christen	eb90d38cd7	added missing extension 'mkv' for navigation	12 years ago
Michael Peter Christen	4a9182ae16	use the search configuration to default the cacheStrategy to the value as given in the search configuration	12 years ago
Michael Peter Christen	e1f89efd0d	- made image search in interactive search using the ViewImage servlet - that enables viewing of images for intranet SMB servers. - added a filter search for protocol, tld and ext again; otherwise p2p search produces a lot of rubbish	12 years ago
Michael Peter Christen	433143ba40	removed protocol, tld, ext from the urlmask and created specific navigation fields for these	12 years ago
Michael Peter Christen	84f82541e8	search process enhancements	12 years ago
Michael Peter Christen	02020b590b	- removed all extension types from extension navigation which are not proper/known - automatically show the protocol navigation if there is more than http and https - automatically show the extension navigation if there is some media content	12 years ago
Michael Peter Christen	01200f06cc	using the author field as solr-native facet. this makes it necessary to introduce a copy-field for the author field to be copied to a string field. This field is then used to generate facets. Without this field, the facet would consist only of the words of the author names, not of the full author string.	12 years ago
Michael Peter Christen	bab573361f	- using a filter query for facet restriction - calculating the whole search result in at most two sub-queries from solr	12 years ago
Michael Peter Christen	1052263af3	- added a new solr field references_i which stores the number of INCOMING links to the corresponding web page. This information is taken from the reverse link index (a 'little sister' of the RWI index). - this field can be used to enhance the ranking because a web page with more incoming links can be more important than others. But this is not true for typical link pages like menus. Therefore the number of outgoing links is needed. - added a new solr attribute 'bf' to solr queries which is a boost function extension. This field can contain a formula which computes the boost according to given field values. After some experiments the following formula is now default: div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 This takes the number of references and the inbound links. Further experiments are needed to enhance that formula.	12 years ago
Michael Peter Christen	34f8786508	removed dependency of vocabulary navigation from Jena and its triplestore; the vocabulary search is now done using generic solr fields which are created on-the-fly during runtime.	12 years ago
Michael Peter Christen	9319b90d8a	- fixes for host navigation - fixes for filetype navigation - removed unused code	12 years ago
Michael Peter Christen	cb5cbec14d	distinguishing modified query string and original query string	12 years ago
Michael Peter Christen	8aa08261a7	update to Solr Boost handling	12 years ago
Michael Peter Christen	72f165d58b	added a Boost class which stores solr query boost values. The class can be configured using the yacy.init file. The boost information is taken from the configuration each time when a query to solr is done.	12 years ago
Michael Peter Christen	8fc3679c66	using more precompiled patterns for split methods	12 years ago
Michael Peter Christen	d48e9788d2	enhanced search result processing behavior - query less at one time; query more often - in between the small queries, evaluate results - remove fields from search results which are not needed	12 years ago
reger	469efcdb9d	fix: display and calculate authors and namespace search navigator only if configured (otherwise skip overhead) (leaves the host and topic navigators, and the filetype and protocol navigators not included in ConfigPortal, untouched)	12 years ago
orbiter	ee612e8b93	start the local search only if this peer is doing a remote search or when it is doing a local search and the peer is old	12 years ago
Michael Peter Christen	4eab3aae60	removed overhead by preventing generation of full search results when only the url is requested	12 years ago
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally, each time a signature is written, it is checked whether the signature already exists in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search API to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because this is the first time a long number is used as a value in the Solr store, the value naming in the YaCySchema also had to be adapted and normalized. As a result, many files had to be changed.	12 years ago
Michael Peter Christen	46be4af5b9	Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'	12 years ago
Michael Peter Christen	952e143580	FINALLY YaCy can now search for full strings using double- or singlequoted strings in the search query line!!!	12 years ago
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	12 years ago
Michael Peter Christen	d64445c3cb	because we have the inurl:<term> search modifier, we don't actually need regular expressions as search attributes. They have now been removed from the advanced search page, although they are still created internally. The filter is then expressed against solr as a regular expression filter query. If the expression selects a specific protocol, host or filetype, this is translated into a faceted query.	12 years ago
cominch	d2a94cc55e	refactor package	12 years ago
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	12 years ago
Michael Peter Christen	842faf96a2	fixed media search	12 years ago
Michael Peter Christen	93001586a0	removed warnings, removed too-fast pausing of crawls	12 years ago
Michael Peter Christen	8041742e48	added matching of path to query pattern	12 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	12 years ago
Michael Peter Christen	158732af37	automatically delete entries from the crawl profile list if crawl is terminated.	12 years ago
Michael Peter Christen	2371ef031c	added solr faceted search support to YaCy search results - added solr highlighting / YaCy snippets to YaCy search results - facets are now much more complete - facets are computed and searched much faster - snippet computation is done by solr if solr knows the snippet	12 years ago
Michael Peter Christen	619bf7e875	fixed filetype modified for media types in text search	12 years ago
Michael Peter Christen	8fb370d9f8	renovated the way search results are counted. Should be correct now...	12 years ago
Michael Peter Christen	b764de424a	code cleanup	12 years ago
Michael Peter Christen	1168d09de8	more refactoring - integrated the code of SnippetProcess into SearchEvent	12 years ago
Michael Peter Christen	6629e37685	tried to clean up the search process mess	12 years ago
Michael Peter Christen	c5f67a5d6d	fixed a problem with local search from solr results: now all results from solr are shown (again)	12 years ago
orbiter	276dd6452b	removed warnings	12 years ago
Michael Peter Christen	ce0e5b1e17	- more refactoring / private methods - fix for usage of custom solr field names	12 years ago
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this target the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	12 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which had then been mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	12 years ago
Michael Peter Christen	5d16c23a1f	specified more URIMetadata as URIMetadataNode	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	36c13ed15b	less solr prefetch	12 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	12 years ago
orbiter	4fed4a86d8	another fix to location search	12 years ago
orbiter	0f7a54452d	fix for location search query encoding	12 years ago
Michael Peter Christen	f8a3ab2d82	added the usage of synonyms to the GSA search interface	12 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	12 years ago
Michael Peter Christen	5ac61591f3	better abstraction for solr query params	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	12 years ago
Michael Peter Christen	872f83ebe0	refactoring	12 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	12 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	12 years ago
Michael Peter Christen	4d29f59a27	removed warnings	12 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	12 years ago
Michael Peter Christen	75d5e3475d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
cominch	dc468dad01	add content control features for custom filter lists	12 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	12 years ago
Michael Peter Christen	4521d63c92	added boosts to solr search queries	12 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	12 years ago
Michael Peter Christen	48a82bc705	log queries anonymously from gsa+solr requests	12 years ago
Michael Peter Christen	ab6ec4ec52	added snippet computation to solr/rss and gsa result writer	12 years ago
Michael Peter Christen	653645c1cf	corrected solr query syntax	12 years ago
Michael Peter Christen	a049761e0c	fixed double-check	12 years ago
Michael Peter Christen	f42a57cd7d	gsa format update	12 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is fed into the same data structure that prepares the rwi search result - the same remote search that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	12 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	12 years ago
Michael Peter Christen	f00733186b	code simplifications	12 years ago
Michael Peter Christen	db0d438709	fix for http://bugs.yacy.net/view.php?id=206	12 years ago
orbiter	404b0aab09	refactoring in remote search and stub for remote node peer selection	12 years ago
orbiter	99ef57f103	reduced sleep times	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	40c0856489	refactoring	12 years ago
Michael Peter Christen	06a78eecb7	code simplification	12 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	12 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	136fcb1ad9	refactoring	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	ef488a15f7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
sixcooler	76b037a20a	check content domain fix: an image/media search should not show pages containing image/media search text; a text search should show all/text but not image/media	12 years ago
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple.	12 years ago
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	12 years ago
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	c00a3cf74d	less usage of generic logger to avoid logger generation overhead	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	c7afa8bc48	using SwitchboardConstants for solr attributes	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads names so it's easier to see what's happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / an embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	64c0268b2b	show triplestore metadata in yacydoc and viewfile	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf triples must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also an Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	5fc6524ca8	- moved triple store to net.yacy.cora.lod (should be generalized there later) - added abstract add, delete, get methods in the triplestore - added generation of triples after auto-annotation - migrated all MultiProtocolURI objects to DigestURI in the parser since the url hash is needed as subject value in the triples in the triple store	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
cominch	65c5826d93	bugfix Conflicts: source/net/yacy/document/parser/augment/AugmentParser.java	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	ab7107b34b	fixed RWIProcess queue limits: now discovering hidden results for mass result retrieval	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	461a0ce052	removed warnings	13 years ago
Michael Peter Christen	407fdf6968	more bug fixes and performance hacks for search process	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	2fe207f813	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	5e562dcdb7	adapted vocabulary usage within annotation/navigation feature of search to new SimpleVocabulary class	13 years ago
Michael Peter Christen	240045cf7c	fix for bad distance computation	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	834dc6b263	store more data from interface access	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	7bf421b9dd	- fixed image search page navigation - removed some deadlocks and ConcurrentModificationExceptions during DidYouMean collection	13 years ago
Michael Peter Christen	c6558cba08	more classification bugs	13 years ago
Michael Peter Christen	082831b9d6	search contentdom was checked in wrong way - fixed	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 tried to make a bit less 'noise' to dns server also included: fewer processes in snippet fetch to reduce load during search on small computers	13 years ago
Michael Peter Christen	3e1bc9477f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	52d307c735	prevent the snippet fetch process from removing catchall entries	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
reger	b2175ea4ef	Add possibility to set custom Solr field names for the YaCy default Solr attributes. - Changing the format of YaCy's solr.key.list while maintaining backward compatibility Federated index config screens adjusted accordingly - modified the Solr update request to use a 3 min Solr autocommit interval	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, need to be made configurable)	13 years ago
Michael Peter Christen	e2f8f263e8	changed storage of search words: keep order	13 years ago
Michael Peter Christen	2ea585d616	fix for host navigator	13 years ago
Michael Peter Christen	41536eb4a2	performance hack	13 years ago
Michael Peter Christen	f91487fc50	added delete-button for host navigation	13 years ago

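Commit 3897bb4409 describes a NumberFormatException raised when a stored coordinate is the string "-", and a try/catch added around the lat/lon parsing in URIMetadataRow. The sketch below shows one way such lenient parsing could look. It is a hypothetical helper, not YaCy's actual URIMetadataRow code. It assumes, as the commit's comparison with 0.0 suggests, that 0.0 stands for "no coordinate". The class and method names are invented for illustration.

```java
// Hypothetical helper illustrating lenient coordinate parsing; this is NOT
// YaCy's URIMetadataRow implementation. Assumption: 0.0 means "no coordinate".
final class CoordinateParseSketch {

    // Parses a stored lat/lon string. Returns 0.0 for null, empty or
    // malformed input (such as the "-" seen in the log excerpt) instead of
    // letting Double.parseDouble throw a NumberFormatException to the caller.
    static double parseCoordinate(String raw) {
        if (raw == null || raw.isEmpty()) return 0.0;
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    // A document is treated as located unless both coordinates are 0.0.
    // The literal 0.0 is a double, so no float/double conversion is involved.
    static boolean hasLocation(String lat, String lon) {
        return parseCoordinate(lat) != 0.0 || parseCoordinate(lon) != 0.0;
    }

    public static void main(String[] args) {
        System.out.println(parseCoordinate("52.52"));
        System.out.println(parseCoordinate("-"));
        System.out.println(hasLocation("-", "-"));
    }
}
```

Swallowing the exception keeps one malformed field from aborting a whole metadata transfer, as happened in transferURL.respond in the excerpt. The cost is that "missing" and "malformed" coordinates become indistinguishable.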

285 Commits (b9d36e45e083c6ad235ca4b0f842aaab4407d6a6)
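Commit 1052263af3 sets the default Solr boost function to div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4. The sketch below evaluates that formula outside Solr to show how the two field values trade off against each other. It assumes the trailing ^0.4 is read, as in the weight syntax of Solr's bf parameter, as a multiplicative weight on the function value rather than an exponent. The class and method names are hypothetical.

```java
// Illustrative evaluation of the default boost formula from commit 1052263af3:
//   div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4
// Assumption: "^0.4" is a multiplicative weight (Solr bf syntax), not a power.
final class BoostFunctionSketch {
    static final double EXPONENT = 1.6; // exponent applied to (1 + inbound links)
    static final double WEIGHT = 0.4;   // weight applied to the whole function

    // references   : value of references_i (incoming links to the page)
    // inboundLinks : value of inboundlinkscount_i
    static double boost(int references, int inboundLinks) {
        double numerator = 1.0 + references;                          // add(1,references_i)
        double denominator = Math.pow(1.0 + inboundLinks, EXPONENT);  // pow(add(1,...),1.6)
        return WEIGHT * (numerator / denominator);                    // div(...)^0.4
    }

    public static void main(String[] args) {
        // A page with many incoming links and few links of its own...
        System.out.println(boost(50, 3));
        // ...compared with a menu-like page with few incoming and many links.
        System.out.println(boost(2, 200));
    }
}
```

Adding 1 to both fields keeps the division defined for pages with no links. The 1.6 exponent makes the boost fall off faster than it rises, which matches the commit's intent of demoting link-heavy menu pages.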