yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	872f83ebe0	refactoring	12 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	12 years ago
Michael Peter Christen	15ea053c3a	- added xml output in IndexControlURLs to get the storage page of index dump commands - adjusted the apicall.sh script to get the downloaded text as output to stdout which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent that data is stored to a non-existing solr index while this index is disabled during the storage of the dump	12 years ago
Michael Peter Christen	1b474139dd	used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	12 years ago
Michael Peter Christen	9db032664e	activate two solr fields which will be used by administration interface (later)	12 years ago
Michael Peter Christen	4634f0e626	fix for images_withalt	12 years ago
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	12 years ago
orbiter	d73fff0e0e	added solr field images_withalt_i	12 years ago
sixcooler	e78fe3f477	also do a clearcache on the solr-connector-caches	12 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	12 years ago
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	12 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	12 years ago
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	12 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	12 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	12 years ago
Michael Peter Christen	2ddc33646a	added new field for solr: url_paths_sxt url_parameter_i url_parameter_key_sxt url_parameter_value_sxt url_chars_i	12 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	12 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	12 years ago
orbiter	29171e2f6c	fixed generation of ontologies from index enumerations	12 years ago
orbiter	01a63ef595	redesign of YaCySchema and SolrDoc handling	12 years ago
orbiter	479bfca571	refactoring	12 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	12 years ago
orbiter	716ea0cfe2	sorted the solr schema into mandatory and optional fields; reduced number of used field to reduce solr index size	12 years ago
orbiter	9b8c8c0f47	fix from gaston in http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909	12 years ago
orbiter	d7ea45f698	- get nice text_t values from metadata conversions that are stored into solr as fulltext search index. - added slow migration from old metadata to solr index entries: each entry from the old metadata is removed from that data structure and written into solr.	12 years ago
orbiter	780f8974e7	added remaining iteration methods for solr in fulltext class	12 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	12 years ago
orbiter	cc47a0876e	reverted `bf55f69176` to have a fall-back option in case that memory problems as reported in http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901 for full-solr installation are too strong and we have to work with an 'small memory footprint' peer system.	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	bf55f69176	removed write methods to old metadata file type; all metadata now goes to solr	12 years ago
Michael Peter Christen	40c0856489	refactoring	12 years ago
Michael Peter Christen	06a78eecb7	code simplification	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	12 years ago
Michael Peter Christen	bd4f03bc85	removed unused class	12 years ago
orbiter	e816b88b55	changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr.	12 years ago
orbiter	2571e0d47a	removed unused classes	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	12 years ago
Michael Peter Christen	136fcb1ad9	refactoring	12 years ago
Michael Peter Christen	bca4a16603	replaced the multivalue generic string field name suffix _ss by _txt because _ss is not part of the standard solr example schema.	12 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple.	12 years ago
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	12 years ago
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	12 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	5a3c829872	embedded solr is only initiated if it is activated with IndexFederated_p.html	13 years ago
Michael Peter Christen	97b7bcf2a6	added a solr search index - by default, a (empty) solr storage instance is created at SEGMENTS/solr_36 - the index is written if in /IndexFederated_p.html the flag "embedded solr search index" is switched on - a standard solr query interface is available now with a new servlet at http://127.0.0.1:8090/solr/select To test this, do the following: - switch to webportal mode - switch on the feature as described - do a crawl. this fills the solr index. The normal YaCy search will NOT work now! - do a solr query, like: http://127.0.0.1:8090/solr/select?q=*:* http://127.0.0.1:8090/solr/select?q=text_t:Help play with different search fields as you can see in /IndexFederated_p.html You can use the standard solr query attributes as described in http://wiki.apache.org/solr/SearchHandler	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
orbiter	d4291ac1f3	more tolerance when creating solr document	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	1825f165b8	better integration of blacklist according to use case	13 years ago
Michael Peter Christen	03280fb161	removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index.	13 years ago
Michael Peter Christen	508a81b86c	added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field.	13 years ago
Michael Peter Christen	9116013c64	- allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear) - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantic of the browsers response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields.	13 years ago
Michael Peter Christen	0294a53459	- add canonical field only if requested by solr schema - remove canonical url from in/outbound urls if present	13 years ago
Michael Peter Christen	3fd4a01286	added option to record urls that are forwarded to the solr index	13 years ago
Michael Peter Christen	77f795756c	fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr.	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / a embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	786be7d175	better integration of RDFaParser	13 years ago
Michael Peter Christen	e89747bb67	- added automated generation of vocabularies from url stubs - added clear of all terms for vocabularies - added deletion of vocabularies	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	ab7107b34b	fixed RWIProcess queue limits: now discovering hidden results for mass result retrieval	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during search process to prevent endless deadlocks after a very long running search	13 years ago
reger	ee553d971e	correct typo in scripts_txt comment	13 years ago
Michael Peter Christen	acf8d521a2	fix for http://bugs.yacy.net/view.php?id=126	13 years ago
Roland 'Quix0r' Haeder	d10627d591	More sync in close() methods Conflicts: source/net/yacy/kelondro/logging/GuiHandler.java source/net/yacy/kelondro/workflow/InstantBusyThread.java	13 years ago
Roland 'Quix0r' Haeder	b3ae2aa41f	With or without 'final'? At least please try it in other methods Conflicts: source/de/anomic/tools/tarTools.java	13 years ago
Roland 'Quix0r' Haeder	fbb946f913	Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile	13 years ago
Michael Peter Christen	52d307c735	prevent that the snippet fetch process removes catchall entries	13 years ago
Michael Peter Christen	5deebd02ea	added serialization	13 years ago
reger	b2175ea4ef	Add possibility to set custom Solr field names for the YaCy default Solr attributes. - Changing the format of YaCy's solr.key.list while maintaining backward compatibility Federated index config screens adjusted accordingly - modified the Solr update request to use a 3 min Solr autocommit interval	13 years ago
Michael Peter Christen	2717c1b749	fixed bug in solr interface	13 years ago
Michael Peter Christen	f150bc218b	fixed bug in solr error document	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	adeb33bb36	better abstraction for solr objects	13 years ago
Michael Peter Christen	8864141872	more abstraction in solr connection classes	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	5f5ed33ed8	patch for media search (audio, video apps)	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Christen	02e4dedff2	fix to url citation collection	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultaneously	13 years ago
Michael Christen	8fc86fe397	added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and intranets.	13 years ago
Michael Peter Christen	096c17e7cd	added test code	13 years ago
Michael Peter Christen	e3bb73c3d6	serialized some database access methods	13 years ago
Michael Peter Christen	355ecf330f	reduced target file size to 64mb	13 years ago
Michael Christen	eff966f396	fix for search process (it was aborted too early during remote search)	13 years ago
Marek Otahal	f40efb39af	Blacklist loadList() remove duplicates by using Set Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Michael Christen	0797b0de99	new handling of remote search processes: looking for seeds will now not block the whole search process any more. A deadlock with a DHT selection process may have been the cause for interface lockings in the past.	13 years ago
Michael Christen	9e5894c784	Removed handling of components objects for URIMetadataRows. This is a preparation to replace this rows with nodes from the node store.	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	e9dc99fe15	added rules to set specific RWIs as private RWIs which are not transmitted to remote peers. This will be used for private index copies and phonetic indexes.	13 years ago
Michael Christen	f14faf503b	better ranking because we wait a very little time during the search process more to get better remote search results into the ranking priority stack	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e58438c01c	- added a new retry connector for solr (for cases where solr responses are slow) - added a new exist property into the metadataRepository which includes solr entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	035ebfbf3b	- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill) - this may have also (good) performance side effects on other parts of YaCy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2c3161b4ac	refactoring: RankingProcess -> RWIProcess ResultFetcher -> SnippetProcess git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago

267 Commits (cb85b227253d43ee40bf14bb5a1036f6559a8b18)