yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	13 years ago
Michael Peter Christen	872f83ebe0	refactoring	13 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	13 years ago
Michael Peter Christen	15ea053c3a	- added xml output in IndexControlURLs to get the storage page of index dump commands - adjusted the apicall.sh script to get the downloaded text as output to stdout which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent that data is stored to a non-existing solr index while this index is disabled during the storage of the dump	13 years ago
Michael Peter Christen	1b474139dd	used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime	13 years ago
Michael Peter Christen	4a3e684f8c	added a directory-to-zip writer and zip-to-directory reader	13 years ago
Michael Peter Christen	d9ebf4a40f	a bit more logging	13 years ago
Michael Peter Christen	5683162bd3	simplifications in DHT Distribution class and more documentation	13 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	13 years ago
orbiter	a053b356ee	added new classes to renovate the YaCy protocol based on simple data structures in cora: - added the Peer object, which is a fresh version of Seed - added the Peers object, which is a fresh version of Network - added the Network api access class to retrieve a list of peers based on the Network.xml servlet in all YaCy peers.	13 years ago
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	f879a344e7	fix for no depth limit default value	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	13 years ago
orbiter	aa65282259	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	13 years ago
orbiter	6e0f4557f8	added ftp to getName	13 years ago
cominch	23204d2245	change parameter to support the smw extension for list import	13 years ago
Michael Peter Christen	c235d5c0f1	fixed size parsing in RSS message parser (for YaCy size parameter)	13 years ago
Michael Peter Christen	5bc8f34150	fix for success query counter	13 years ago
orbiter	60b1e23f05	added new crawl options: - indexUrlMustMatch and indexUrlMustNotMatch which can be used to select loaded pages for indexing. Default patterns are in such a way that all loaded pages are also indexed (as before) but when doing an expert crawl start, then the user may select only specific urls to be indexed. - crawlerNoDepthLimitMatch is a new pattern that can be used to remove the crawl depth limitation. This filter is a never-match by default (which causes that the depth is used) but the user can select paths which will be loaded completely even if a crawl depth is reached.	13 years ago
orbiter	4987921d3d	fixed the size() method which counted also failed pages (which are also inside the solr index)	13 years ago
Michael Peter Christen	6ec02deec6	added new crawl attributes in crawl profile (not active yet)	13 years ago
Michael Peter Christen	a13e5153ac	- added the possibility to have not one but a list of crawl start urls - the list of urls is entered in the expert crawl start in a textfield; the one-line input field was replaced with a text box - start urls can also be given in one single line where the urls are separated by a '|'-character - as an effect, the crawl profile cannot carry a single start url for identification because it is possible to have more. Therefore the url was removed from the crawl profile - this affects all servlets which display a crawl profile: removed the url field from all these servlets - to work consistently with several start urls and the other crawl starts which computed crawl start url lists from sitelists or sitemaps, the crawl start servlet was restructured completely - new rules for must-match patterns were created to make it possible that site crawl starts also work with several crawl starts at once	13 years ago
Michael Peter Christen	975bc95ddf	added default facet fields for json response format (stub)	13 years ago
Michael Peter Christen	a30653a864	added a regular expression test servlet which is linked within the parser/crawler error page whenever a problem with regular expression occurs. This makes it easy to correct and enhance the must-match and must-not-match patterns just by trying out which pattern could be correct.	13 years ago
Michael Peter Christen	0504b01bdc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	9413f77b65	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	a55e77a115	added twitter search heuristic	13 years ago
Michael Peter Christen	e54ac38095	- some corrections in usage of getFile() and getFileName() - added more attributes in json response writer according to yacy servlet	13 years ago
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	13 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	13 years ago
Michael Peter Christen	b846f585fa	fixed a bug with size_i field usage	13 years ago
Michael Peter Christen	9db032664e	activate two solr fields which will be used by administration interface (later)	13 years ago
orbiter	fcd5c7eec3	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	6171143b4a	added facet stub in JsonResponseWriter	13 years ago
Michael Peter Christen	e84ffdb4f3	enhanced solr writers	13 years ago
Michael Peter Christen	9644c186a4	added search functionality to ViewFile.html servlet	13 years ago
Michael Peter Christen	5df553c152	- added a json writer for solr (yes there was one using xslt but this one writes the same way as yacysearch.json) - using the new json solr result to change the ajax search in IndexControlURLs to the new solr search	13 years ago
Michael Peter Christen	4634f0e626	fix for images_withalt	13 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s	13 years ago
Michael Peter Christen	1754fbb6d9	Merge remote-tracking branch 'reger/master'	13 years ago
Michael Peter Christen	4d29f59a27	removed warnings	13 years ago
Michael Peter Christen	8c099d2106	Merge remote-tracking branch 'origin/master' Conflicts: htroot/api/ymarks/import_ymark.java source/de/anomic/data/ymark/YMarkEntry.java source/de/anomic/data/ymark/YMarkTables.java	13 years ago
apfelmaennchen	59bd478ed1	Added more sophisticated RDF output for YMarks, including the folder structure (b:Topic) and support for multiple tags (dc:subject) and folders (b:hasTopic) via rdf:Bag container.	13 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	13 years ago
reger	40d8086bf7	keep input order of translation entries within one file section. This allows, on translation conflicts (translation of words contained in another sentence), putting the shorter key at the end of the translation list.	13 years ago
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	13 years ago
orbiter	d73fff0e0e	added solr field images_withalt_i	13 years ago
sixcooler	a975bcffcb	clear fulltext-cache and stop crawling if running out of memory	13 years ago
sixcooler	e78fe3f477	also do a clearcache on the solr-connector-caches	13 years ago
sixcooler	9ee2e09983	statistics for solr-cache	13 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	13 years ago
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	13 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	13 years ago
Michael Peter Christen	4815713ec7	added synchronization to solr server requests since lucene is not thread-safe. We experienced problems as described in http://stackoverflow.com/questions/5327978/lockobtainfailedexception-updating-lucene-search-index-using-solr	13 years ago
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	13 years ago
Michael Peter Christen	a427a68bac	removed many warnings	13 years ago
Michael Peter Christen	c72c435517	- moved the gsa search interface from /gsa/searchresult? to /gsa/search? - fixed the NB field data	13 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	13 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	13 years ago
Michael Peter Christen	3142e675e8	fixed problems with GSA api: - better FS attribute - highlighting of searched words in title	13 years ago
Michael Peter Christen	3b19fe7b52	- fixed num parameter in GSA api - changed FS attribute in GSA api	13 years ago
Michael Peter Christen	2ddc33646a	added new field for solr: url_paths_sxt url_parameter_i url_parameter_key_sxt url_parameter_value_sxt url_chars_i	13 years ago
Michael Peter Christen	75d5e3475d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
cominch	a2841261bd	content control: apply filter if enabled to crawls	13 years ago
cominch	dc468dad01	add content control features for custom filter lists	13 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	13 years ago
orbiter	a3d5959981	Merge commit '65d49df865f60511d22d86fb15c33a082176e7ab'	13 years ago
Michael Peter Christen	4521d63c92	added boosts to solr search queries	13 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	13 years ago
Michael Peter Christen	f00168ecc5	added gsa result attribute 'has'	13 years ago
reger	65d49df865	security fix: clear automatic password only if adminAccountForLocalhost=false to prevent remote access to protected pages after restart. if adminAccountForLocalhost=true leave automatic password unchanged so access from local host is granted but remote access is prevented from the 1st second.	13 years ago
orbiter	2094df2e4e	- correct length computation for BStringObject (bugfix suggested by apfelmaennchen) - using ASCII for string conversion for Strings generated from Integer	13 years ago
orbiter	6d03433cda	- added hack to prevent stream servlet paths from being parsed wrongly if the path contains a dot. - added also warnings if documents are requested which do not exist.	13 years ago
orbiter	67f2866cd0	small fixes	13 years ago
orbiter	ce156a01ba	Merge commit 'c2341a175fdd755a34965ff63c7ea437b380352d'	13 years ago
David Rubio	c2341a175f	Fixed a bug that prevented Yacy from indexing files with non ASCII filenames in FTP servers. Previously Yacy could read file listings in UTF-8, but couldn't send commands to the FTP server in UTF-8 (the second byte of every multi-byte character was ignored), which caused a lot of errors on the server side. Now it handles UTF-8 correctly.	13 years ago
orbiter	3ebc4264c5	fixed concurrent query	13 years ago
orbiter	29171e2f6c	fixed generation of ontologies from index enumerations	13 years ago
orbiter	7cd302de3e	omit xml parsing when using the embedded solr server	13 years ago
orbiter	787e1c6836	added the QueryResponse query(SolrParams params) method to the SolrServerConnector which is necessary to use facets in solr search.	13 years ago
orbiter	01a63ef595	redesign of YaCySchema and SolrDoc handling	13 years ago
orbiter	479bfca571	refactoring	13 years ago
Michael Peter Christen	48a82bc705	log queries anonymous from gsa+solr requests	13 years ago
Michael Peter Christen	ab6ec4ec52	added snippet computation to solr/rss and gsa result writer	13 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	13 years ago
Michael Peter Christen	06b0081fdc	fix for NPE during host navigation computation	13 years ago
Michael Peter Christen	feb99bc291	fixed GSA format	13 years ago
Michael Peter Christen	653645c1cf	corrected solr query syntax	13 years ago
Michael Peter Christen	08ae142a3d	- enhanced caching after search queries to solr - reduced caching after short memory	13 years ago
orbiter	716ea0cfe2	sorted the solr schema into mandatory and optional fields; reduced number of used field to reduce solr index size	13 years ago
orbiter	9b8c8c0f47	fix from gaston in http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909	13 years ago
orbiter	acb9f04e80	removed unused classes	13 years ago
Michael Peter Christen	0ad52ac4c3	gsa bugfix for date parser	13 years ago
Michael Peter Christen	3ce4c2f937	fixes for gsa result format	13 years ago
Michael Peter Christen	67d235fae9	added gzip encoding to solr2solr http interface, client side (server already works)	13 years ago
Michael Peter Christen	a049761e0c	fixed double-check	13 years ago
Michael Peter Christen	f42a57cd7d	gsa format update	13 years ago
Michael Peter Christen	b3aad6cc35	bugfix for remote search when search is done to solr	13 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is fed into the same data structure that prepares the rwi search result - the same remote search that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	13 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	13 years ago
Michael Peter Christen	f00733186b	code simplifications	13 years ago
Michael Peter Christen	755f5e76cf	removed strange assert statements and simplified code in metadata transformation	13 years ago
Michael Peter Christen	db0d438709	fix for http://bugs.yacy.net/view.php?id=206	13 years ago
orbiter	404b0aab09	refactoring in remote search and stub for remote node peer selection	13 years ago
orbiter	d7ea45f698	- get nice text_t values from metadata conversions that are stored into solr as fulltext search index. - added slow migration from old metadata to solr index entries: each entry from the old metadata is removed from that data structure and written into solr.	13 years ago
orbiter	99ef57f103	reduced sleep times	13 years ago
orbiter	780f8974e7	added remaining iteration methods for solr in fulltext class	13 years ago
orbiter	acd2dc3575	hack to remove StringBuilder overhead in query construction	13 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	13 years ago
orbiter	cc47a0876e	reverted `bf55f69176` to have a fall-back option in case that memory problems as reported in http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901 for full-solr installation are too strong and we have to work with a 'small memory footprint' peer system.	13 years ago
Michael Peter Christen	0904afe8fb	added concurrent iterator methods to the solr connectors	13 years ago
Michael Peter Christen	d54b80327a	refactoring	13 years ago
Michael Peter Christen	f9fc5cfaba	better check for bad urls in url transmission	13 years ago
Michael Peter Christen	d39463a85c	added deleteByQuery to solr connectors	13 years ago
Michael Peter Christen	0cab06c47c	refactoring	13 years ago
Michael Peter Christen	bf55f69176	removed write methods to old metadata file type; all metadata now goes to solr	13 years ago
Michael Peter Christen	40c0856489	refactoring	13 years ago
Michael Peter Christen	06a78eecb7	code simplification	13 years ago
Michael Peter Christen	54bea21c02	bugfix for solr connector, possibly a cause for http://forum.yacy-websuche.de/viewtopic.php?p=26893#p26893	13 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	13 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	13 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	13 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	13 years ago
Michael Peter Christen	efafa79db5	- added a content-encoding: gzip to streamed http server responses - finish and close streamed http responses immediately - this applies only to the solr interface which should be much faster now!	13 years ago
Michael Peter Christen	23226676c6	FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this causes not a catastrophe until the end of the weekend, we will do a YaCy 1.1 main release containing this as default.	13 years ago
Michael Peter Christen	a1b2c9a67d	doctype2mime fix, influences metadata conversion between old metadata and solr	13 years ago
Michael Peter Christen	a16206e38b	more attempts to clean the index (cleaning is faster then)	13 years ago
Michael Peter Christen	703f427303	fixed some peer-ping connection details - larger time-out - removed too old seedlist - fixed a bug in connection test	13 years ago
Michael Peter Christen	597bb76e4f	get the peer location more quickly	13 years ago
Michael Peter Christen	1641835fef	replaced yacy xml encoding by solr xml encoding	13 years ago
Michael Peter Christen	89fe13e73d	enhanced GSA and RSS output format: corrected date, added some missing fields, added xml encoding for utf8	13 years ago
Michael Peter Christen	ea49a8aa8c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	d988ba50cf	added a very rudimentary, incomplete, non-verified GSA response writer for solr. Try this: http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10	13 years ago
Michael Peter Christen	aab0b680c3	- added xslt support for solr result formats. try e.g. http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl - added servlet-side mime-type configuration for streamed servlets. this is used for the result formatters in solr result formats	13 years ago
cominch	e2119f4e76	augmented browsing: replace htmlparser by jsoup, which is more stable and reliable	13 years ago
Michael Peter Christen	9448d9a8a2	ups	13 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	13 years ago
Michael Peter Christen	94a334f128	another fix to the Solr metadata reading process and to the shutdown process	13 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	13 years ago
Michael Peter Christen	da851c6071	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	bd4f03bc85	removed unused class	13 years ago
orbiter	39f8eb60c3	tried to prevent calls to bad-hack getSize() method and reduced overhead of that method a bit.	13 years ago
orbiter	e816b88b55	changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr.	13 years ago
orbiter	2571e0d47a	removed unused classes	13 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	13 years ago
Michael Peter Christen	b2b480fff2	more abstraction of the YaCySchema -> Opensearch matching process	13 years ago
Michael Peter Christen	24462e9baa	set the title every time, it is possible that it has changed	13 years ago

5966 Commits (ac9540dfb6713b46df543907337a125151716a4a)