yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	12 years ago
Michael Peter Christen	bd769de604	since the solr index is now used for all pages that are indexed locally, there is no need for the RWI index if the index is not transferred to another peer. Therefore the creation of RWI index data is now suppressed if DHT is disabled. This applies to all intranet and portal mode configurations, but not to public robinson modes. A robinson may switch back to public mode and then transmit its data. That means if someone never wants to switch to DHT mode, it would be more appropriate to choose the portal mode.	12 years ago
Michael Peter Christen	4b5e0c1500	added an url rewriter which can be used to remove session ids from urls	12 years ago
Michael Peter Christen	877042a6b5	fix for portal mode	12 years ago
Michael Peter Christen	76d218fbef	fixes to crawl profiles	12 years ago
Michael Peter Christen	2f536cb54d	code cleanup: removed unused methods and made more methods and objects private	12 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	12 years ago
Michael Peter Christen	6ab64746d7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	12 years ago
sof	5cb244b79b	Merge remote branch 'origin/master'	12 years ago
apfelmaennchen	88b062210c	Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based on the jaudiotagger library. The parser is disabled by default as it needs to store temporary files for non file:// protocols, which might be disliked. For your local MP3-collection it loads nicely Artist, Title, Album etc. from the audio files meta data.	12 years ago
Michael Peter Christen	28bd3e62b1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	4fed4a86d8	another fix to location search	12 years ago
orbiter	0f7a54452d	fix for location search query encoding	12 years ago
Michael Peter Christen	31485a963d	refactoring	12 years ago
Michael Peter Christen	f8a3ab2d82	added the usage of synonyms to the GSA search interface	12 years ago
Michael Peter Christen	3d33a5bdf6	turned the synonyms_t Text field into a multi-valued String field synonyms_sxt	12 years ago
Michael Peter Christen	41ab2a2279	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	c8b1a693dc	ups, added missing class for last commit	12 years ago
Michael Peter Christen	3b959ee002	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	3190347814	added a synonyms_t field to solr and a process to read synonym files. This can be used to add another stemming to solr using stemming files that are expressed as synonyms for grammatical alternatives. The synonym/stemming files must have the following form: - each line is a comma-separated list of synonyms - the list of synonyms may be enclosed with {} (like the GSA synonyms file) - the file may contain comments which are lines starting with a '#' The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and are activated by default whenever a synonym file is in place. Then, for each word that is found in a document all synonyms are added to a long text field which is stored into synonyms_t. Processes using the synonyms must query with that field as optional matcher.	12 years ago
Michael Peter Christen	411d0e839b	added an underline text field to solr to record all underlined texts	12 years ago
Michael Peter Christen	c4a3d8870f	fixed computation of links in host browser which are not indexed but known by the crawler. Such links are now displayed in grey color.	12 years ago
Michael Peter Christen	f45f7fc12e	added new Host Browser to main menu: this new search interface is something completely new for search, but completely common on desktops: browse a web space like one would browse a file system in a file browser. The file listing is created using the search index and a faceted restriction to specific domains.	12 years ago
Michael Peter Christen	8556a3d521	extended solr connector with a method to retrieve a single facet.	12 years ago
Michael Peter Christen	816cb6ce93	another fix for the debian installer: the installer fails because some classes had unresolved dependencies. This fix removes the dependencies.	12 years ago
Michael Peter Christen	280e36c90b	allow Cross-Origin Resource Sharing for all stream servlets, that is the solr and the gsa search interface. That means that all JavaScript in browsers now can Cross-Origin access all YaCy search interfaces, which opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over' concepts.	12 years ago
Michael Peter Christen	016ffa7434	increased strength of crawling waves in network image	12 years ago
Michael Peter Christen	23f68f2a69	force usage of default faceting mechanisms for search	12 years ago
Michael Peter Christen	24d2ee3c52	- better date ranking - more protection against NPE and time travel effects	12 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	12 years ago
Michael Peter Christen	a4214694df	We assert that no other metadata storage than solr is used now. Therefore a property like solrConnected() must be true all the time. Removal of this method causes removal of all write operations to the old metadata index.	12 years ago
Michael Peter Christen	0cec7e761a	enhanced snippet extractor to find snippets also inside of tokens of an url	12 years ago
sixcooler	6c50d016ed	pdf- and zipParser should not use forced Memory-Limits	12 years ago
Michael Peter Christen	562183932b	- removed ip_s from default profile since that needs a DNS lookup to create a document entry. This makes remote search much slower. - removed synchronization of add method if ip_s is activated to prevent that a user configuration causes bad behavior. The disadvantage is that an index dump can cause data loss if an indexing is running during index dump - caught more exceptions and more NPE - better abstraction in MirrorSolrConnector - slight performance enhancement when only the index count is requested (rows=0 is sufficient to get a total count)	12 years ago
Michael Peter Christen	24f4ca4d85	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
apfelmaennchen	116f429e35	fix for java.lang.RuntimeException: TableColumnIndex not available...	12 years ago
Michael Peter Christen	5ac61591f3	better abstraction for solr query params	12 years ago
Michael Peter Christen	c913b2ba77	- fix for NPEs during remote solr configuration - fixed remote solr setting switch - added more logging	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	12 years ago
Michael Peter Christen	872f83ebe0	refactoring	12 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	12 years ago
Michael Peter Christen	15ea053c3a	- added xml output in IndexControlURLs to get the storage page of index dump commands - adjusted the apicall.sh script to get the downloaded text as output to stdout which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent that data is stored to a non-existing solr index while this index is disabled during the storage of the dump	12 years ago
Michael Peter Christen	1b474139dd	used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime	12 years ago
Michael Peter Christen	4a3e684f8c	added a directory-to-zip writer and zip-to-directory reader	12 years ago
Michael Peter Christen	d9ebf4a40f	a bit more logging	12 years ago
Michael Peter Christen	5683162bd3	simplifications in DHT Distribution class and more documentation	12 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	12 years ago
orbiter	a053b356ee	added new classes to renovate the YaCy protocol based on simple data structures in cora: - added the Peer object, which is a fresh version of Seed - added the Peers object, which is a fresh version of Network - added the Network api access class to retrieve a list of peers based on the Network.xml servlet in all YaCy peers.	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	f879a344e7	fix for no depth limit default value	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
orbiter	aa65282259	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	12 years ago
orbiter	6e0f4557f8	added ftp to getName	12 years ago
cominch	23204d2245	change parameter to support the smw extension for list import	12 years ago
Michael Peter Christen	c235d5c0f1	fixed size parsing in RSS message parser (for YaCy size parameter)	12 years ago
Michael Peter Christen	5bc8f34150	fix for success query counter	12 years ago
orbiter	60b1e23f05	added new crawl options: - indexUrlMustMatch and indexUrlMustNotMatch which can be used to select loaded pages for indexing. Default patterns are in such a way that all loaded pages are also indexed (as before) but when doing an expert crawl start, then the user may select only specific urls to be indexed. - crawlerNoDepthLimitMatch is a new pattern that can be used to remove the crawl depth limitation. This filter is a never-match by default (which causes that the depth is used) but the user can select paths which will be loaded completely even if a crawl depth is reached.	12 years ago
orbiter	4987921d3d	fixed the size() method which counted also failed pages (which are also inside the solr index)	12 years ago
Michael Peter Christen	6ec02deec6	added new crawl attributes in crawl profile (not active yet)	12 years ago
Michael Peter Christen	a13e5153ac	- added the possibility to have not one but a list of crawl start urls - the list of urls is entered in the expert crawl start in a textfield; the one-line input field was replaced with a text box - start urls can also be given in one single line where the urls are separated by a '|'-character - as an effect, the crawl profile cannot carry a single start url for identification because it is possible to have more. Therefore the url was removed from the crawl profile - this affects all servlets which display a crawl profile: removed the url field from all these servlets - to work consistently with several start urls and the other crawl starts which computed crawl start url lists from sitelists or sitemaps, the crawl start servlet was restructured completely - new rules for must-match patterns were created to make it possible that site crawl starts also work with several crawl starts at once	12 years ago
Michael Peter Christen	975bc95ddf	added default facet fields for json response format (stub)	12 years ago
Michael Peter Christen	a30653a864	added a regular expression test servlet which is linked within the parser/crawler error page whenever a problem with regular expression occurs. This makes it easy to correct and enhance the must-match and must-not-match patterns just by trying out which pattern could be correct.	12 years ago
Michael Peter Christen	0504b01bdc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	9413f77b65	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	a55e77a115	added twitter search heuristic	12 years ago
Michael Peter Christen	e54ac38095	- some corrections in usage of getFile() and getFileName() - added more attributes in json response writer according to yacy servlet	12 years ago
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	12 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	12 years ago
Michael Peter Christen	b846f585fa	fixed a bug with size_i field usage	12 years ago
Michael Peter Christen	9db032664e	activate two solr fields which will be used by administration interface (later)	12 years ago
orbiter	fcd5c7eec3	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	6171143b4a	added facet stub in JsonResponseWriter	12 years ago
Michael Peter Christen	e84ffdb4f3	enhanced solr writers	12 years ago
Michael Peter Christen	9644c186a4	added search functionality to ViewFile.html servlet	12 years ago
Michael Peter Christen	5df553c152	- added a json writer for solr (yes there was one using xslt but this one writes the same way as yacysearch.json) - using the new json solr result to change the ajax search in IndexControlURLs to the new solr search	12 years ago
Michael Peter Christen	4634f0e626	fix for images_withalt	12 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=:&start=0&rows=3&facet=true&facet.field=host_s	12 years ago
Michael Peter Christen	1754fbb6d9	Merge remote-tracking branch 'reger/master'	12 years ago
Michael Peter Christen	4d29f59a27	removed warnings	12 years ago
Michael Peter Christen	8c099d2106	Merge remote-tracking branch 'origin/master' Conflicts: htroot/api/ymarks/import_ymark.java source/de/anomic/data/ymark/YMarkEntry.java source/de/anomic/data/ymark/YMarkTables.java	12 years ago
apfelmaennchen	59bd478ed1	Added more sophisticated RDF output for YMarks, including the folder structure (b:Topic) and support for multiple tags (dc:subject) and folders (b:hasTopic) via rdf:Bag container.	12 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	12 years ago
reger	40d8086bf7	keep input order of translation entries within one file section. On translation conflicts (translation of words contained in another sentence), this allows putting the shorter key at the end of the translation list.	12 years ago
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	12 years ago
orbiter	d73fff0e0e	added solr field images_withalt_i	12 years ago
sixcooler	a975bcffcb	clear fulltext-cache and stop crawling if running out of memory	12 years ago
sixcooler	e78fe3f477	also do a clearcache on the solr-connector-caches	12 years ago
sixcooler	9ee2e09983	statistics for solr-cache	12 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	12 years ago
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	12 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	12 years ago
Michael Peter Christen	4815713ec7	added synchronization to solr server requests since lucene is not thread-safe. We experienced problems as described in http://stackoverflow.com/questions/5327978/lockobtainfailedexception-updating-lucene-search-index-using-solr	12 years ago
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	12 years ago
Michael Peter Christen	a427a68bac	removed many warnings	12 years ago
Michael Peter Christen	c72c435517	- moved the gsa search interface from /gsa/searchresult? to /gsa/search? - fixed the NB field data	12 years ago
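
Commit 3190347814 above describes a synonym file format: one comma-separated list of synonyms per line, optionally enclosed in {}, with comment lines starting with '#'. The following is a minimal sketch of a reader for that format, not the code from that commit. The class name SynonymFileSketch and the method parse are hypothetical, and trimming whitespace and skipping blank lines are assumptions that the commit message does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YaCy: reads the synonym format
// described in the commit message into groups of words.
public class SynonymFileSketch {

    static List<List<String>> parse(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Comment lines start with '#'; blank lines are skipped (assumption).
            if (line.isEmpty() || line.startsWith("#")) continue;
            // The list may be enclosed in {} like the GSA synonyms file.
            if (line.startsWith("{") && line.endsWith("}")) {
                line = line.substring(1, line.length() - 1);
            }
            List<String> group = new ArrayList<>();
            for (String word : line.split(",")) {
                String w = word.trim();
                if (!w.isEmpty()) group.add(w);
            }
            if (!group.isEmpty()) groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "# grammatical alternatives",
            "{car, automobile}",
            "fast,quick,rapid");
        System.out.println(parse(sample));
    }
}
```

Each returned group holds the words of one line; a caller could then, for every word found in a document, add the other members of its group to the synonyms field as the commit message describes.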

5956 Commits (3d3d654e883d14ccef4a261aae4ddec11e7ff33d)