yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	0504b01bdc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	9413f77b65	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	a55e77a115	added twitter search heuristic	12 years ago
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	12 years ago
Michael Peter Christen	9db032664e	activate two solr fields which will be used by administration interface (later)	12 years ago
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	12 years ago
orbiter	d73fff0e0e	added solr field images_withalt_i	12 years ago
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	12 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	12 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	12 years ago
Michael Peter Christen	2ddc33646a	added new field for solr: url_paths_sxt url_parameter_i url_parameter_key_sxt url_parameter_value_sxt url_chars_i	12 years ago
Michael Peter Christen	75d5e3475d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
cominch	dc468dad01	add content control features for custom filter lists	12 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	12 years ago
Michael Peter Christen	4c79ddb91e	switched off some solr logging	12 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	12 years ago
Michael Peter Christen	af764c106c	re-activated audio and video search because they obviously work (!)	12 years ago
orbiter	716ea0cfe2	sorted the solr schema into mandatory and optional fields; reduced number of used field to reduce solr index size	12 years ago
orbiter	db6863db77	reduced solr cache sizes to check if that solves memory problems a bit	12 years ago
Michael Peter Christen	23226676c6	FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this causes not a catastrophe until the end of the weekend, we will do a YaCy 1.1 main release containing this as default.	12 years ago
Michael Peter Christen	a1b2c9a67d	doctype2mime fix, influences metadata conversion between old metadata and solr	12 years ago
Michael Peter Christen	703f427303	fixed some peer-ping connection details - larger time-out - removed too old seedlist - fixed a bug in connection test	12 years ago
Michael Peter Christen	ea49a8aa8c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	aab0b680c3	- added xslt support for solr result formats. try i.e. http://localhost:8090/solr/select?q=:&start=0&rows=10&wt=xslt&tr=json.xsl - added servlet-side mime-type configuration for streamed servlets. this is used for the result formatters in solr result formats	12 years ago
cominch	e2119f4e76	augmented browsing: replace htmlparser by jsoup, which is more stable and reliable	12 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major stept forward to a full migration to Solr.	12 years ago
Michael Peter Christen	bca4a16603	replaced the multivalue generic string field name suffix _ss by _txt because _ss is not part of the standard solr example schema.	12 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	12 years ago
Michael Peter Christen	3ce04cecf3	bad hack to prevent a bug appearing in solr	12 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	12 years ago
Michael Peter Christen	1517a3b7b9	added webm mime-type	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	4de50fe808	adding more principal peers for bootstraping	13 years ago
reger	067728bccc	add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)	13 years ago
Michael Peter Christen	508a81b86c	added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field.	13 years ago
Michael Peter Christen	9116013c64	- allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear) - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantic of the browsers response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields.	13 years ago
Michael Peter Christen	c03d306afa	shorter autocommit time (now: 1 second) to prevent that user cannot see results in solr the first time they try it out. The value can now be easily set to a higher number using the IndexFederated_p interface.	13 years ago
Michael Peter Christen	3fd4a01286	added option to record urls that are forwarded to the solr index	13 years ago
Michael Peter Christen	8dd469b9dd	added option to configure the autocommit delay time of solr on-the-fly	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / a embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	1be0025a9c	- added test for EmbeddedSolrConnector - added needed libraries for this test this includes most (all) files needed for an embedded solr	13 years ago
Michael Peter Christen	dbdd697f4d	moved RDFaParser.xsl configuration file to defaults	13 years ago
Michael Peter Christen	8738336408	set Xms lower than Xmx	13 years ago
Michael Peter Christen	96f6a5869f	more robust OAI-PMH client (large time-out, three re-tries). OAI-PMH server appeart to be very slow sometimes	13 years ago
Michael Peter Christen	6d17686258	made triplestore persistent by default added a size display in triplestore servlet	13 years ago
cominch	3c255c025b	Show tags in search results (if activated in ConfigPortal_p.html)	13 years ago
Michael Peter Christen	a5cdfb91de	- fixed Cache link (below snippet) - added 'Augmented Proxy' link below snippet - added configuration options for augmented proxy	13 years ago
Roland 'Quix0r' Haeder	af5a597e47	Scroogle is not comming back, remove dead code Conflicts: source/net/yacy/search/Switchboard.java	13 years ago
cominch	90512640bf	Added config switches for custom parser Conflicts: source/net/yacy/document/TextParser.java	13 years ago
cominch	5d20cd324a	Add Triplestore and RDF query interface Conflicts: build.xml defaults/yacy.init source/net/yacy/interaction/AugmentHtmlStream.java	13 years ago
cominch	a32943b382	add json mimetype	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	8002fd2578	use less cache space since a large cache would cause more memory usage in index files.	13 years ago
Michael Peter Christen	5aee19daa4	added show from cache in search results (not yet finished)	13 years ago
Michael Peter Christen	0d32a766ed	relax verify attribute for search widget to make it faster: set to "cacheonly"	13 years ago
Michael Peter Christen	7eece0256f	moved yacy.logging to defaults according to request in http://bugs.yacy.net/view.php?id=55	13 years ago
Michael Peter Christen	db9d81cb7a	ups	13 years ago
Michael Peter Christen	e7e381d110	added configuration to switch off redirection following in crawler	13 years ago
Michael Peter Christen	2be327b5ab	update location update	13 years ago
Michael Peter Christen	99c74699de	removed scroogle (scroogle is dead)	13 years ago
Michael Peter Christen	8bee1472c9	there is no noindex, only nofollow in links	13 years ago
Michael Peter Christen	4c5edab1ec	added option to have exception search result windows	13 years ago
Michael Peter Christen	696ee5fc16	removed pdf from default parser deny list	13 years ago
Lotus	c73af39e54	refactoring of tray icon class, now uses Java 6 methods natively	13 years ago
Michael Peter Christen	987b412491	updated solr scheme: generic declaration of solr schemes	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. the search will always exlude the given list of host unless they are requested directly using the host navigation	13 years ago
Michael Christen	17f962fceb	translator updates: - config string for chinese - do not copy the language file to DATA/LOCALE any more (and do not use them there, this is really confusing for new translators)	13 years ago
Michael Christen	c715d19c09	fixes for dependency on svn	13 years ago
Michael Christen	f62e6fb438	less frequent DHT distribution to reduce the load a bit on every peer	13 years ago
Michael Christen	9dbc93613e	now that the whole world knows that we actually do p2p and not metasearch we can support a default look-up to scroogle to gain more attention to people who say that your search results are incomplete	13 years ago
orbiter	f9216e388c	- faster ping to clean up old peers faster - clean up more news git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ac5bda205f	- removed lower page navigation (it never looks nice) - added visibility of metadata and parser in search results since that shows what YaCy can do in a nice way git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8091 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c659310e89	- removed option to search for audio, video and applications. These things are still experimental and should not be shown to new users since this would cause them to argue that YaCy does not work. The functions are stil available, because: - added a configuration option in ConfigPortal to swtich the search media types on or off git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8090 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6cd27473f5	- better default values for caching and cache usage - set new caching and verification behavior according to use case automatically git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5866c73a09	fix for compare search: use scroogle instead of bing and get a default search if configured search engine is not available git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8074 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e4a82ddd8b	produce a bookmark entry from every crawl start. these bookmarks are always private. these bookmarks will be used to get a source reference for the search in case of intranet or portal searches. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	f183d3822c	added a default accept header in http requests since some http fraud detection functions check that this header field exist see also: http://bad-behavior.ioerror.us/ in source file browser.inc.php git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8048 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	78ce3b13be	typo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
suessthomas	887f088dad	The IP address of the YaCy-Demo portal added to Whitelist. This is only a temporary workaround. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8013 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1b45e33f04	added robots tag parser to solr scheme git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	cf4fd525ee	added directDocByURL attribute in crawl profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5ad7f9612b	added crawl settings for three new filters for each crawl: must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue) must-not-match for IPs must-match against a list of country codes (allows only loading from hosts that are hostet in given countries) note: the settings and input environment is there with that commit, but the values are not yet evaluated git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2c3161b4ac	refactoring: RankingProcess -> RWIProcess ResultFetcher -> SnippetProcess git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6b22865dbc	- removed some warinings - removed a dead update location git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e48ce5d80e	- style change for search box: larger font, selected by default - style change for search results: by default no parser, size, image info git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7949 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	ecb4986b38	refactored stuff from last commit to ReferenceContainer see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163 the limiting of references is disabled per default to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	49e5ca579f	added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), the all embedded image, audio and video links from all parsed documents are added to the search index as individual document. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	9a8937f8b6	be more liberal when evaluating search results. This may cause that it is possible to fraud content on fresh peers, but that is better than looong waiting times for the evaluation of every link which causes that everybody rejects YaCy as 'too slow'. But this is only because of the high standards that YaCy sets to itself. If we are able to gain more users by lowering the standard, then that is useful. The option to set that flag to verify each link is still there. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7918 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1c007188ad	bugfixes in html parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5dd2efc9a2	- bugfixes in html parser - new fields in solr - extended file viewer to debug parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	4fec99115b	Implementation of strategies for controlling memory resources. You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html. The generation memory strategy is implemented with the objective of running more robust but with the cost of early stopping some tasks (eg. dht) while running low on memory. This new strategy does respect the generational way a heap is organized on most used jvms. These changes run fine on my 3 peers for weeks now, but as I'm human, I may fail. Please be carefull using generation memory strategy and report errors by naming OS, jvm and java_args. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	77a9af99f1	same values for Xmx and Xms: memory extension may be difficult if the OS has not the remaining memory available and may kill the jvm. If the memory is reserved at the start but never used the OS may handle that as well and leave non-used space in swap area (and never swap) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7867 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	768c59740c	- replaced solrj 3.1 with solrj 3.3 - updated also slf4j - added authentication for solrj git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e7c7598923	docfix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7828 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b84089ff04	fix for solr scheme list definition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7826 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2d4bb139d3	- added counting of links with noindex tag for solr index - bugfixes for solr index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	fa6f2c2b44	use proxy accounts by default for more security http://bugs.yacy.net/view.php?id=45 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7815 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

1 2 3 4 5 ...

349 Commits (81380ae5c8e3abd1b34f9f4a0a016b3696d9c393)