yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	24f4ca4d85	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
apfelmaennchen	116f429e35	fix for java.lang.RuntimeException: TableColumnIndex not available...	13 years ago
Michael Peter Christen	1533bfd63b	refactoring	13 years ago
Michael Peter Christen	872f83ebe0	refactoring	13 years ago
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	13 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	13 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s	13 years ago
Michael Peter Christen	4d29f59a27	removed warnings	13 years ago
Michael Peter Christen	8c099d2106	Merge remote-tracking branch 'origin/master' Conflicts: htroot/api/ymarks/import_ymark.java source/de/anomic/data/ymark/YMarkEntry.java source/de/anomic/data/ymark/YMarkTables.java	13 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	13 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	13 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	13 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	13 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	13 years ago
orbiter	2094df2e4e	- correct length computation for BStringObject (bugfix suggested by apfelmaennchen) - using ASCII for string conversion for Strings generated from Integer	13 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	13 years ago
Michael Peter Christen	06b0081fdc	fix for NPE during host navigation computation	13 years ago
orbiter	acb9f04e80	removed unused classes	13 years ago
Michael Peter Christen	755f5e76cf	removed strange assert statements and simplified code in metadata transformation	13 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	13 years ago
Michael Peter Christen	f9fc5cfaba	better check for bad urls in url transmission	13 years ago
Michael Peter Christen	40c0856489	refactoring	13 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	13 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	13 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	13 years ago
Michael Peter Christen	94a334f128	another fix to the Solr metadata reading process and to the shutdown process	13 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	13 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	13 years ago
Michael Peter Christen	dcc72799c4	better abstraction for result writers using controlled vocabularies and URIRefs	13 years ago
Michael Peter Christen	a12f693ec9	added two response writers for the embedded solr interface: a rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should be slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result from YaCy, and snippets are also missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt codes for the new result writers are: wt=rss for opensearch, wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters have been added to the solr interface, which can now also be used for a normal solr/xml search.	13 years ago
sixcooler	f32aa9a49c	prevent merge of blobs that can't be handled in memory	13 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	13 years ago
Michael Peter Christen	e432bb9cd9	better calculation of possible saving in HeapReader index data structure	13 years ago
Michael Peter Christen	9549984c65	documentation/comments	13 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	13 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	13 years ago
Michael Peter Christen	f0a079ac9f	allow larger log entries	13 years ago
Michael Peter Christen	784a4abb18	enhancement in internal data organization which should generate less synchronizations in database access	13 years ago
Michael Peter Christen	f78ce93a80	collection of speed and memory saving hacks	13 years ago
orbiter	a196f24f60	prevent enqueueing of non-loggeable logging entries	13 years ago
orbiter	482afed07c	reduced logging overhead (a bit)	13 years ago
orbiter	e76159040b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
Michael Peter Christen	83da68c4c1	fixed a memory leak inside the logger which appeared if the log was written faster than the logger is able to print it to its out stream. A very large collection of unwritten log outputs had been seen during strong crawling. The new ArrayBlockingQueue is limited to prevent this case.	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	1addbc792c	use less memory for md5 cache	13 years ago
Michael Peter Christen	f32de94723	more logging	13 years ago
Michael Peter Christen	8efc1c1078	- fixed a memory leak (or bad usage) during parsing/snippet fetch - more logging for errors	13 years ago
Michael Peter Christen	b0c408788b	made class methods static where possible	13 years ago
Michael Peter Christen	5bd3c90907	- removed unnecessary semicolons - added default case for switch	13 years ago
Michael Peter Christen	132afaf687	removed unaccessible code	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	83701a1b4c	removed unused ImageReference package	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
Michael Peter Christen	1481037820	replaced non-generic array with collection	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads names so it's easier to see what's happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	de3ef8ad73	removed unimportant warnings	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	61bb52d55c	- using http://purl.org/dc/terms/references to refer from an auto-annotated document to a 'pseudo-linked' document which has an url created with an object-prefix as defined in the vocabulary file	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf triples must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also an Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	bef823c247	close the reader if finished	13 years ago
cominch	9cbfc1a1c0	augmentedProxy, which forwards every proxy request to a rewrite engine to customize existing webpages. originally implemented by Florian Richter. Conflicts: source/de/anomic/http/server/HTTPDProxyHandler.java	13 years ago
Michael Peter Christen	3b992e6b00	using utf8 String compression in Webstructure database	13 years ago
Michael Peter Christen	2280a7b276	- changed initialization order to prefer allocation of memory for table files first - bugfixes in memory amount calculation	13 years ago
Michael Peter Christen	0746308bc2	only the metadata tables shall be able to use the tail cache	13 years ago
Michael Peter Christen	7ec9bef0c3	fix for OOM	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	b8f56a9803	npe bugfix	13 years ago
Michael Peter Christen	ba10caf89a	lazy initialization of database tables	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	10c9c17d51	fixed handlemap spread factor and null iterator handling	13 years ago
Michael Peter Christen	b0095c8d3c	flush the compressor cache when a cleanup is done	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	3dd8376825	added automatic cleaning of the cache if the metadata and file database sizes are not equal. These sizes may differ because one of the caches is cleaned after a while or when it grows too big. The metadata was then not cleaned, but is now wiped after a checkup process at every application start. This should cause a bit less memory usage.	13 years ago
Michael Peter Christen	6bb07afcc3	accept also files with other file prefix; used to read 'foreign' cache files	13 years ago
Michael Peter Christen	461a0ce052	removed warnings	13 years ago
Michael Peter Christen	407fdf6968	more bug fixes and performance hacks for search process	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	1f48d1528b	performance hacks	13 years ago
Michael Peter Christen	10da7335ea	performance hack: use a hash cache for all hashes that are computed by a byte array. If this hash is used in a HashMap (which is very often the case) then this hack eliminates a lot of re-computations of the same hash.	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during the search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	8d997d55b6	better logging	13 years ago
Michael Peter Christen	43c2c6e588	better logging	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	cf47d94888	performance hack to parse numbers inside of substrings without actually generating a substring. This avoids the allocation of a String object each time a substring is parsed. Should affect CPU load during RWI transmission.	13 years ago
Michael Peter Christen	7e0ddbd275	added a "fromCache" flag in Response object to omit one cache.has() check during snippet generation. This should cause less blockings	13 years ago
Michael Peter Christen	c6a09eab0b	synchronization needed	13 years ago
reger	6696cb1313	bugfix: lookup of peer names returned no result for an active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer. SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses the peername as is in the keyset. Changed the index init to insert lowercase peer names as key	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 - tried to make a bit less 'noise' to dns server - also included: fewer processes in snippet fetch to reduce load during search on small computers	13 years ago


662 Commits (082e3274d66bb3f3ec256864bcee168da2a11fa6)