yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	29171e2f6c	fixed generation of ontologies from index enumerations	12 years ago
orbiter	01a63ef595	redesign of YaCySchema and SolrDoc handling	12 years ago
orbiter	479bfca571	refctoring	12 years ago
Michael Peter Christen	48a82bc705	log queries anonymous from gsa+solr requests	12 years ago
Michael Peter Christen	ab6ec4ec52	added snippet computation to solr/rss and gsa result writer	12 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	12 years ago
Michael Peter Christen	653645c1cf	corrected solr query syntax	12 years ago
orbiter	716ea0cfe2	sorted the solr schema into mandatory and optional fields; reduced number of used field to reduce solr index size	12 years ago
orbiter	9b8c8c0f47	fix from gaston in http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909	12 years ago
Michael Peter Christen	a049761e0c	fixed double-check	12 years ago
Michael Peter Christen	f42a57cd7d	gsa format update	12 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is feeded into the same data structure that prepares the rwi search result - the same remote seach that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	12 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	12 years ago
Michael Peter Christen	f00733186b	code simplifications	12 years ago
Michael Peter Christen	db0d438709	fix for http://bugs.yacy.net/view.php?id=206	12 years ago
orbiter	404b0aab09	refactoring in remote search and stub for remote node peer selection	12 years ago
orbiter	d7ea45f698	- get nice text_t values from metadata conversions that are stored into solr as fulltext search index. - added slow migration from old metadata to solr index entries: each entry from the old metadata is removed from that data structure and written into solr.	12 years ago
orbiter	99ef57f103	reduced sleep times	12 years ago
orbiter	780f8974e7	added ramaining iteration methods for solr in fulltext class	12 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	12 years ago
orbiter	cc47a0876e	reverted `bf55f69176` to have a fall-back option in case that memory problems as reported in http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901 for full-solr installation are too strong and we have to work with an 'small memory footprint' peer system.	12 years ago
Michael Peter Christen	0904afe8fb	added concurrent iterator methods to the solr connectors	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	bf55f69176	removed write methods to old metadata file type; all metadata now goes to solr	12 years ago
Michael Peter Christen	40c0856489	refactoring	12 years ago
Michael Peter Christen	06a78eecb7	code simplification	12 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	12 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	12 years ago
Michael Peter Christen	23226676c6	FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this causes not a catastrophe until the end of the weekend, we will do a YaCy 1.1 main release containing this as default.	12 years ago
Michael Peter Christen	d988ba50cf	added a very rudimentary, incomplete, non-verified GSA response writer for solr. Try this: http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10	12 years ago
Michael Peter Christen	aab0b680c3	- added xslt support for solr result formats. try i.e. http://localhost:8090/solr/select?q=:&start=0&rows=10&wt=xslt&tr=json.xsl - added servlet-side mime-type configuration for streamed servlets. this is used for the result formatters in solr result formats	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	12 years ago
Michael Peter Christen	da851c6071	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	bd4f03bc85	removed unused class	12 years ago
orbiter	39f8eb60c3	tried to prevent calls to bad-hack getSize() method and reduced overhead of that method a bit.	12 years ago
orbiter	e816b88b55	changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr.	12 years ago
orbiter	2571e0d47a	removed unused classes	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major stept forward to a full migration to Solr.	12 years ago
Michael Peter Christen	136fcb1ad9	refactoring	12 years ago
Michael Peter Christen	a12f693ec9	added two response writer for embedded solr interface: a rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should by slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result form YaCy and also snippets are missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt-code for the new result writers are= wt=rss for opensearch wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters had been added to the solr interface which can now also be used for a normal solr/xml search.	12 years ago
Michael Peter Christen	bca4a16603	replaced the multivalue generic string field name suffix _ss by _txt because _ss is not part of the standard solr example schema.	12 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	12 years ago
Michael Peter Christen	3ce04cecf3	bad hack to prevent a bug appearing in solr	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	ef488a15f7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
sixcooler	76b037a20a	check content domain fix: search image/media should not show pages containing image/media search text should show all/text but image/media	12 years ago
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple.	12 years ago
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	12 years ago
Michael Peter Christen	315d83cfa0	cleanup	12 years ago
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	12 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	05a3ffd03a	patches to ensure that solr connectors are active ony if they have a solr object assigned and vice versa	13 years ago
orbiter	5a3c829872	embedded solr is only initiated if it is activated with IndexFederated_p.html	13 years ago
Michael Peter Christen	97b7bcf2a6	added a solr search index - by default, a (empty) solr storage instance is created at SEGMENTS/solr_36 - the index is written if in /IndexFederated_p.html the flag "embedded solr search index" is switched on - a standard solr query interface is available now with a new servlet at http://127.0.0.1:8090/solr/select To test this, do the following: - switch to webportal mode - switch on the feature as described - do a crawl. this fills the solr index. The normal YaCy search will NOT work now! - do a solr query, like: http://127.0.0.1:8090/solr/select?q=: http://127.0.0.1:8090/solr/select?q=text_t:Help play with different search fields as you can see in /IndexFederated_p.html You can use the standard solr query attributes as described in http://wiki.apache.org/solr/SearchHandler	13 years ago
orbiter	c00a3cf74d	less usage of generic logger to avoid logger generation overhead	13 years ago
orbiter	e76159040b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
Michael Peter Christen	58e7d1952f	reduction of logging to prevent too much IO caused be logging	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	c7afa8bc48	using SwitchboardConstants for solr attributes	13 years ago
orbiter	c6d8950651	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	d09d9f2364	filter old peers from bootstrap (now stronger: 60 minutes instead of 240).	13 years ago
Michael Peter Christen	b0c408788b	made class methods static where possible	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
orbiter	fc0f9543fe	More SentenceReader cleanup	13 years ago
orbiter	d4291ac1f3	more tolerance when creating solar document	13 years ago
orbiter	78fc3cf8f8	refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was not efficient for two reasons: - it used a too complex input stream; generated from sources and UTF8 byte-conversions. The BufferedReader applied a strong overhead. - to feed data into the SentenceReader, multiple toString/getBytes had been applied until a buffered Reader from an input stream was possible. These superfluous conversions had been removed. - the best source for the Sentence Reader is a String. Therefore the production of Strings had been forced inside the Document class.	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	de903a53a0	parser refactoring & hacks	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	7249d9c9de	bugfix for concurrent seed loader	13 years ago
Michael Peter Christen	c72d3b12cd	concurrently initialize the seed list during p2p network bootstrap	13 years ago
Michael Peter Christen	1825f165b8	better integration of blacklist according to use case	13 years ago
Michael Peter Christen	c18fa9fa75	Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads name so its easier to see whats happening during debugging and within a thread dump	13 years ago
reger	067728bccc	add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)	13 years ago
Michael Peter Christen	03280fb161	removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never be used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index.	13 years ago
Michael Peter Christen	508a81b86c	added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field.	13 years ago
Michael Peter Christen	9116013c64	- allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear) - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantic of the browsers response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields.	13 years ago
Michael Peter Christen	0294a53459	- add canonical field only if requested by solr schema - remove canonical url from in/outbound urls if present	13 years ago
Michael Peter Christen	3fd4a01286	added option to record urls that are forwarded to the solr index	13 years ago
Michael Peter Christen	96aeb127e3	generalized localhost naming. this is also a preparation for a better IPv6 implementation.	13 years ago
Michael Peter Christen	77f795756c	fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr.	13 years ago
Michael Peter Christen	8dd469b9dd	added option to configure the autocommit delay time of solr on-the-fly	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / a embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	fad3b14813	added jetty libraries, needed for future use as web server and as application server for the solr search interface	13 years ago
Michael Peter Christen	a38b0a2c46	extended embedded solr tests to ensure that it will be usable within a jetty instance	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	a5eb91fa60	refactoring	13 years ago
Michael Peter Christen	1be0025a9c	- added test for EmbeddedSolrConnector - added needed libraries for this test this includes most (all) files needed for an embedded solr	13 years ago
Michael Peter Christen	e12bb254b4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	3f55dc7c1e	- added solr core and libraries that solr needs (lucene is missing, will follow later) - added embedded solr connector which can connect to solr programmatically (without using a server in between)	13 years ago
Michael Peter Christen	786be7d175	better integration of RDFaParser	13 years ago
Michael Peter Christen	0752983fbd	- automatic periodic saving of triplestore - transaction-safe storage of triplestore	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	64c0268b2b	show triplestore metadata in yacydoc and viewfile	13 years ago
cominch	a95127c9af	Triplestore: initalize per-user triplestores	13 years ago
Michael Peter Christen	e89747bb67	- added automated generation of vocabularies from url stubs - added clear of all terms for vocabularies - added deletion of vocabularies	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf tripel must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also a Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	5fc6524ca8	- moved triple store to net.yacy.cora.lod (should be generalized there later - added abstract add, delete, get methods in the triplestore - added generation of triples after auto-annotation - migrated all MultiProtocolURI objects to DigestURI in the parser since the url hash is needed as subject value in the triples in the triple store	13 years ago
Michael Peter Christen	4ee6fb1de9	added missing blacklist dht cache storage (maybe due to mistakes in cherry picking)	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
Roland 'Quix0r' Haeder	af5a597e47	Scroogle is not comming back, remove dead code Conflicts: source/net/yacy/search/Switchboard.java	13 years ago
cominch	65c5826d93	bugfix Conflicts: source/net/yacy/document/parser/augment/AugmentParser.java	13 years ago
Michael Peter Christen	cde20911bb	saved a bit more ram using UTF8 String compression for OpenGeoDB and Geonames data files.	13 years ago
Michael Peter Christen	2280a7b276	- changed initialization order to prefer allocation of memory for table files first - bugfixes in memory amount calculation	13 years ago
Michael Peter Christen	0746308bc2	only the metadata tables shall be able to use the tail cache	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	dd14b19c26	lazy initialization of block rank table ... only normal web search uses this. When interactive search or location search is used, the block rank is switched off	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	ab7107b34b	fixed RWIProcess queue limits: now discovering hidden results for mass result retrieval	13 years ago
Michael Peter Christen	b0095c8d3c	flush the compressor cache when a cleanup is done	13 years ago
Michael Peter Christen	a61f44f9e4	lazy initialization of block rank table. this causes that the table is not initialized when there is no search is done. the effect is most strong if YaCy is started headless which causes no browser pop-up which otherwise would load the search page and therefore trigger the initialization of the table.	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	d0ec8018f5	fixes for bad long computation	13 years ago
Michael Peter Christen	461a0ce052	removed warnings	13 years ago
Michael Peter Christen	407fdf6968	more bug fixes and performance hacks for search process	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	2fe207f813	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	5e562dcdb7	adopted vocabulary usage within anotation/naviagtion feature of search to new SimpleVocabulary class	13 years ago
Michael Peter Christen	240045cf7c	fix for bad distance computation	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	ehanced location search: - search request are now made using a map boundary - search results are only computed for the map boundary - the number of results is adopted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	834dc6b263	store more data from interface access	13 years ago
Michael Peter Christen	10da7335ea	performance hack: use a hash cache for all hashes that are computed by a byte array. If this hash is used in a HashMap (which is very often the case) then this hack eliminates a lot of re-computations of the same hash.	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time uring search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	c846e9ca14	redesign of the crawler monitor page: show crawled pages instead of queue of urls that shall be crawled	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	cf47d94888	performance hack to parse numbers inside of substrings without actually generating a substring. This avoids the allocation of a String object ech time a substring is parsed. Should affect CPU load during RWI transmission.	13 years ago
Michael Peter Christen	7e0ddbd275	added a "fromCache" flag in Response object to omit one cache.has() check during snippet generation. This should cause less blockings	13 years ago
Michael Peter Christen	7bf421b9dd	- fixed image search page navigation - removed some deadlocks and ConcurrentModificationExceptions during DidYouMean collection	13 years ago
Michael Peter Christen	fb94b47b1a	changed queue sizes to have less memory occupied during indexing	13 years ago
Michael Peter Christen	76157dc2c3	bugfix for http://bugs.yacy.net/view.php?id=173	13 years ago
Michael Peter Christen	c6558cba08	more classification bugs	13 years ago
Michael Peter Christen	082831b9d6	search contentdom was checked in wrong way - fixed	13 years ago
reger	ee553d971e	correct typo in scripts_txt comment	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 tried to make a bit less 'noise' to dns server also included: less processes in snippet fetch to reduce load during search on small computers	13 years ago
Michael Peter Christen	acf8d521a2	fix for http://bugs.yacy.net/view.php?id=126	13 years ago
Michael Peter Christen	bb88878b4d	the last commit was incomplete..	13 years ago
Michael Peter Christen	d320a31ae1	bugfix for http://bugs.yacy.net/view.php?id=186	13 years ago
Michael Peter Christen	3e1bc9477f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Roland 'Quix0r' Haeder	d10627d591	More sync in close() methods Conflicts: source/net/yacy/kelondro/logging/GuiHandler.java source/net/yacy/kelondro/workflow/InstantBusyThread.java	13 years ago
Roland 'Quix0r' Haeder	b3ae2aa41f	With or without 'final'? At least please try it in other methods Conflicts: source/de/anomic/tools/tarTools.java	13 years ago
Roland 'Quix0r' Haeder	fbb946f913	Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile	13 years ago
Michael Peter Christen	52d307c735	prevent that the snippet fectch process removes catchall entries	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
Michael Peter Christen	5deebd02ea	added serialization	13 years ago
reger	b2175ea4ef	Add possibility to set custom Solr field names for the YaCy default Solr attributes. - Changing the format of YaCy's solr.key.list while maintainig backward compatibility Federated index config screens adjusted accordingly - modified the Solr update request to use a 3 min Solr autocommit intervall	13 years ago
Michael Peter Christen	e7e381d110	added configuration to switch off redirection following in crawler	13 years ago
Michael Peter Christen	2717c1b749	fixed bug in solr interface	13 years ago
Michael Peter Christen	f150bc218b	fixed bug in solr error document	13 years ago
Michael Peter Christen	cb54c1737b	solrj connector bugfix	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	0d58fea210	made multiple connector default	13 years ago
Michael Peter Christen	adeb33bb36	better abstraction for solr objects	13 years ago
Michael Peter Christen	8864141872	more abstraction in solr connection classes	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	ea2bd43b28	patch for broken configurations	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	5f5ed33ed8	patch for media search (audio, video apps)	13 years ago
Michael Peter Christen	19efbf1b0f	- apply directDocByURL to NOLOAD Queue - choose pushing to NOLOAD as default for site crawl	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	a1a5b015d8	refactoring: moved document Classification to cora package	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Christen	02e4dedff2	fix to url citation collection	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultanously	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, need to be made configurable)	13 years ago
Michael Christen	8fc86fe397	added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and initranets.	13 years ago
Lotus	0b3f39136e	allow custom ppm lower than minimum button on /Crawler_p.html fixes http://bugs.yacy.net/view.php?id=166	13 years ago
Michael Peter Christen	8aba045ba1	if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given.	13 years ago
Michael Peter Christen	36e4d82b27	changed ranking	13 years ago
Michael Peter Christen	096c17e7cd	added test code	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This affects also the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As a effect: - it is no more possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
Michael Peter Christen	e2f8f263e8	changed storage of search words: keep order	13 years ago
Michael Peter Christen	2e5cd6a1b2	fixed parser extension deny list generation and usage	13 years ago
Michael Peter Christen	3cd6dcd352	do not add new solr fields as activated fields	13 years ago
Michael Peter Christen	e3bb73c3d6	serialized some database access methods	13 years ago
Michael Peter Christen	355ecf330f	reduced target file site to 64mb	13 years ago
Michael Peter Christen	2ea585d616	fix for host navigator	13 years ago
Michael Peter Christen	4c5edab1ec	added option to have exception search result windows	13 years ago
Michael Peter Christen	ef78f22ee1	performance hack	13 years ago
Michael Peter Christen	41536eb4a2	performance hack	13 years ago
Michael Peter Christen	f91487fc50	added delete-button for host navigation	13 years ago
Michael Peter Christen	e8d24fd802	author navigator can be switched off	13 years ago
Michael Peter Christen	558ab7bd4e	made the protocol navigator reversible	13 years ago
Michael Peter Christen	96cb75f1d4	made the filetype navigator be able to deselect the search constraint	13 years ago
Lotus	c73af39e54	refactoring of tray icon class, now uses Java 6 methods natively	13 years ago
Michael Peter Christen	4eff0e26f1	npe bugfix	13 years ago
Michael Peter Christen	1a0b6b3913	get more navigation details to search results	13 years ago
Michael Peter Christen	83009d86f7	added the vocabulary navigator. It can be very simply tested by switching on the locale dictionaries.	13 years ago
Michael Peter Christen	254adea51c	small fixes	13 years ago
Michael Peter Christen	c602eaaf46	enhanced search process	13 years ago
Michael Christen	eff966f396	fix for search process (it was aborted too early during remote search)	13 years ago
Marek Otahal	72adbeae90	!Important: move from Hashtable to HashMap Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue, but I found notices that some (ugly big) helper classes had to be created in past to compensate missing Hashtable's functionality. I'd like input if we can remove some of them. look for //FIX: if these commits Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Marek Otahal	f40efb39af	Blacklist loadList() remove duplicates by using Set Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Michael Peter Christen	2ee8cbeb2c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/search/Switchboard.java	13 years ago
Michael Peter Christen	992dbdf4bb	added noload statistic to servlets	13 years ago
Michael Christen	216a287a85	Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r Conflicts: source/de/anomic/crawler/CrawlQueues.java	13 years ago
stbrumm	d18095dc48	Patch fuer Issue 0000102 and fixes to Patch (private peer status is a property of a peer, not a status)	13 years ago
Michael Christen	585a8f3c44	fixed a bug in search sequence (caused emtpy results)	13 years ago
Roland 'Quix0r' Haeder	a3083d13bf	Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user	13 years ago
Michael Christen	52184a1170	fix for search process	13 years ago
Michael Christen	0797b0de99	new handling of remote search processes: looking for seeds will now not block the whole search process any more. A deadlock with a DHT selection process may have been the cause for interface lockings in the past.	13 years ago
Michael Christen	9e5894c784	Removed handling of components objects for URIMetadataRows. This is a preparation to replace this rows with nodes from the node store.	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	e9dc99fe15	added rules to set specific RWIs as private RWIs which are not transmitted to remote peers. This will be used for private index copies and phonetic indexes.	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. the search will always exlude the given list of host unless they are requested directly using the host navigation	13 years ago
Michael Christen	3eccdca63c	protection against too long running snippet fetch processes	13 years ago
Michael Christen	86b3385847	fixed a deadlock during secondary remote search	13 years ago
Michael Christen	c715d19c09	fixes for dependency on svn	13 years ago
Michael Christen	0bc5d76bee	ups	13 years ago
Michael Christen	044f83feed	added some pauses into the search process which shall produce better-ranked search results. without that pauses the result page will only contain links from the peer that answers first which is not a good average picture of all the peers that provided results	13 years ago
Michael Christen	f14faf503b	better ranking because we wait a very little time during the search process more to get better remote sear results into the ranking priority stack	13 years ago
orbiter	f9216e388c	- faster ping to clean up old peers faster - clean up more news git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d9c066227a	fix for npe git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8122 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ebd840ebf6	- enhanced description on search front page - fixed language and heuristic modifier - added hint to crawl start that we can do also ftp and smb crawls - added a protocol extension to remote crawls to transport all search modifiers to remote peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e22f8497c9	- tested the ARC methods - removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method) - increased speed of CrawlResults page (no dns lookup any more) - increased speed of favicon display (removed dns lookup) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	bc5df0eef5	updated ranking tables (fresh computation) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8103 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c9216d5adf	fixed secondary remote search (the process that finds distributed join situations) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8098 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	64fd20b857	new default ranking profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8097 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0cf9ebc3b0	speed enhancements when parsing RWI rows (makes search slightly faster) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8096 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ee8b1d4de1	fixed unresolved pattern and unwanted local/global switch when using votes on search results git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8093 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c584db991f	creating a bookmark from the search results now works again .. with new YMarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	6cd27473f5	- better default values for caching and cache usage - set new caching and verification behavior according to use case automatically git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1019c36dad	bug fixes and speed enhancements for search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8085 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	507c9d478d	much better timing when search globally; less blocking; more results earlier! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8084 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	8e0b2c5832	fixed cluster search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	804e48888b	smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter should also be a little bit faster git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	84c3fc9d97	local/global fixes in search, better abstraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8054 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	06352b8d6b	more logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8047 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	017a01714d	- enhanced logging in robots.txt parser for remote debugging - robots.txt is now more robust against database operations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	3a15e58e28	- increased stability when opening the robots table - increased stability when deleting tables git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8034 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	78ce3b13be	typo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	85d6bf4ac4	fixed urls to media content during indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	0d858d48ec	replaced String with StringBuilder in suggestion process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	3a807e10cf	- added a cache for active crawl profiles to the crawl switchboard - moved the domain cache for domain counter from the crawl switchboard to the crawl profiles. the crawl domain counter is now therefore relative for each crawl start, not for the whole crawler. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e58438c01c	- added a new retry connector for solr (for cases where solr responses are slow) - added a new exist property into the metadataRepository which includes solr entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	4ad9fc2bff	new snippet strategy for search hits in metadata: show beginning of text instead of hit position git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5af9598bd1	enhanced exported row parsing during row import this affects the search and dht receive speed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7994 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	a7df70221e	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	cf4fd525ee	added directDocByURL attribute in crawl profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	035ebfbf3b	- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill) - this may have also (good) performance side effects on other parts of YaCy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b250e6466d	implemented crawl restrictions for IP pattern and country lists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2c3161b4ac	refactoring: RankingProcess -> RWIProcess ResultFetcher -> SnippetProcess git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago

... 4 5 6 7 8 ...

514 Commits (24bcf54100ddf78f9a390130dc2414df6d80ba1c)