yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	ccc3760a47	Refactoring and redesign of the data architecture to make URIMetadataRow superfluous. The target is to make a Solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this target, the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	12 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which were then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	12 years ago
Michael Peter Christen	5d16c23a1f	specified more URIMetadata as URIMetadataNode	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	cc98496ff3	enhanced the HostBrowser: - showing also outbound links to other domains if there are any - the outbound links browser shows also the link structure image - showing even inbound links if the web structure graph has information about that - removed the left menu and made the HostBrowser a part of the top menu for search - moved the file search also to the top menu - added hover information in the HostBrowser to explain what the click means - because the HostBrowser also links to the Metadata viewer ViewFile, there should be a button to switch back to the HostBrowser: added that also.	12 years ago
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	12 years ago
Michael Peter Christen	1b02408936	use less cache	12 years ago
Michael Peter Christen	36c13ed15b	less solr prefetch	12 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	7e3e45fd04	added Open Graph Metadata default fields, see http://ogp.me/ns#	12 years ago
Michael Peter Christen	c3e5f667a7	added schema.org breadcrumb counter to parser and solr schema	12 years ago
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	12 years ago
Michael Peter Christen	bd769de604	since the solr index is now used for all pages that are indexed locally, there is no need for the RWI index if the index is not transferred to another peer. Therefore the creation of RWI index data is now suppressed if DHT is disabled. This applies to all intranet and portal mode configurations, but not to public robinson modes. A robinson may switch back to public mode and then transmit its data. That means if someone never wants to switch to DHT mode, it would be more appropriate to choose the portal mode.	12 years ago
Michael Peter Christen	4b5e0c1500	added an url rewriter which can be used to remove session ids from urls	12 years ago
Michael Peter Christen	76d218fbef	fixes to crawl profiles	12 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	12 years ago
sof	5cb244b79b	Merge remote branch 'origin/master'	12 years ago
apfelmaennchen	88b062210c	Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based on the jaudiotagger library. The parser is disabled by default as it needs to store temporary files for non file:// protocols, which might be disliked. For your local MP3-collection it loads nicely Artist, Title, Album etc. from the audio files meta data.	12 years ago
orbiter	4fed4a86d8	another fix to location search	12 years ago
orbiter	0f7a54452d	fix for location search query encoding	12 years ago
Michael Peter Christen	f8a3ab2d82	added the usage of synonyms to the GSA search interface	12 years ago
Michael Peter Christen	3d33a5bdf6	turned the synonyms_t Text field into a multi-valued String field synonyms_sxt	12 years ago
Michael Peter Christen	3b959ee002	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	3190347814	added a synonyms_t field to solr and a process to read synonym files. This can be used to add another stemming to solr using stemming files that are expressed as synonyms for grammatical alternatives. The synonym/stemming files must have the following form: - each line is a comma-separated list of synonyms - the list of synonyms may be enclosed with {} (like the GSA synonyms file) - the file may contain comments which are lines starting with a '#' The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and are activated by default whenever a synonym file is in place. Then, for each word that is found in a document all synonyms are added to a long text field which is stored into synonyms_t. Processes using the synonyms must query with that field as optional matcher.	12 years ago
Michael Peter Christen	411d0e839b	added an underline text field to solr to record all underlined texts	12 years ago
Michael Peter Christen	c4a3d8870f	fixed computation of links in host browser which are not indexed but known by the crawler. Such links are now displayed in grey color.	12 years ago
Michael Peter Christen	f45f7fc12e	added new Host Browser to main menu: this new search interface is something completely new for search, but completely common on desktops: browse a web space like one would browse a file system in a file browser. The file listing is created using the search index and a faceted restriction to specific domains.	12 years ago
Michael Peter Christen	8556a3d521	extended solr connector with a method to retrieve a single facet.	12 years ago
Michael Peter Christen	23f68f2a69	force usage of default faceting mechanisms for search	12 years ago
Michael Peter Christen	24d2ee3c52	- better date ranking - more protection against NPE and time travel effects	12 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	12 years ago
Michael Peter Christen	a4214694df	We assert that no other metadata storage than solr is used now. Therefore a property like solrConnected() must be true all the time. Removal of this method causes removal of all write operations to the old metadata index.	12 years ago
Michael Peter Christen	0cec7e761a	enhanced snippet extractor to find snippets also inside of tokens of an url	12 years ago
Michael Peter Christen	562183932b	- removed ip_s from default profile since that needs a DNS lookup to create a document entry. This makes remote search much slower. - removed synchronization of add method if ip_s is activated to prevent a user configuration from causing bad behavior. The disadvantage of that is that an index dump can cause data loss if an indexing is running during the index dump - caught more exceptions and more NPE - better abstraction in MirrorSolrConnector - slight performance enhancement when only the index count is requested (rows=0 is sufficient to get a total count)	12 years ago
Michael Peter Christen	5ac61591f3	better abstraction for solr query params	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	12 years ago
Michael Peter Christen	872f83ebe0	refactoring	12 years ago
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you.	12 years ago
Michael Peter Christen	15ea053c3a	- added xml output in IndexControlURLs to get the storage page of index dump commands - adjusted the apicall.sh script to get the downloaded text as output to stdout which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent that data is stored to a non-existing solr index while this index is disabled during the storage of the dump	12 years ago
Michael Peter Christen	1b474139dd	used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime	12 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	12 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	12 years ago
orbiter	60b1e23f05	added new crawl options: - indexUrlMustMatch and indexUrlMustNotMatch which can be used to select loaded pages for indexing. Default patterns are in such a way that all loaded pages are also indexed (as before) but when doing an expert crawl start, then the user may select only specific urls to be indexed. - crawlerNoDepthLimitMatch is a new pattern that can be used to remove the crawl depth limitation. This filter is a never-match by default (which causes the depth limit to be used) but the user can select paths which will be loaded completely even if a crawl depth is reached.	12 years ago
Michael Peter Christen	6ec02deec6	added new crawl attributes in crawl profile (not active yet)	12 years ago
Michael Peter Christen	0504b01bdc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	9413f77b65	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	a55e77a115	added twitter search heuristic	12 years ago
Michael Peter Christen	e54ac38095	- some corrections in usage of getFile() and getFileName() - added more attributes in json response writer according to yacy servlet	12 years ago
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	12 years ago
Michael Peter Christen	9db032664e	activate two solr fields which will be used by administration interface (later)	12 years ago
Michael Peter Christen	4634f0e626	fix for images_withalt	12 years ago
Michael Peter Christen	4d29f59a27	removed warnings	12 years ago
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	12 years ago
orbiter	d73fff0e0e	added solr field images_withalt_i	12 years ago
sixcooler	e78fe3f477	also do a clearcache on the solr-connector-caches	12 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	12 years ago
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	12 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	12 years ago
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	12 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	12 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	12 years ago
Michael Peter Christen	2ddc33646a	added new field for solr: url_paths_sxt url_parameter_i url_parameter_key_sxt url_parameter_value_sxt url_chars_i	12 years ago
Michael Peter Christen	75d5e3475d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
cominch	dc468dad01	add content control features for custom filter lists	12 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	12 years ago
orbiter	a3d5959981	Merge commit '65d49df865f60511d22d86fb15c33a082176e7ab'	12 years ago
Michael Peter Christen	4521d63c92	added boosts to solr search queries	12 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	12 years ago
reger	65d49df865	security fix: clear automatic password only if adminAccountForLocalhost=false to prevent remote access to protected pages after restart. if adminAccountForLocalhost=true leave automatic password unchanged so access from local host is granted but remote access is prevented from the 1st second.	12 years ago
orbiter	29171e2f6c	fixed generation of ontologies from index enumerations	12 years ago
orbiter	01a63ef595	redesign of YaCySchema and SolrDoc handling	12 years ago
orbiter	479bfca571	refactoring	12 years ago
Michael Peter Christen	48a82bc705	log queries anonymous from gsa+solr requests	12 years ago
Michael Peter Christen	ab6ec4ec52	added snippet computation to solr/rss and gsa result writer	12 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	12 years ago
Michael Peter Christen	653645c1cf	corrected solr query syntax	12 years ago
orbiter	716ea0cfe2	sorted the solr schema into mandatory and optional fields; reduced number of used field to reduce solr index size	12 years ago
orbiter	9b8c8c0f47	fix from gaston in http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909	12 years ago
Michael Peter Christen	a049761e0c	fixed double-check	12 years ago
Michael Peter Christen	f42a57cd7d	gsa format update	12 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is fed into the same data structure that prepares the rwi search result - the same remote search that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	12 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	12 years ago
Michael Peter Christen	f00733186b	code simplifications	12 years ago
Michael Peter Christen	db0d438709	fix for http://bugs.yacy.net/view.php?id=206	12 years ago
orbiter	404b0aab09	refactoring in remote search and stub for remote node peer selection	12 years ago
orbiter	d7ea45f698	- get nice text_t values from metadata conversions that are stored into solr as fulltext search index. - added slow migration from old metadata to solr index entries: each entry from the old metadata is removed from that data structure and written into solr.	12 years ago
orbiter	99ef57f103	reduced sleep times	12 years ago
orbiter	780f8974e7	added remaining iteration methods for solr in fulltext class	12 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	12 years ago
orbiter	cc47a0876e	reverted `bf55f69176` to have a fall-back option in case the memory problems reported in http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901 for full-solr installations are too strong and we have to work with a 'small memory footprint' peer system.	12 years ago
Michael Peter Christen	0904afe8fb	added concurrent iterator methods to the solr connectors	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	bf55f69176	removed write methods to old metadata file type; all metadata now goes to solr	12 years ago
Michael Peter Christen	40c0856489	refactoring	12 years ago
Michael Peter Christen	06a78eecb7	code simplification	12 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	12 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	12 years ago
Michael Peter Christen	23226676c6	FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this does not cause a catastrophe by the end of the weekend, we will do a YaCy 1.1 main release containing this as default.	12 years ago
Michael Peter Christen	d988ba50cf	added a very rudimentary, incomplete, non-verified GSA response writer for solr. Try this: http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10	12 years ago
Michael Peter Christen	aab0b680c3	- added xslt support for solr result formats. try e.g. http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl - added servlet-side mime-type configuration for streamed servlets. this is used for the result formatters in solr result formats	12 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	12 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	12 years ago
Michael Peter Christen	da851c6071	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	bd4f03bc85	removed unused class	12 years ago
orbiter	39f8eb60c3	tried to prevent calls to bad-hack getSize() method and reduced overhead of that method a bit.	12 years ago
orbiter	e816b88b55	changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr.	12 years ago
orbiter	2571e0d47a	removed unused classes	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	12 years ago
Michael Peter Christen	136fcb1ad9	refactoring	12 years ago
Michael Peter Christen	a12f693ec9	added two response writers for the embedded solr interface: a rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should be slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result from YaCy and also snippets are missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt-codes for the new result writers are: wt=rss for opensearch, wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters have been added to the solr interface which can now also be used for a normal solr/xml search.	12 years ago
Michael Peter Christen	bca4a16603	replaced the multivalue generic string field name suffix _ss by _txt because _ss is not part of the standard solr example schema.	12 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	12 years ago
Michael Peter Christen	3ce04cecf3	bad hack to prevent a bug appearing in solr	12 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	12 years ago
Michael Peter Christen	ef488a15f7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
sixcooler	76b037a20a	check content domain fix: search image/media should not show pages containing image/media search text should show all/text but image/media	12 years ago
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will be superfluous or subject of complete redesign after the migration to solr. Removing these things now will make the transition to solr more simple.	12 years ago
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	12 years ago
Michael Peter Christen	315d83cfa0	cleanup	12 years ago
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	12 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	05a3ffd03a	patches to ensure that solr connectors are active only if they have a solr object assigned and vice versa	13 years ago
orbiter	5a3c829872	embedded solr is only initiated if it is activated with IndexFederated_p.html	13 years ago
Michael Peter Christen	97b7bcf2a6	added a solr search index - by default, an (empty) solr storage instance is created at SEGMENTS/solr_36 - the index is written if in /IndexFederated_p.html the flag "embedded solr search index" is switched on - a standard solr query interface is available now with a new servlet at http://127.0.0.1:8090/solr/select To test this, do the following: - switch to webportal mode - switch on the feature as described - do a crawl. this fills the solr index. The normal YaCy search will NOT work now! - do a solr query, like: http://127.0.0.1:8090/solr/select?q=*:* http://127.0.0.1:8090/solr/select?q=text_t:Help play with different search fields as you can see in /IndexFederated_p.html You can use the standard solr query attributes as described in http://wiki.apache.org/solr/SearchHandler	13 years ago
orbiter	c00a3cf74d	less usage of generic logger to avoid logger generation overhead	13 years ago
orbiter	e76159040b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
Michael Peter Christen	58e7d1952f	reduction of logging to prevent too much IO caused by logging	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	c7afa8bc48	using SwitchboardConstants for solr attributes	13 years ago
orbiter	c6d8950651	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	d09d9f2364	filter old peers from bootstrap (now stronger: 60 minutes instead of 240).	13 years ago
Michael Peter Christen	b0c408788b	made class methods static where possible	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
orbiter	fc0f9543fe	More SentenceReader cleanup	13 years ago
orbiter	d4291ac1f3	more tolerance when creating solr document	13 years ago
orbiter	78fc3cf8f8	refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was not efficient for two reasons: - it used a too complex input stream; generated from sources and UTF8 byte-conversions. The BufferedReader applied a strong overhead. - to feed data into the SentenceReader, multiple toString/getBytes had been applied until a buffered Reader from an input stream was possible. These superfluous conversions had been removed. - the best source for the Sentence Reader is a String. Therefore the production of Strings had been forced inside the Document class.	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	de903a53a0	parser refactoring & hacks	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	7249d9c9de	bugfix for concurrent seed loader	13 years ago
Michael Peter Christen	c72d3b12cd	concurrently initialize the seed list during p2p network bootstrap	13 years ago
Michael Peter Christen	1825f165b8	better integration of blacklist according to use case	13 years ago
Michael Peter Christen	c18fa9fa75	Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads names so it's easier to see what's happening during debugging and within a thread dump	13 years ago
reger	067728bccc	add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)	13 years ago
Michael Peter Christen	03280fb161	removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index.	13 years ago
Michael Peter Christen	508a81b86c	added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field.	13 years ago
Michael Peter Christen	9116013c64	- allow lazy initialization of solr value (if using 'lazy', then no 0-values and no empty strings are written). This may save a lot of memory (in ram and on disc) if excessive 0-values or empty strings appear - do not allow default boolean values for checkboxes because that does not make sense: browsers may omit the checkbox attribute name if the box is not checked. A default value 'true' would not comply with the semantics of the browser's response. - add a checkbox in IndexFederated_p for the lazy initialization of solr fields.	13 years ago
Michael Peter Christen	0294a53459	- add canonical field only if requested by solr schema - remove canonical url from in/outbound urls if present	13 years ago
Michael Peter Christen	3fd4a01286	added option to record urls that are forwarded to the solr index	13 years ago
Michael Peter Christen	96aeb127e3	generalized localhost naming. this is also a preparation for a better IPv6 implementation.	13 years ago
Michael Peter Christen	77f795756c	fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr.	13 years ago
Michael Peter Christen	8dd469b9dd	added option to configure the autocommit delay time of solr on-the-fly	13 years ago
Michael Peter Christen	b9dfca4b0a	- fixed IndexFederated Servlet / an embedded Solr can now be selected - added code stub for an embedded Solr but generation of Solr store is still commented out (it works but is not yet ready for usage)	13 years ago
Michael Peter Christen	fad3b14813	added jetty libraries, needed for future use as web server and as application server for the solr search interface	13 years ago
Michael Peter Christen	a38b0a2c46	extended embedded solr tests to ensure that it will be usable within a jetty instance	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	a5eb91fa60	refactoring	13 years ago
Michael Peter Christen	1be0025a9c	- added test for EmbeddedSolrConnector - added needed libraries for this test; this includes most (all) files needed for an embedded solr	13 years ago
Michael Peter Christen	e12bb254b4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	3f55dc7c1e	- added solr core and libraries that solr needs (lucene is missing, will follow later) - added embedded solr connector which can connect to solr programmatically (without using a server in between)	13 years ago
Michael Peter Christen	786be7d175	better integration of RDFaParser	13 years ago
Michael Peter Christen	0752983fbd	- automatic periodic saving of triplestore - transaction-safe storage of triplestore	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	64c0268b2b	show triplestore metadata in yacydoc and viewfile	13 years ago
cominch	a95127c9af	Triplestore: initialize per-user triplestores	13 years ago
Michael Peter Christen	e89747bb67	- added automated generation of vocabularies from url stubs - added clear of all terms for vocabularies - added deletion of vocabularies	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf triples must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also an Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	5fc6524ca8	- moved triple store to net.yacy.cora.lod (should be generalized there later) - added abstract add, delete, get methods in the triplestore - added generation of triples after auto-annotation - migrated all MultiProtocolURI objects to DigestURI in the parser since the url hash is needed as subject value in the triples in the triple store	13 years ago
Michael Peter Christen	4ee6fb1de9	added missing blacklist dht cache storage (maybe due to mistakes in cherry picking)	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
Roland 'Quix0r' Haeder	af5a597e47	Scroogle is not coming back, remove dead code Conflicts: source/net/yacy/search/Switchboard.java	13 years ago
cominch	65c5826d93	bugfix Conflicts: source/net/yacy/document/parser/augment/AugmentParser.java	13 years ago
Michael Peter Christen	cde20911bb	saved a bit more ram using UTF8 String compression for OpenGeoDB and Geonames data files.	13 years ago
Michael Peter Christen	2280a7b276	- changed initialization order to prefer allocation of memory for table files first - bugfixes in memory amount calculation	13 years ago
Michael Peter Christen	0746308bc2	only the metadata tables shall be able to use the tail cache	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	dd14b19c26	lazy initialization of block rank table ... only normal web search uses this. When interactive search or location search is used, the block rank is switched off	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	ab7107b34b	fixed RWIProcess queue limits: now discovering hidden results for mass result retrieval	13 years ago
Michael Peter Christen	b0095c8d3c	flush the compressor cache when a cleanup is done	13 years ago
Michael Peter Christen	a61f44f9e4	lazy initialization of block rank table. as a result, the table is not initialized when no search is done. the effect is strongest if YaCy is started headless, because then no browser pop-up appears which would otherwise load the search page and therefore trigger the initialization of the table.	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago

488 Commits (38f46eb33d3c2758c405ae83f9414e53e1e65d3c)