yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	13 years ago
Michael Peter Christen	e54ac38095	- some corrections in usage of getFile() and getFileName() - added more attributes in json response writer according to yacy servlet	13 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val, description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	13 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	13 years ago
orbiter	67f2866cd0	small fixes	13 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: load_date_dt (time when resource was loaded); fresh_date_dt (date until resource shall be considered as fresh); host_id_s (id of the host, a 6-byte hash that is part of the document id); referrer_id_ss (ids of referrer to this document); md5_s (the md5 of the raw source); publisher_t (the name of the publisher of the document); language_ss (the language used in the document, starting with the primary language); ranking_i (an external ranking value); size_i (the size of the raw source); audiolinkscount_i (number of links to audio resources); videolinkscount_i (number of links to video resources); applinkscount_i (number of links to application resources)	13 years ago
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	13 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	13 years ago
orbiter	482afed07c	reduced logging overhead (a bit)	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	801972fe6f	fix for url camel case parser and sentence reader	13 years ago
Michael Peter Christen	fbc1a2030d	fix for sitemap importer: can now also import very large sitemaps within small memory configurations	13 years ago
Michael Peter Christen	92731e5287	fix for sevenzip parser	13 years ago
Michael Peter Christen	8efc1c1078	- fixed a memory leak (or bad usage) during parsing/snippet fetch - more logging for errors	13 years ago
Michael Peter Christen	b1e7c11fba	fix for pattern matcher in html parser	13 years ago
Michael Peter Christen	b0c408788b	made class methods static where possible	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
orbiter	fc0f9543fe	More SentenceReader cleanup	13 years ago
orbiter	586bb0eb6a	Simplified SentenceReader (no more Reader inside..)	13 years ago
orbiter	7f851d62a7	replaced HashARC with SizeLimited Objects which are less costly	13 years ago
orbiter	78fc3cf8f8	refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was not efficient for two reasons: - it used a too complex input stream; generated from sources and UTF8 byte-conversions. The BufferedReader applied a strong overhead. - to feed data into the SentenceReader, multiple toString/getBytes had been applied until a buffered Reader from an input stream was possible. These superfluous conversions had been removed. - the best source for the Sentence Reader is a String. Therefore the production of Strings had been forced inside the Document class.	13 years ago
orbiter	bb8dcb4911	automatically adopt size of word cache to available memory	13 years ago
Michael Peter Christen	ad09b786bf	clean up parser data	13 years ago
Michael Peter Christen	276a66a793	Adding a limit of 1000 links that a parser shall store during indexing. A limit was necessary because some web pages have such huge numbers of links that they can easily cause an OOM just by the number of links. The question whether the limit of 1000 links is sufficient or too weak must be answered with the result of testing this feature.	13 years ago
Michael Peter Christen	de903a53a0	parser refactoring & hacks	13 years ago
Michael Peter Christen	1825f165b8	better integration of blacklist according to use case	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads names so it's easier to see what's happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	508a81b86c	added solr field 'refresh_s' which stores the refresh url contained in the meta-refresh html header field.	13 years ago
Michael Peter Christen	f3167def64	do not fill the keywords with title content if keywords do not exist.	13 years ago
Michael Peter Christen	77f795756c	fixing redirects and status codes: storing of status code in ResponseHeader to make it available for late evaluations, like storage in solr.	13 years ago
Michael Peter Christen	dbdd697f4d	moved RDFaParser.xsl configuration file to defaults	13 years ago
Michael Peter Christen	786be7d175	better integration of RDFaParser	13 years ago
Michael Peter Christen	de3ef8ad73	removed unimportant warnings	13 years ago
Michael Peter Christen	24bbe359ca	integrate also geonames library files with fewer cities. These are more useful for tagging since fewer normal words are falsely identified as locations	13 years ago
Michael Peter Christen	223a5440ab	preventing that an empty pnd is inserted into the vocabularies	13 years ago
Michael Peter Christen	963f92ed9a	- merged files - changed behaviour of delete button in vocabulary edit - fixed size number in vocabulary listing	13 years ago
Michael Peter Christen	dd88d0ace2	more logging	13 years ago
Michael Peter Christen	94d54e2d91	added recognition of multi-word terms in vocabulary matching. This makes the PND usable: it is now possible to recognize persons and navigate with a 'Persons' facet.	13 years ago
Michael Peter Christen	64c0268b2b	show triplestore metadata in yacydoc and viewfile	13 years ago
Michael Peter Christen	c2f0d16d2c	fixed vocabulary initialization	13 years ago
Michael Peter Christen	df3531f8d5	added the generation of virtual vocabularies using the pnd	13 years ago
Michael Peter Christen	a0f1decd82	- added loading of the dbpedia pnd triplestore in the dictionary loader - renamed the dictionary loader to knowledge loader - some refactoring in the library provider method names	13 years ago

358 Commits (bc865ab816c8b59408a8d701752621585d09ebe6)