yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	1735dbc9d9	enhanced image search: bugfixes and performance enhancements	10 years ago
Michael Peter Christen	ebd0be2cea	fixes and speed updates for search process	10 years ago
Michael Peter Christen	7611bf79bd	Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Conflicts: locales/ru.lng	10 years ago
Michael Peter Christen	524bedc00a	fixed text in startup tray icon and added shutdown icon during shutdown	10 years ago
Michael Peter Christen	e87dc08c0d	set the correct fail time in error docs	10 years ago
Michael Peter Christen	a7dd89c4de	changed method to write the citation index: do not catch up references during document parsing; instead use the same references that would also be written into the webgraph. That should cause that the webgraph and the citation index express the exact same semantic.	10 years ago
orbiter	f318d7c285	enhanced date-ordered ranking	10 years ago
reger	a6891ff7f8	fix Querygoal.parse exception on +/-null-term covers http://mantis.tokeek.de/view.php?id=452	10 years ago
orbiter	a65df4ce7e	do not push noindex errors into log if in intranet mode. noindex attributes are attached to artificial constructed index.html files which list directories. Such files are naturally rejected by the crawler and should not appear in the error log because these files are part of the construction of file crawlers and confuse users if they see them in the error log.	10 years ago
Marc Nause	2af56fa37d	Improved UPnP. (still not perfect) ) set HTTPS port if enabled ) improved data structures (may not be final) *) moved UPnP to own package	10 years ago
orbiter	d68438c3d9	make sure that the postprocessing background thread never dies by any exception	10 years ago
reger	e88537522d	allow single quote " ' " in query see http://mantis.tokeek.de/view.php?id=379 -add QueryGoal test case for this	10 years ago
orbiter	487021fb0a	snippet computation update	10 years ago
orbiter	927aaa95a6	concurrency bugfix	10 years ago
reger	7584352e7b	use more predefined Solr query parameter constants - use CommonParams and DisMaxParams constants - fix typo in get sort parameter - getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface	10 years ago
reger	f9db5dd6c5	reduce doublecontent check document (prevent out of memory) see http://mantis.tokeek.de/view.php?id=437 test result (concurrency=7) 2000 docs = eom always 1000 docs = eom always 100 docs = eom never chosen -> 200 docs (eom not encountered during test with 1GB mem setting)	10 years ago
reger	a8508417d1	catch NPE during crawl (OAI import) - condenseDocument mime=null (allowed) - collectionconfiguration responseheader = null (allowed)	10 years ago
Michael Peter Christen	6344718f8b	reducing the concurrent query stack size and reduced concurrency of postprocessing to avoid OOM situations	10 years ago
Michael Peter Christen	c465b791af	typo	10 years ago
Michael Peter Christen	191ec8c82a	added concurrency to postprocess rewrite process	10 years ago
Michael Peter Christen	a1e8bdd5e9	log ppm instead of docs/second	10 years ago
Michael Peter Christen	cc0ded7abd	set process type of web graph according to fields as defined in the schema	10 years ago
Michael Peter Christen	12fb9d7cd1	log postprocessing constraints in case that postprocessing is not performed	10 years ago
Michael Peter Christen	338f574bdc	no sorting if http/www unique fields are not demanded (makes query faster) and some code restrucuring	10 years ago
Michael Peter Christen	0ceeceb35e	more logic on Solr queries; usage of the query terms in posprocessing, saving one query for double document detection now per document	10 years ago
orbiter	4099296b45	added new classes which shall reduce call overhead to Solr (stub)	10 years ago
orbiter	3491ab4c38	removed unused images from webgraph edge computation	10 years ago
orbiter	2371d6b8db	target linktexts must be string to enable search facets on these fields	10 years ago
Michael Peter Christen	001e05bb80	do not store failure of loading of robots.txt into the index as a fail document	10 years ago
Michael Peter Christen	05d58e4df0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	10 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	10 years ago
orbiter	738989aab7	reverted commit `f94c91315b` because the webgraph has not enough performance for that	10 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	10 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribut. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	10 years ago
Michael Peter Christen	f94c91315b	if the webgraph is used, then use it also for reference computation to avoid contradictions with references_i in the collection index.	10 years ago
Michael Peter Christen	6e1dc444c3	added a snippet test function in ViewFile: you can now search for a specific word on the document; the servlet returns the snippet in the same way as it would be shown in a search result.	10 years ago
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	10 years ago
Michael Peter Christen	504327b15c	fix for condition for writing the webgraph	10 years ago
Michael Peter Christen	542c20a597	changed handling of crawl profile field crawlingIfOlder: this should be filled with the date, when the url is recognized as to be outdated. That field was partly misinterpreted and the time interval was filled in. In case that all the urls which are in the index shall be treated as outdated, the field is filled now with Long.MAX_VALUE because then all crawl dates are before that date and therefore outdated.	10 years ago
Michael Peter Christen	4eec1a7452	refactoring (change Metadata name of load time data structure to avoid confusion with Node data which is also called metadata)	10 years ago
reger	f96cfdc84d	prevent array out of bound exception on getRankingProfile(x) on faulty &profileNr= query parameter	10 years ago
reger	a2cb366b25	Combine /heuristic search modifier with opensearch configured targets - with search modifier /heuristic a request is send to all configured opensearch target systems (old /heuristic/blekko modifier not longer valid) - this allows to use opensearch heuristic on individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends a osd request on all global searches - the index.html searchoption text adjusted to be displayed only if option configured - add Archive-It to predefined systems	10 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method that the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just just toNormalform(false) instead.	10 years ago
Michael Peter Christen	bf1b6b93e7	do not write CR values to webgraph if no CR values are computed	10 years ago
Michael Peter Christen	d07cdd8c3b	added SolrCloud access mode and configuration	10 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	10 years ago
Michael Peter Christen	b5fc2b63ea	removed exist() retrieval functions from error cache and replaced it with metadata retrieval from connectors directly. This should cause better usage of the cache. Automatically increase the metadata cache if more memory is available.	11 years ago
Michael Peter Christen	62c72360ee	cleanup of checkAcceptanceInitially in CrawlStacker, should avoid double-calling of solr	11 years ago
Michael Peter Christen	b5d78ba156	reduced number of solr queries during crawling	11 years ago
Michael Peter Christen	fd87fa1613	removed more unnecessary exist-checks in ErrorCache	11 years ago
Michael Peter Christen	f2b476e08b	don't do a double check to solr for failed documents if they are not written to solr	11 years ago
orbiter	dab9a0786a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	51bf5c85b0	Renamed the transmission cloud to buffer in dispatcher since the name 'cloud' was a bad idea. Changed also the accumulation process for peer targets so that every dht chunk is not assigned the set of redundant targets but they are assigned to redundant targets individually. This enhances the granularity of the target accumulation and should enhance the efficiency of the process. Finally the dht protocol client was enriched with the ability to remove the 'accept remote index' flag from peers or remove peers completely if they do not answer at all.	11 years ago
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	11 years ago
Michael Peter Christen	b0d941626f	fixed bugs in canonical, robots and title/description unique calculation	11 years ago
reger	d9472d043a	cleanup older unused classes	11 years ago
reger	665e12f88e	move startup time from old serverCore to switchboard (most used here) to make servercore eventually obsolete.	11 years ago
reger	336425912a	remove unused localSearchThread from SearchEvent	11 years ago
Michael Peter Christen	1092e798a5	fixed double content postprocessing	11 years ago
orbiter	59160984cc	timeline performance update	11 years ago
orbiter	2073e69034	fix for long periods in timeline	11 years ago
Michael Peter Christen	09dcdb9b19	update to solr 4.9.0	11 years ago
Michael Peter Christen	1cd4b2e8be	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
reger	431a5f9c4e	added test case for TextSnippet, removed obsolete/unused parameter and reference to MediaSnippet	11 years ago
Michael Peter Christen	5b94a257ce	no timeout for large reference collections	11 years ago
Michael Peter Christen	f5b817bac4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	a5707cd2eb	enable proper Author navigator - author facet is based on omitted author_sxt field - adjust to make author nav available on exist of author field but keep using author_sxt to construct the facet (why!?) - add check for querymodifier author in searchevent	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
orbiter	fec673c9d1	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	c59da9fe7a	added access tracker log reader stub	11 years ago
Michael Peter Christen	36e623d8bf	enhanced metadata enrichment for media file type search: - Web servers may now deliver YaCy-specific http header field with a title and keywords. The new http header fields are: X-YaCy-Media-Title - to be used for media (image, audio, video) titles X-YaCy-Media-Keywords - to be used for media (image, audio, video) keywords - both fields are written to document fields title and keywords and are searched also during image search. - to make the usage of arbitrary http header fields (including this new fields) possible in the /api/push_p.json servlet, a new POST argument is also introduced to push http header fields. The new POST attribute is named "responseHeader-X" (where X is the counter). It is allowed to use this attribute as multi-attribute several times, each can be filled with a http header line. - see /api/push_p.html for examples	11 years ago
Michael Peter Christen	b893c42a0f	bugfix for image search	11 years ago
orbiter	0bbb5040b8	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	9d5d86cd03	Added filter query options to the ranking servlet /RankingSolr_p.html. Filter queries are not actually related to ranking, but user requests have pointed out that specific boost queries to move results to the end of the result list are not sufficient. Such boost filters may be better executed as actual filter and therefore such a filter can now be statically applied to every search request. A typical use could be the expression "http_unique_b:true AND www_unique_b:true" which uses the recently introduced fields http_unique_b and www_unique_b which are true only for one of the alternatives with/without http(s) and with/without prefix 'www.' in host names.	11 years ago
Michael Peter Christen	d2151857f1	Added collection navigation: The collection field (can be filled i.e. in Crawl Start) can be used to add categories to YaCy index entries. The usage of that field was restricted to solr searches and post argument filters as implemented in commit `f7571386a3`. This commit extends collections to a full navigation option in the standard YaCy search interface. The field is not active by default but can be activated easily in the /ConfigSearchPage_p.html servlet (just check the 'Collection' facet field). Collections can now be used for (at least) two purposes: - to provide search tenants (through post argument collection) - to provide self-made category navigation Search requests may now have (independently from switched on or off collection facet) a "collection:<collection-name>" modifier attached; firthermore collection names may use disjunctions using the '\|' pipe symbol. For example, this is a valid search request: www collection:user\|proxy	11 years ago
Michael Peter Christen	74c249288a	added a push api to make it possible to upload files directly without crawling to the YaCy indexer. Files are uploaded using POST multipart requests; multiple file uploads are possible as well. Each file has attached the file date and mime type which is used to get the right parser for the submitted data. Also an url is submitted which is assigned to the document. The CrawlSwitchboard has a new option for default Crawl Profiles which are assigned dynamically from the new push interface.	11 years ago
Michael Peter Christen	ba6ffddefc	refactoring	11 years ago
Michael Peter Christen	0c324d735c	NPE fix for postprocessing without term index	11 years ago
Michael Peter Christen	922979aae1	added option to prefer http over https in unique-protocol ranking	11 years ago
Michael Peter Christen	b3b174e2b8	fixed webgraph postprocessing and status display in Crawler_p servlet	11 years ago
Michael Peter Christen	f23c4142e0	added option to configure a custom user agent within allip networks	11 years ago
Michael Peter Christen	8ad41a882c	fixed several problems with postprocessing: - unique-postprocessing was destroying results from other postprocessings; removed cross-updates as they had been not necessary - unique-postprocessing did not restrict on same protocol - inefficient concurrent update cache was redesigned completely - increased limits for concurrent blocking queues to prevent early time-out	11 years ago
Michael Peter Christen	ff5b3ac84d	added new fields http_unique_b and www_unique_b which can be used for ranking to prefer urls containing a www subdomain or using the https protocol	11 years ago
Michael Peter Christen	f0db501630	better handling of ranking parameters and new default values for date navigation which is done using ranking in solr.	11 years ago
Michael Peter Christen	53948da7d0	tried to make last_modified recognition smarter	11 years ago
Michael Peter Christen	6634b5b737	debug code for index distribution testing	11 years ago
orbiter	97983ba89f	fixed generics warnings for generic array instantiation that appeared after migration to Java 7	11 years ago
sixcooler	830057d788	lower Segment-size (hope to get Segments of 10GB) see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5216&p=30036#p30034	11 years ago
orbiter	c028ae9b09	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
reger	e31493e139	"Use remote proxy for yacy" has no function, remove option and related config item see/fix bug http://mantis.tokeek.de/view.php?id=23 http://mantis.tokeek.de/view.php?id=189	11 years ago
orbiter	0d8072aa99	removed warnings	11 years ago
Michael Peter Christen	a1ac4c3b76	automatically clear graphics cache	11 years ago
reger	1432a817dd	respect "index media" switched off in CrawlStartExpert.html fix http://mantis.tokeek.de/view.php?id=64	11 years ago
Michael Peter Christen	4e734815e8	enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part.	11 years ago
Michael Peter Christen	e84e07399a	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	8a7c68e4c7	content of surrogates/out never accessed (remove) After import the conent is never accessed but may take up a lot of disk space, also the getLoadedOAIServer (which lists the files in surrogate out) is not used. Making the surrogate.out obsolete. Removed keeping of xmls after import.	11 years ago
Michael Peter Christen	229f2248b8	added configuration option for maxmimum load and minimum ram for postprocessing	11 years ago
orbiter	8e5ce7cd51	fixed a situation where finished crawls had not been detected.	11 years ago

1 2 3 4 5 ...

1020 Commits (a67a46541586a7bf829f9445759d6748b8cfc54f)