yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	2645dc816a	added warning for not well-formed postprocessing queries	11 years ago
Michael Peter Christen	6d3d4c4ea6	changed the concurrent enumeration of query results in such a way that it is now possible to get the results in two steps: - first retrieve all IDs as given for a query - then retieve each document individually This was necessary for very large result sets where a query may run for hours and is possibly terminated by a solr-internal timeout. This occurs regulary during postprocessing and therefore this commit may fix unwanted postprocessing terminations.	11 years ago
Michael Peter Christen	ad35d9294f	added a 'stats' table which records some peer statistics twice every hour. The table can be shown with http://localhost:8090/Tables_p.html?table=stats The entries have the following meaning: aM: activeLastMonth aW: activeLastWeek aD: activeLastDay aH: activeLastHour cC: countConnected (Active Senior) cD: countDisconnected (Passive Senior) cP: countPotential (Junior) cR: count of the RWI entries cI: size of the index (number of documents) The entry keys are abbreviated to reduce the space in the table as the name is written again for every row. This is the beginning of a 'yacystats' micro-alternative als built-in function in YaCy. Graphics may follow after some time if enough test data is available.	11 years ago
reger	8284ea751a	catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile found a situation after crash (reboot) with existing running semaphore but YaCy not running. Ping generated exception which finally deleted the conf file (during pre-read procedure) - change to ping (catch exception solved it) - additionally removed delete yacy.conf file (if needed we need to make a backup)	11 years ago
reger	ffa7c7116f	better fix for NPE in image search replace `8931e14514`	11 years ago
Michael Peter Christen	f1032fb8fe	more enhancements to image search in case that a restriction to a single domain is done	11 years ago
Michael Peter Christen	475125f9d7	hack to get more results when doing a remote site search	11 years ago
Michael Peter Christen	81f9b34da7	increaesed ability ot search for all images on a single server within the p2p remote search	11 years ago
reger	b5e0f70197	- remove repositoryPath post from ConfigBasic (obsolete) - remove static snippetComputationTime from ResultEntry (not used)	11 years ago
reger	8931e14514	fix NPE in image search	11 years ago
Michael Peter Christen	1735dbc9d9	enhanced image search: bugfixes and performance enhancements	11 years ago
Michael Peter Christen	ebd0be2cea	fixes and speed updates for search process	11 years ago
Michael Peter Christen	7611bf79bd	Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Conflicts: locales/ru.lng	11 years ago
Michael Peter Christen	524bedc00a	fixed text in startup tray icon and added shutdown icon during shutdown	11 years ago
Michael Peter Christen	e87dc08c0d	set the correct fail time in error docs	11 years ago
Michael Peter Christen	a7dd89c4de	changed method to write the citation index: do not catch up references during document parsing; instead use the same references that would also be written into the webgraph. That should cause that the webgraph and the citation index express the exact same semantic.	11 years ago
orbiter	f318d7c285	enhanced date-ordered ranking	11 years ago
reger	a6891ff7f8	fix Querygoal.parse exception on +/-null-term covers http://mantis.tokeek.de/view.php?id=452	11 years ago
orbiter	a65df4ce7e	do not push noindex errors into log if in intranet mode. noindex attributes are attached to artificial constructed index.html files which list directories. Such files are naturally rejected by the crawler and should not appear in the error log because these files are part of the construction of file crawlers and confuse users if they see them in the error log.	11 years ago
Marc Nause	2af56fa37d	Improved UPnP. (still not perfect) ) set HTTPS port if enabled ) improved data structures (may not be final) *) moved UPnP to own package	11 years ago
orbiter	d68438c3d9	make sure that the postprocessing background thread never dies by any exception	11 years ago
reger	e88537522d	allow single quote " ' " in query see http://mantis.tokeek.de/view.php?id=379 -add QueryGoal test case for this	11 years ago
orbiter	487021fb0a	snippet computation update	11 years ago
orbiter	927aaa95a6	concurrency bugfix	11 years ago
reger	7584352e7b	use more predefined Solr query parameter constants - use CommonParams and DisMaxParams constants - fix typo in get sort parameter - getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface	11 years ago
reger	f9db5dd6c5	reduce doublecontent check document (prevent out of memory) see http://mantis.tokeek.de/view.php?id=437 test result (concurrency=7) 2000 docs = eom always 1000 docs = eom always 100 docs = eom never chosen -> 200 docs (eom not encountered during test with 1GB mem setting)	11 years ago
reger	a8508417d1	catch NPE during crawl (OAI import) - condenseDocument mime=null (allowed) - collectionconfiguration responseheader = null (allowed)	11 years ago
Michael Peter Christen	6344718f8b	reducing the concurrent query stack size and reduced concurrency of postprocessing to avoid OOM situations	11 years ago
Michael Peter Christen	c465b791af	typo	11 years ago
Michael Peter Christen	191ec8c82a	added concurrency to postprocess rewrite process	11 years ago
Michael Peter Christen	a1e8bdd5e9	log ppm instead of docs/second	11 years ago
Michael Peter Christen	cc0ded7abd	set process type of web graph according to fields as defined in the schema	11 years ago
Michael Peter Christen	12fb9d7cd1	log postprocessing constraints in case that postprocessing is not performed	11 years ago
Michael Peter Christen	338f574bdc	no sorting if http/www unique fields are not demanded (makes query faster) and some code restrucuring	11 years ago
Michael Peter Christen	0ceeceb35e	more logic on Solr queries; usage of the query terms in posprocessing, saving one query for double document detection now per document	11 years ago
orbiter	4099296b45	added new classes which shall reduce call overhead to Solr (stub)	11 years ago
orbiter	3491ab4c38	removed unused images from webgraph edge computation	11 years ago
orbiter	2371d6b8db	target linktexts must be string to enable search facets on these fields	11 years ago
Michael Peter Christen	001e05bb80	do not store failure of loading of robots.txt into the index as a fail document	11 years ago
Michael Peter Christen	05d58e4df0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	11 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	11 years ago
orbiter	738989aab7	reverted commit `f94c91315b` because the webgraph has not enough performance for that	11 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	11 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribut. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	11 years ago
Michael Peter Christen	f94c91315b	if the webgraph is used, then use it also for reference computation to avoid contradictions with references_i in the collection index.	11 years ago
Michael Peter Christen	6e1dc444c3	added a snippet test function in ViewFile: you can now search for a specific word on the document; the servlet returns the snippet in the same way as it would be shown in a search result.	11 years ago
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	11 years ago
Michael Peter Christen	504327b15c	fix for condition for writing the webgraph	11 years ago
Michael Peter Christen	542c20a597	changed handling of crawl profile field crawlingIfOlder: this should be filled with the date, when the url is recognized as to be outdated. That field was partly misinterpreted and the time interval was filled in. In case that all the urls which are in the index shall be treated as outdated, the field is filled now with Long.MAX_VALUE because then all crawl dates are before that date and therefore outdated.	11 years ago

1 2 3 4 5 ...

980 Commits (025516f68243a50da1dd7dacb659402aa5b36e3f)