yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	d3e71ed070	fixes for searches when the initialization of large autotagging libraries has not finished	10 years ago
Michael Peter Christen	65125439fe	added query modifier 'on'. This makes it possible to search for date occurrences within the (web) page documents (not the document last-modified!). This works only if the solr field dates_in_content_sxt is enabled. A search request may then have the form "term on:<date>", like "gift on:24.12.2014", "gift on:2014/12/24", "* on:2014/12/31". For the date format you may use any kind of human-readable date representation(!yes!) - the on:<date> parser tries to identify language and also knows event names, like: bunny on:eastern .. as long as the date term has no spaces inside (use a dot). Further enhancement will be made to accept also strings encapsulated with quotes.	10 years ago
Michael Peter Christen	7bfc5b80cb	added new options to vocabulary editor: - new switch 'isFacet' which enables or disables the usage of the vocabulary for search facets. This shall be used for large vocabularies, since searches in solr are extremely slow if facets for a large set of alternative terms are generated - new option to disable auto-enrichment from synonyms - new option to add synonyms from another column when importing from csv - automatically recognize double-occurrences in synonyms and bundle terms for such synonyms	10 years ago
Michael Peter Christen	ff728b4aa5	ignore url errors during search	10 years ago
Michael Peter Christen	8317914ce3	changed vocabulary navigator object type to TreeMap to get a specific order into the vocabularies. This is now lexicographic, which is less random than a hashed order	10 years ago
Michael Peter Christen	30276a2b48	prevent that a local Solr search and a local RWI search are running concurrently. When a RWI search result is flushed into the result set, it does Solr Queries (which replaced the old-style Metadata Queries) and they are possibly running concurrently to a previously started Solr search. Both methods may block each other with IO. To enhance the speed, they are now serialized. Because the Solr search results may result in better results using the more advanced and configurable Ranking methods, this result is preferred over the RWI search result. However, remote RWI search results are still fed concurrently into the search result as well.	10 years ago
reger	de56266bcb	remove redundant toLower for topwords	10 years ago
reger	ef5dc68313	include domtype to searcheventcache id to differentiate between local / global events for reuse of cached events fix for http://mantis.tokeek.de/view.php?id=493	10 years ago
Michael Peter Christen	c67c5c0709	added new solr schema fields which record the occurrences of vocabulary matchings. These matches can be used for result boosting, i.e. if a document contains words from a specific vocabulary, boost it.	10 years ago
Michael Peter Christen	7e1b0b6712	fix for wildcard patch in search queries	10 years ago
Michael Peter Christen	5c97ecb30f	fix of bad query generation for search facets	10 years ago
Michael Peter Christen	3073c69aee	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	6491270b3a	large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned, even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The ping-ing peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra'-IPs, used by cluster assignment had been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want an intranet-routing.	10 years ago
reger	8b1ce49ee6	remove unused variable timeout	10 years ago
reger	ffa7c7116f	better fix for NPE in image search replace `8931e14514`	11 years ago
Michael Peter Christen	f1032fb8fe	more enhancements to image search in case that a restriction to a single domain is done	11 years ago
Michael Peter Christen	475125f9d7	hack to get more results when doing a remote site search	11 years ago
Michael Peter Christen	81f9b34da7	increased ability to search for all images on a single server within the p2p remote search	11 years ago
reger	b5e0f70197	- remove repositoryPath post from ConfigBasic (obsolete) - remove static snippetComputationTime from ResultEntry (not used)	11 years ago
reger	8931e14514	fix NPE in image search	11 years ago
Michael Peter Christen	1735dbc9d9	enhanced image search: bugfixes and performance enhancements	11 years ago
Michael Peter Christen	ebd0be2cea	fixes and speed updates for search process	11 years ago
Michael Peter Christen	7611bf79bd	Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Conflicts: locales/ru.lng	11 years ago
reger	a6891ff7f8	fix Querygoal.parse exception on +/-null-term covers http://mantis.tokeek.de/view.php?id=452	11 years ago
reger	e88537522d	allow single quote " ' " in query see http://mantis.tokeek.de/view.php?id=379 -add QueryGoal test case for this	11 years ago
reger	7584352e7b	use more predefined Solr query parameter constants - use CommonParams and DisMaxParams constants - fix typo in get sort parameter - getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface	11 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	11 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribute. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	11 years ago
Michael Peter Christen	6e1dc444c3	added a snippet test function in ViewFile: you can now search for a specific word on the document; the servlet returns the snippet in the same way as it would be shown in a search result.	11 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method than the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just use toNormalform(false) instead.	11 years ago
reger	336425912a	remove unused localSearchThread from SearchEvent	11 years ago
orbiter	59160984cc	timeline performance update	11 years ago
Michael Peter Christen	1cd4b2e8be	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
reger	431a5f9c4e	added test case for TextSnippet, removed obsolete/unused parameter and reference to MediaSnippet	11 years ago
Michael Peter Christen	f5b817bac4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	a5707cd2eb	enable proper Author navigator - author facet is based on omitted author_sxt field - adjust to make author nav available on exist of author field but keep using author_sxt to construct the facet (why!?) - add check for querymodifier author in searchevent	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
orbiter	fec673c9d1	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	c59da9fe7a	added access tracker log reader stub	11 years ago
Michael Peter Christen	b893c42a0f	bugfix for image search	11 years ago
orbiter	0bbb5040b8	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	9d5d86cd03	Added filter query options to the ranking servlet /RankingSolr_p.html. Filter queries are not actually related to ranking, but user requests have pointed out that specific boost queries to move results to the end of the result list are not sufficient. Such boost filters may be better executed as actual filter and therefore such a filter can now be statically applied to every search request. A typical use could be the expression "http_unique_b:true AND www_unique_b:true" which uses the recently introduced fields http_unique_b and www_unique_b which are true only for one of the alternatives with/without http(s) and with/without prefix 'www.' in host names.	11 years ago
Michael Peter Christen	d2151857f1	Added collection navigation: The collection field (can be filled e.g. in Crawl Start) can be used to add categories to YaCy index entries. The usage of that field was restricted to solr searches and post argument filters as implemented in commit `f7571386a3`. This commit extends collections to a full navigation option in the standard YaCy search interface. The field is not active by default but can be activated easily in the /ConfigSearchPage_p.html servlet (just check the 'Collection' facet field). Collections can now be used for (at least) two purposes: - to provide search tenants (through post argument collection) - to provide self-made category navigation Search requests may now have (independently from switched on or off collection facet) a "collection:<collection-name>" modifier attached; furthermore collection names may use disjunctions using the '|' pipe symbol. For example, this is a valid search request: www collection:user|proxy	11 years ago
Michael Peter Christen	f0db501630	better handling of ranking parameters and new default values for date navigation which is done using ranking in solr.	11 years ago
Michael Peter Christen	4e734815e8	enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part.	11 years ago
reger	727dfb5875	refactor URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex	11 years ago
Michael Peter Christen	da86f150ab	- added a new Crawler Balancer: HostBalancer and HostQueues: This organizes all urls to be loaded in separate queues for each host. Each host separates the crawl depth into its own queue. The primary rule for urls taken from any queue is, that the crawl depth is minimal. This produces a crawl depth which is identical to the clickdepth. Furthermore, the crawl is able to create a much better balancing over all hosts which is fair to all hosts that are in the queue. This process will create a very large number of files for wide crawls in the QUEUES folder: for each host a directory, for each crawl depth a file inside the directory. A crawl with maxdepth = 4 will be able to create 10.000s of files. To be able to use that many file readers, it was necessary to implement a new index data structure which opens the file only if an access is wanted (OnDemandOpenFileIndex). The usage of such on-demand file reader shall prevent that the number of file pointers is over the system limit, which is usually about 10.000 open files. Some parts of YaCy had to be adapted to handle the crawl depth number correctly. The logging and the IndexCreateQueues servlet had to be adapted to show the crawl queues differently, because the host name is attached to the port on the host to differentiate between http, https, and ftp services.	11 years ago
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	11 years ago
Michael Peter Christen	cbdfef7ce1	changed protocol facet to show also all other counts if one facet is selected	11 years ago


351 Commits (ac61a398285ee94cc8c5b4502561fb5c9ac7f9b4)