yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally, each time a signature is written, it is checked whether the signature already exists in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because this is the first time a long number is used as a value in the Solr store, the value naming in the YaCySchema also had to be adapted and normalized. As a result, many files had to be changed.	12 years ago
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	12 years ago
Michael Peter Christen	46be4af5b9	Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'	12 years ago
Michael Peter Christen	c73a9bc654	Merge remote-tracking branch 'reger/master'	12 years ago
Michael Peter Christen	832eead998	Merge remote-tracking branch 'regerdev/master'	12 years ago
Michael Peter Christen	952e143580	FINALLY YaCy can now search for full strings using double- or singlequoted strings in the search query line!!!	12 years ago
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	12 years ago
cominch	2bb8f045cc	content control: use up-to-date definitions	12 years ago
Michael Peter Christen	5fd3b93661	added deletion of hosts during crawl start if deleteold option was given	12 years ago
Michael Peter Christen	d64445c3cb	because we have the inurl:<term> - searchmodifier, we don't actually need regular expressions as search attributes. They have now been removed from the advanced search page while they are still created internally. The filter is then expressed against solr as a regular expression filter query. If the expression selects a specific protocol, host or filetype, this is then translated into a faceted query.	12 years ago
orbiter	b55ea2197f	- redesign of crawl start servlet - for domain-limited crawls, the domain is deleted now by default before the crawl is started	12 years ago
orbiter	1c66de4bd4	- removed scheduled crawling options in crawl start because it is superfluous there; it can be changed in the scheduler servlet. It's also confusing in the presence of the delete-option, which will be implemented next. - removed unused crawl start servlet - some refactoring to make the time parser reusable	12 years ago
cominch	a67ff1c8ac	SMW Import: replaced JSON import routines with stable ones	12 years ago
reger	328ce0b297	fix: remove fixed individual testing IP (85.25.151.30 = server4you.de) from default/yacy.network.freeworld.unit	12 years ago
Michael Peter Christen	2e7219f9fd	removed highlighting of search results within collections in GSA interface	12 years ago
Michael Peter Christen	074dfd297b	added icons and a selection for hosts with urls pending for crawler or with errors	12 years ago
cominch	d2a94cc55e	refactor package	12 years ago
cominch	05742b4562	remove old SMW importer which was part of the ymarks package	12 years ago
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	12 years ago
Michael Peter Christen	f07e5fb553	release 1.2	12 years ago
Michael Peter Christen	4c4e0eece2	added new submenu 'Target Analysis' with three servlets which are useful to analyse the target servers: robots.txt table, mass target analysis and a regex tester	12 years ago
Michael Peter Christen	61995d508e	do the commit anyway before calling a search interface	12 years ago
Michael Peter Christen	842faf96a2	fixed media search	12 years ago
Michael Peter Christen	86ec199126	using a better file name	12 years ago
Michael Peter Christen	93001586a0	removed warnings, removed too-fast pausing of crawls	12 years ago
Michael Peter Christen	8041742e48	added matching of path to query pattern	12 years ago
Michael Peter Christen	8b1c9cba3d	fixed a problem with non-terminating crawls	12 years ago
Michael Peter Christen	61a1d32356	fix to ftp client	12 years ago
Michael Peter Christen	5105256927	update to search result logging (this was a remaining issue from the solr 4.0.0 migration)	12 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	12 years ago
Michael Peter Christen	71ed8e5e07	bugfixes for crawler	12 years ago
Michael Peter Christen	29fbbb49dc	better colors for host browser and corrected document count	12 years ago
Michael Peter Christen	12c0db20e5	fixed npe for surrogate import	12 years ago
Michael Peter Christen	6244b084cd	fixed wrong order of result count values	12 years ago
Michael Peter Christen	631b08e7e2	update to HostBrowser	12 years ago
Michael Peter Christen	51f420e4f5	removed location search because it is only working in special cases	12 years ago
Michael Peter Christen	52df6ee369	more logging	12 years ago
Michael Peter Christen	158732af37	automatically delete entries from the crawl profile list if crawl is terminated.	12 years ago
Michael Peter Christen	15d1460b40	added information about the reason of pausing of crawls	12 years ago
Michael Peter Christen	2371ef031c	added solr faceted search support to YaCy search results; added solr highlighting / YaCy snippets to YaCy search results - facets are now much more complete - facets are computed and searched much faster - snippet computation is done by solr if solr knows the snippet	12 years ago
Michael Peter Christen	b30a7162fa	added more thread-renaming for search processes	12 years ago
Michael Peter Christen	900445d8e9	set the thread name during solr queries to the solr query to get better debugging options	12 years ago
Michael Peter Christen	d481abd087	added the visualization of error-urls to host browser - only visible for admins - a faceted search generates a huge list for all hosts in the host list - the faceted search algorithms had to be modified for that - within the browsing of the directory path, the error cause is written to the url which is presented as error-url - the errors are also accumulated for directory sums	12 years ago
Michael Peter Christen	a15819fbec	fix for some interface problems	12 years ago
Michael Peter Christen	791e1dcfdf	when a new crawl is started, delete all entries about error-urls for crawl-start domains	12 years ago
Michael Peter Christen	c6a6f4c4e6	added a hack which makes the HostBrowser more performant when the given host has a lot of urls. If the number of urls is > 1000 and the root path is selected, then the list of documents is restricted to those which have no subpath. However, this can cause a problem if no documents exist on the root path but only on paths below that root path.	12 years ago
Michael Peter Christen	619bf7e875	fixed filetype modifier for media types in text search	12 years ago
Michael Peter Christen	97f82994a6	automatically pause the crawler if there is a problem with solr	12 years ago
Michael Peter Christen	64ac2b7b7d	new submenu template	12 years ago
Michael Peter Christen	5e77801aac	update to web interface structure	12 years ago

9299 Commits (d70d99fab5a355ec7b03521315014bdc26d641a9)