Michael Peter Christen
842faf96a2
fixed media search
12 years ago
Michael Peter Christen
93001586a0
removed warnings, removed too-fast pausing of crawls
12 years ago
Michael Peter Christen
8041742e48
added matching of path to query pattern
12 years ago
Michael Peter Christen
8b1c9cba3d
fixed a problem with non-terminating crawls
12 years ago
Michael Peter Christen
61a1d32356
fix to ftp client
12 years ago
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
12 years ago
Michael Peter Christen
570e42c4e3
fix for filetype navigator
12 years ago
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
12 years ago
Michael Peter Christen
12c0db20e5
fixed NPE for surrogate import
12 years ago
Michael Peter Christen
52df6ee369
more logging
12 years ago
Michael Peter Christen
158732af37
automatically delete entries from the crawl profile list if crawl is
terminated.
12 years ago
Michael Peter Christen
15d1460b40
added information about the reason for pausing crawls
12 years ago
Michael Peter Christen
2371ef031c
added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
12 years ago
Michael Peter Christen
b30a7162fa
added more thread-renaming for search processes
12 years ago
Michael Peter Christen
900445d8e9
set the thread name during solr queries to the solr query to get better
debugging options
12 years ago
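The thread-naming idea from the entry above can be sketched as follows. This is a minimal, hypothetical illustration and not YaCy's code. The class name ThreadNaming, the method withThreadName and the example query string are all assumptions. The current thread is renamed to the query for the duration of the call, so a thread dump shows which query is running. The old name is restored in a finally block.

```java
import java.util.function.Supplier;

// Hypothetical helper: ThreadNaming/withThreadName are illustrative names, not YaCy's API.
final class ThreadNaming {
    private ThreadNaming() {}

    /** Runs body with the current thread renamed to label, then restores the previous name. */
    static <T> T withThreadName(String label, Supplier<T> body) {
        Thread current = Thread.currentThread();
        String previous = current.getName();
        current.setName(label);
        try {
            return body.get();
        } finally {
            // Restore even if the query throws, so pooled threads do not keep stale names.
            current.setName(previous);
        }
    }

    public static void main(String[] args) {
        // "solr query: text_t:yacy" is a made-up query string for illustration.
        String seen = withThreadName("solr query: text_t:yacy", () -> {
            // A thread dump taken here would show the query string as the thread name.
            return Thread.currentThread().getName();
        });
        System.out.println(seen);                              // solr query: text_t:yacy
        System.out.println(Thread.currentThread().getName());  // main
    }
}
```

Restoring the name in finally matters mostly for worker pools, where the same thread is reused for unrelated work after the query.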
Michael Peter Christen
d481abd087
added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
12 years ago
Michael Peter Christen
a15819fbec
fix for some interface problems
12 years ago
Michael Peter Christen
791e1dcfdf
when a new crawl is started, delete all entries about error-urls for
crawl-start domains
12 years ago
Michael Peter Christen
619bf7e875
fixed filetype modifier for media types in text search
12 years ago
Michael Peter Christen
97f82994a6
automatically pause the crawler if there is a problem with solr
12 years ago
Michael Peter Christen
8fb370d9f8
renovated the way search results are counted. Should be correct now...
12 years ago
Michael Peter Christen
7bec253bb0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
d88eb657fd
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
orbiter
354ef8000d
- added 'deleteold' option to the crawler, which causes documents
selected by a crawl filter (host or subpath) to be deleted
- site crawl uses this option by default now
- added a concurrency option to deleteDomain()
12 years ago
Michael Peter Christen
75dd706e1b
update to HostBrowser:
- time-out after 3 seconds to speed up display (may be incomplete)
- showing also all links from the balancer queue in the host list (after
the '/') and in the result browser view with tag 'loading'
12 years ago
Michael Peter Christen
e2c4c3c7d3
migration to solr 4.0.0
12 years ago
Michael Peter Christen
b764de424a
code cleanup
12 years ago
Michael Peter Christen
9330ad4838
- fixed the delete option in host browser
- added a delete method which can be used to delete a full subpath in
solr.
12 years ago
Michael Peter Christen
a63179f3f9
added the MIME attribute for the R tag in GSA search result writer
12 years ago
Michael Peter Christen
1168d09de8
more refactoring - integrated the code of SnippetProcess into
SearchEvent
12 years ago
Michael Peter Christen
6629e37685
tried to clean up the search process mess
12 years ago
Michael Peter Christen
c5f67a5d6d
fixed a problem with local search from solr results: now all results
from solr are shown (again)
12 years ago
Michael Peter Christen
f8f05ecba7
- added a delete button in host browser to delete a complete subpath
- removed storage of default collection name - default is now "user"
- made stacking of crawl start points concurrent
12 years ago
Michael Peter Christen
0716a24737
added more / all new crawl profile fields into crawl profile editor
12 years ago
Michael Peter Christen
4a14122ba7
if a crawl profile has a collection assigned, use the collection as the
name shown in the web interface. This should prevent overly long names
from making the interface unusable.
12 years ago
Michael Peter Christen
0fe8be7981
enhanced data structures for balancer and latency computation, which
should produce a slightly better prognosis of forced waiting times.
12 years ago
Michael Peter Christen
ac9540dfb6
removed options for stopwords which are not used
12 years ago
Michael Peter Christen
ce3fed8882
added the Google Search Appliance (GSA) api interface to the main menu.
See:
https://developers.google.com/search-appliance/documentation/68/xml_reference#request_overview
12 years ago
Michael Peter Christen
b2ffd49817
less latency
12 years ago
Michael Peter Christen
0833937c1c
better balancing and due-time computation, also for no-delay intranet
hosts
12 years ago
Michael Peter Christen
c326aa8f67
disabled writing new entries to crawl stacks to prevent a domain with
many documents from blocking the refresh of the crawl queue
12 years ago
Michael Peter Christen
6905182d41
- fix for number of words log message
- adding meta:refresh also to crawler stack
12 years ago
Michael Peter Christen
c25d7bcb80
- added concurrency for robots.txt loading
- changed data model for domain counter
12 years ago
Michael Peter Christen
a94c537afc
fixed getSize() which can use the cache size while the crawl is running
12 years ago
Michael Peter Christen
96912c9471
enhancement to solr caching: handle the case where, during a get(), the
document is not yet in solr but the cache indicates that a commit is
needed to retrieve the document.
12 years ago
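The caching case described in the entry above can be sketched as follows. This is a hypothetical in-memory stand-in and not YaCy's Solr connector. The class CommitAwareCache and its methods are assumed names. Writes are held as "uncommitted" and become visible to get() only after commit(). If get() misses but the document is known to be uncommitted, the sketch commits first and then looks the document up again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for a Solr core where written documents
// become visible to get() only after commit(). Not YaCy's actual classes or API.
final class CommitAwareCache {
    private final Map<String, String> committed = new HashMap<>();   // what a search/get can see
    private final Map<String, String> uncommitted = new HashMap<>(); // written, not yet visible
    private int commits = 0;

    synchronized void add(String id, String doc) {
        uncommitted.put(id, doc);
    }

    synchronized void commit() {
        committed.putAll(uncommitted);
        uncommitted.clear();
        commits++;
    }

    /** Returns the document; if it is only pending, force a commit so it becomes visible. */
    synchronized String get(String id) {
        String doc = committed.get(id);
        if (doc == null && uncommitted.containsKey(id)) {
            commit(); // the cache knows a commit is needed to retrieve this document
            doc = committed.get(id);
        }
        return doc;
    }

    synchronized int commitCount() {
        return commits;
    }

    public static void main(String[] args) {
        CommitAwareCache cache = new CommitAwareCache();
        cache.add("doc1", "hello");
        System.out.println(cache.get("doc1"));   // hello (forces one commit)
        System.out.println(cache.commitCount()); // 1
    }
}
```

In this sketch, a plain miss on an unknown id does not trigger a commit. Only ids that the cache knows are pending force one, so commits stay rare.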
Michael Peter Christen
a87811bc38
more auto-commit calls when a search interface is opened, but not when a
search is performed there, to prevent blocking at search time.
12 years ago
Michael Peter Christen
3d3d654e88
if a network configuration is chosen which allows neither DHT nor P2P
communication (i.e. it is in robinson mode), then some menu entries
which have no use in this mode are disabled.
12 years ago
Michael Peter Christen
2d9e577ad0
replaced the custom robots.txt loader by the standard http loader
12 years ago
Michael Peter Christen
799d71bc67
enhanced solr caching:
- increased cache size which is needed for longer solr commit time
- speed hacks on cache write code
12 years ago
Michael Peter Christen
a33e2742cb
- removed unnecessary synchronization and a deadlock in the crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
12 years ago