Michael Peter Christen
5fd3b93661
added deletion of hosts during crawl start if deleteold option was given
12 years ago
Michael Peter Christen
d64445c3cb
because we have the inurl:<term> search modifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page, although they are still created internally.
The filter is then expressed against solr as a regular expression filter
query. If the expression selects a specific protocol,
host or filetype, this is then translated into a faceted query.
12 years ago
orbiter
b55ea2197f
- redesign of crawl start servlet
- for domain-limited crawls, the domain is deleted now by default before
the crawl is started
12 years ago
orbiter
1c66de4bd4
- removed scheduled crawling options in crawl start because it is
superfluous there; it can be changed in the scheduler servlet. It's also
confusing in the presence of the delete-option, which will be
implemented next.
- removed unused crawl start servlet
- some refactoring to make the time parser reusable
12 years ago
Michael Peter Christen
2e7219f9fd
removed highlighting of search results within collections in GSA
interface
12 years ago
Michael Peter Christen
074dfd297b
added icons and a selection for hosts with urls pending for crawler or
with errors
12 years ago
Michael Peter Christen
f07e5fb553
release 1.2
12 years ago
Michael Peter Christen
4c4e0eece2
added new submenu 'Target Analysis' with three servlets which are useful
to analyse the target servers: robots.txt table, mass target analysis
and a regex tester
12 years ago
Michael Peter Christen
61995d508e
do the commit anyway before calling a search interface
12 years ago
Michael Peter Christen
842faf96a2
fixed media search
12 years ago
Michael Peter Christen
86ec199126
using a better file name
12 years ago
Michael Peter Christen
93001586a0
removed warnings, removed too-fast pausing of crawls
12 years ago
Michael Peter Christen
8041742e48
added matching of path to query pattern
12 years ago
Michael Peter Christen
8b1c9cba3d
fixed a problem with non-terminating crawls
12 years ago
Michael Peter Christen
61a1d32356
fix to ftp client
12 years ago
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
12 years ago
Michael Peter Christen
570e42c4e3
fix for filetype navigator
12 years ago
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
12 years ago
Michael Peter Christen
29fbbb49dc
better colors for host browser and corrected document count
12 years ago
Michael Peter Christen
12c0db20e5
fixed npe for surrogate import
12 years ago
Michael Peter Christen
6244b084cd
fixed wrong order of result count values
12 years ago
Michael Peter Christen
631b08e7e2
update to HostBrowser
12 years ago
Michael Peter Christen
51f420e4f5
removed location search because it only works in special cases
12 years ago
Michael Peter Christen
52df6ee369
more logging
12 years ago
Michael Peter Christen
158732af37
automatically delete entries from the crawl profile list if crawl is
terminated.
12 years ago
Michael Peter Christen
15d1460b40
added information about the reason why crawls are paused
12 years ago
Michael Peter Christen
2371ef031c
added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
12 years ago
Michael Peter Christen
b30a7162fa
added more thread renaming for search processes
12 years ago
Michael Peter Christen
900445d8e9
set the thread name during solr queries to the solr query to get better
debugging options
12 years ago
Michael Peter Christen
d481abd087
added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
12 years ago
Michael Peter Christen
a15819fbec
fix for some interface problems
12 years ago
Michael Peter Christen
791e1dcfdf
when a new crawl is started, delete all entries about error-urls for
crawl-start domains
12 years ago
Michael Peter Christen
c6a6f4c4e6
added a hack which makes the HostBrowser faster when the given
host has a lot of urls. If the number of urls is > 1000 and the root
path is selected, the list of documents is restricted to those which
have no subpath. However, this can cause a problem if no documents exist
on the root path but only on paths below it.
12 years ago
Michael Peter Christen
619bf7e875
fixed filetype modifier for media types in text search
12 years ago
Michael Peter Christen
97f82994a6
automatically pause the crawler if there is a problem with solr
12 years ago
Michael Peter Christen
64ac2b7b7d
new submenu template
12 years ago
Michael Peter Christen
5e77801aac
update to web interface structure
12 years ago
Michael Peter Christen
8fb370d9f8
renovated the way search results are counted. should be correct now...
12 years ago
Michael Peter Christen
7bec253bb0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
d88eb657fd
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
orbiter
354ef8000d
- added 'deleteold' option to crawler which causes documents
selected by a crawl filter (host or subpath) to be deleted
- site crawl uses this option by default now
- added a concurrency option to deleteDomain()
12 years ago
Michael Peter Christen
19d1f474ce
host browser now also shows number of pending files per subdirectory +
bugfixes
12 years ago
Michael Peter Christen
75dd706e1b
update to HostBrowser:
- time-out after 3 seconds to speed up display (may be incomplete)
- showing also all links from the balancer queue in the host list (after
the '/') and in the result browser view with tag 'loading'
12 years ago
Michael Peter Christen
e2c4c3c7d3
migration to solr 4.0.0
12 years ago
Michael Peter Christen
b764de424a
code cleanup
12 years ago
Michael Peter Christen
69aa39d664
update to libraries required by solr 4.0.0
12 years ago
Michael Peter Christen
9330ad4838
- fixed the delete option in host browser
- added a delete method which can be used to delete a full subpath in
solr.
12 years ago
Michael Peter Christen
a63179f3f9
added the MIME attribute for the R tag in GSA search result writer
12 years ago
Michael Peter Christen
40df2fd193
added the host browser as a link to search results. that means you can
select a browsing position from the search results after a search is done.
12 years ago
Michael Peter Christen
1168d09de8
more refactoring - integrated the code of SnippetProcess into
SearchEvent
12 years ago