Michael Peter Christen
34f8786508
removed dependency of vocabulary navigation from Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.
13 years ago
reger
664499bb10
PerformanceQueues: disable input for hardcoded httpd performance values
13 years ago
Michael Peter Christen
9319b90d8a
- fixes for host navigation
- fixes for filetype navigation
- removed unused code
13 years ago
Michael Peter Christen
cb5cbec14d
distinguishing modified query string and original query string
13 years ago
Michael Peter Christen
fb0fa9a102
- fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details
13 years ago
orbiter
54e193a2b8
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.
13 years ago
orbiter
7f5526e6ef
allow larger no-proxy expressions
13 years ago
reger
e80dfeca23
- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171 )
- blacklist test adding explicit response text "not blocked" if no blacklist match
13 years ago
Michael Peter Christen
4491072256
- clear the search cache when altering the solr boosts
- better positions for submit buttons
13 years ago
Michael Peter Christen
2b7d46bc1f
using a filter query for the site parameter in GSA api
13 years ago
Michael Peter Christen
10527e28ae
fix for wrong display of error urls in HostBrowser
13 years ago
Michael Peter Christen
5f5d66921e
patch for funny symbols in url paths (like tilde)
13 years ago
Michael Peter Christen
8aa08261a7
update to Solr Boost handling
13 years ago
Michael Peter Christen
908ad2f174
Added a new servlet to configure the solr ranking using field boosts
13 years ago
Michael Peter Christen
a598fb6227
renamed Ranking_p.html to RankingRWI_p.html
because another Ranking servlet will be added next
13 years ago
Michael Peter Christen
72f165d58b
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.
13 years ago
reger
bb20691d4f
fix: respect config setting of "show Nav Top-Menu" in HostBrowser.html for public users (as hostbrowser is now available in search results)
13 years ago
Michael Peter Christen
3de784c8dd
replaced more split and replaceAll calls that lacked pattern
pre-compilation with pre-compiled patterns
13 years ago
Michael Peter Christen
8fc3679c66
using more pre-compiled patterns for split methods
13 years ago
Michael Peter Christen
d48e9788d2
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
13 years ago
Michael Peter Christen
eca68fa197
added debug code to crawler monitor
13 years ago
Michael Peter Christen
205f8b222b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter
c54cb85422
added link to
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
to the /RegexTest.html servlet
13 years ago
Michael Peter Christen
b7004043ea
- added a field cache for solr queries which call only for a single
value
- fixed a version conflict exception within a solr add request
13 years ago
Michael Peter Christen
bf42179982
introduced more structure in HostBrowser, table view, better counting,
distinguishing of error cases (fail/excluded)
13 years ago
Michael Peter Christen
4eab3aae60
removed overhead by preventing generation of full search results when
only the url is requested
13 years ago
Michael Peter Christen
a114bb23bb
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
13 years ago
Michael Peter Christen
d6b82840f8
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is
checked whether the signature already exists in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This mainly affected caching of 'exists' searches, to speed up
the check for existing signatures and do this without actually doing a
solr query.
Because this is the first time a long number is used as a value in the Solr
store, the value naming in the YaCySchema also had to be adapted and
normalized. As a result, many files had to be changed.
13 years ago
Michael Peter Christen
f5ca5cea44
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
13 years ago
Michael Peter Christen
46be4af5b9
Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'
13 years ago
Michael Peter Christen
952e143580
FINALLY YaCy can now search for full strings using double- or
single-quoted strings in the search query line!!!
13 years ago
orbiter
5dfd6359cb
redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
13 years ago
Michael Peter Christen
5fd3b93661
added deletion of hosts during crawl start if deleteold option was given
13 years ago
Michael Peter Christen
d64445c3cb
because we have the inurl:<term> search modifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as a regular expression filter
query. If the expression selects a specific protocol,
host or filetype, it is translated into a faceted query.
13 years ago
orbiter
b55ea2197f
- redesign of crawl start servlet
- for domain-limited crawls, the domain is deleted now by default before
the crawl is started
13 years ago
orbiter
1c66de4bd4
- removed scheduled crawling options in crawl start because it is
superfluous there; it can be changed in the scheduler servlet. It's also
confusing in the presence of the delete-option, which will be
implemented next.
- removed unused crawl start servlet
- some refactoring to make the time parser reusable
13 years ago
Michael Peter Christen
2e7219f9fd
removed highlighting of search results within collections in GSA
interface
13 years ago
Michael Peter Christen
074dfd297b
added icons and a selection for hosts with urls pending for crawler or
with errors
13 years ago
cominch
21df1ad9e0
update and generalization of the SMW import and content control routines
13 years ago
Michael Peter Christen
4c4e0eece2
added new submenu 'Target Analysis' with three servlets which are useful
to analyse the target servers: robots.txt table, mass target analysis
and a regex tester
13 years ago
Michael Peter Christen
61995d508e
do the commit anyway before calling a search interface
13 years ago
Michael Peter Christen
86ec199126
using a better file name
13 years ago
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
13 years ago
Michael Peter Christen
570e42c4e3
fix for filetype navigator
13 years ago
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
13 years ago
Michael Peter Christen
29fbbb49dc
better colors for host browser and corrected document count
13 years ago
Michael Peter Christen
6244b084cd
fixed wrong order of result count values
13 years ago
Michael Peter Christen
631b08e7e2
update to HostBrowser
13 years ago
Michael Peter Christen
51f420e4f5
removed location search because it is only working in special cases
13 years ago
Michael Peter Christen
15d1460b40
added information about the reason of pausing of crawls
13 years ago