Michael Peter Christen
7c3de8b4cd
- fix for localhost detection
- added IPv6 patterns for localhost detection
12 years ago
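The localhost detection described in the commit above can be illustrated with a small sketch. This is not YaCy's actual code: the class name `LocalhostCheck`, the method `isLocalhost`, and the exact set of patterns are assumptions chosen for illustration. The IPv4 part only checks the `127.x.x.x` shape and does not validate octet ranges.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names and patterns are assumptions.
final class LocalhostCheck {
    // Matches the name "localhost", the IPv4 loopback shape 127.x.x.x,
    // and the short and long spellings of the IPv6 loopback address,
    // with or without the square brackets used in URLs.
    private static final Pattern LOCALHOST = Pattern.compile(
        "localhost"
        + "|127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"
        + "|::1|0:0:0:0:0:0:0:1"
        + "|\\[::1\\]|\\[0:0:0:0:0:0:0:1\\]",
        Pattern.CASE_INSENSITIVE);

    // Returns true if the whole host string matches one of the loopback patterns.
    static boolean isLocalhost(String host) {
        return host != null && LOCALHOST.matcher(host.trim()).matches();
    }
}
```

A call such as `LocalhostCheck.isLocalhost("[::1]")` would be the kind of check a proxy bypass rule could rely on.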
Michael Peter Christen
34f8786508
removed dependency of vocabulary navigation from Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.
12 years ago
orbiter
712cc37c40
if maxFileSize < 0 then the file size is unlimited.
12 years ago
orbiter
1f33c30d7b
re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy
12 years ago
reger
e2d499be9e
remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list).
12 years ago
Michael Peter Christen
118233a7e6
fix for bad xml in gsa result when doing a query with quotes
12 years ago
Michael Peter Christen
adfecc6ba8
more robustness during shutdown
12 years ago
Michael Peter Christen
d4bfe9339e
Brute-force attempt to start solr in case of a memory problem.
I don't actually know if this is correct. It is a desperate try to get
YaCy running on production servers which must get alive even with
strange hacks like this. This is also related to a forum posting in
http://forum.yacy-websuche.de/viewtopic.php?t=4528&p=27135#p27135
12 years ago
Michael Peter Christen
8aa08261a7
update to Solr Boost handling
12 years ago
Michael Peter Christen
908ad2f174
Added a new servlet to configure the solr ranking using field boosts
12 years ago
Michael Peter Christen
a01e47b992
enhanced exists()-method for solr; should reduce a lot of IO during DHT
target selection
12 years ago
Michael Peter Christen
72f165d58b
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.
12 years ago
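A minimal sketch of what such a boost holder might look like follows. It is not the Boost class from this commit: the class name `FieldBoosts`, the comma-separated `field^boost` configuration format, and the method names are assumptions made for illustration. Only the output format, space-separated `field^boost` pairs, follows the usual Solr query-fields syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical boost holder; the config format "title^5.0,text^1.5" is an assumption.
final class FieldBoosts {
    // LinkedHashMap keeps the fields in the order given in the configuration.
    private final Map<String, Float> boosts = new LinkedHashMap<>();

    // Parses a comma-separated list of field^boost entries; malformed entries are skipped.
    static FieldBoosts parse(String config) {
        FieldBoosts fb = new FieldBoosts();
        if (config == null) return fb;
        for (String entry : config.split(",")) {
            String e = entry.trim();
            int p = e.indexOf('^');
            if (p <= 0 || p == e.length() - 1) continue; // no field name or no value
            try {
                float boost = Float.parseFloat(e.substring(p + 1).trim());
                fb.boosts.put(e.substring(0, p).trim(), boost);
            } catch (NumberFormatException ex) {
                // not a number: ignore this entry
            }
        }
        return fb;
    }

    // Renders the boosts as space-separated field^boost pairs for a query-fields parameter.
    String toQueryFields() {
        StringJoiner sj = new StringJoiner(" ");
        for (Map.Entry<String, Float> en : boosts.entrySet()) {
            sj.add(en.getKey() + "^" + en.getValue());
        }
        return sj.toString();
    }
}
```

Re-parsing the configuration string before each query, as the commit message describes, means a changed boost takes effect without a restart, at the cost of a small parse per query.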
reger
6cf33f899c
prevent Solr "version conflict" on update by setting the Solr "_version_" field to 0 (=no version check)
12 years ago
Michael Peter Christen
acd98bebb7
improvements in GSA result writer
12 years ago
Michael Peter Christen
3de784c8dd
replaced more split and replaceAll calls that lacked pattern pre-compilation with
pre-compiled patterns
12 years ago
Michael Peter Christen
8fc3679c66
using more pre-compiled patterns for split methods
12 years ago
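The pattern pre-compilation mentioned in the two commits above can be sketched as follows. The class and method names are hypothetical and not taken from the YaCy sources. `String.split` and `String.replaceAll` take a regular expression as a string, which in general has to be compiled on each call. A `Pattern` held in a static final field is compiled once and reused.

```java
import java.util.regex.Pattern;

// Hypothetical utility; illustrates the technique only.
final class PrecompiledPatterns {
    // Compiled once when the class is loaded, then shared by all calls.
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Equivalent to s.split(","), using the shared Pattern instance.
    static String[] splitOnComma(String s) {
        return COMMA.split(s);
    }

    // Equivalent to s.replaceAll("\\s+", " "), using the shared Pattern instance.
    static String normalizeSpaces(String s) {
        return WHITESPACE_RUN.matcher(s).replaceAll(" ");
    }
}
```

`Pattern` instances are immutable and safe to share between threads, so a single static field is sufficient even in multi-threaded code such as a crawler.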
Michael Peter Christen
d48e9788d2
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
12 years ago
Michael Peter Christen
d465773a37
- removed multi-add of documents (not used)
- inserted specialized code for size request
12 years ago
Michael Peter Christen
b7004043ea
- added a field cache for solr queries which call only for a single
value
- fixed a version conflict exception within a solr add request
12 years ago
Michael Peter Christen
efd2c4622d
added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).
12 years ago
Michael Peter Christen
a114bb23bb
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
12 years ago
Michael Peter Christen
d6b82840f8
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is
checked whether the signature already exists in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because this is the first time a long number is used as a value in the Solr
store, the value naming in the YaCySchema also had to be adapted and
normalized. As a result, many files had to be changed.
12 years ago
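The signature check described in the commit above can be sketched as follows. This is a simplified stand-in, not the enhanced TextProfileSignature the commit refers to: the signature here is simply the first eight bytes of an MD5 digest over normalized text, and an in-memory set plays the role of the cached 'exists' lookup. The class name `SignatureIndex` and its methods are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a signature store; not the real index code.
final class SignatureIndex {
    // Signatures seen so far; stands in for the 'exists' cache in front of the index.
    private final Set<Long> seen = new HashSet<>();

    // Computes a long signature: lower-case the text, collapse everything that is
    // not a letter or digit into single spaces, then take 8 bytes of an MD5 digest.
    static long signature(String text) {
        String normalized = text.toLowerCase(Locale.ROOT)
            .replaceAll("[^\\p{L}\\p{N}]+", " ")
            .trim();
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(normalized.getBytes(StandardCharsets.UTF_8));
            long sig = 0;
            for (int i = 0; i < 8; i++) {
                sig = (sig << 8) | (d[i] & 0xFF);
            }
            return sig;
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    // Records the signature and returns true if it was not present before,
    // i.e. the document would be marked as unique.
    boolean addAndCheckUnique(String text) {
        return seen.add(signature(text));
    }
}
```

The boolean returned for each document corresponds to the unique attribute from the commit message, which can then serve as a sort key to move duplicates to the end of a result list.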
Michael Peter Christen
f5ca5cea44
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
12 years ago
Michael Peter Christen
5fd3b93661
added deletion of hosts during crawl start if deleteold option was given
13 years ago
Michael Peter Christen
61a1d32356
fix to ftp client
13 years ago
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
13 years ago
Michael Peter Christen
570e42c4e3
fix for filetype navigator
13 years ago
Michael Peter Christen
2371ef031c
added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
13 years ago
Michael Peter Christen
b30a7162fa
added more thread-renaming for search processes
13 years ago
Michael Peter Christen
900445d8e9
set the thread name during solr queries to the solr query to get better
debugging options
13 years ago
Michael Peter Christen
d481abd087
added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
13 years ago
Michael Peter Christen
619bf7e875
fixed filetype modifier for media types in text search
13 years ago
Michael Peter Christen
8fb370d9f8
renovated the way search results are counted. should be correct now...
13 years ago
Michael Peter Christen
75dd706e1b
update to HostBrowser:
- time-out after 3 seconds to speed up display (may be incomplete)
- showing also all links from the balancer queue in the host list (after
the '/') and in the result browser view with tag 'loading'
13 years ago
Michael Peter Christen
e2c4c3c7d3
migration to solr 4.0.0
13 years ago
Michael Peter Christen
a63179f3f9
added the MIME attribute for the R tag in GSA search result writer
13 years ago
Michael Peter Christen
c5f67a5d6d
fixed a problem with local search from solr results: now all results
from solr are shown (again)
13 years ago
Michael Peter Christen
ce3fed8882
added the Google Search Appliance (GSA) api interface to the main menu.
See:
https://developers.google.com/search-appliance/documentation/68/xml_reference#request_overview
13 years ago
Michael Peter Christen
a94c537afc
fixed getSize() which can use the cache size while the crawl is running
13 years ago
Michael Peter Christen
96912c9471
enhancement to solr caching: consider that during a get() the document
is not in solr but the cache points out that a commit is needed to get
the document.
13 years ago
Michael Peter Christen
799d71bc67
enhanced solr caching:
- increased cache size which is needed for longer solr commit time
- speed hacks on cache write code
13 years ago
Michael Peter Christen
a33e2742cb
- removed unnecessary synchronized and deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
13 years ago
orbiter
354f0d9acd
moved static method from ClusteredScoreMap to MapDataMining because it
was not used in the ClusteredScoreMap class but only in MapDataMining
13 years ago
Michael Peter Christen
8e1248ffe3
force a commit in advance of a search for the administrator to get most
recent results even if commit time is high and an indexing is ongoing.
13 years ago
Michael Peter Christen
3b48c78190
added an option to force a commit to solr.
may be used by a search front-end in case that the commitWithinMs time
is too short to get recently indexed documents.
13 years ago
orbiter
276dd6452b
removed warnings
13 years ago
Michael Peter Christen
ea11a1efea
fix for highlighting in gsa search
13 years ago
Michael Peter Christen
b7ac1da6a3
gsa results shall have only one title in metadata and that should be the
visible title in the <title>-tag
13 years ago
Michael Peter Christen
ea27d2e5f6
fixed more getSolrFieldName usages
13 years ago
Michael Peter Christen
ce0e5b1e17
- more refactoring / private methods
- fix for usage of custom solr field names
13 years ago