reger
92d9c56f9f
Merge origin/master into jetty
11 years ago
Michael Peter Christen
0db8e34625
enhanced webgraph processing
11 years ago
reger
effea4bca0
Merge origin/master into jetty
Conflicts:
source/net/yacy/cora/federate/solr/SolrServlet.java
11 years ago
sixcooler
2c2ebb0d92
tried some hardening in order not to leave any Solr searchers open
11 years ago
Michael Peter Christen
a16534cb0a
tried to fix timeout and connection-lost problems when using an outside
Solr.
11 years ago
Michael Peter Christen
9932c441c8
fixed a problem with parsing Date fields in Solr results if a remote Solr
is attached.
11 years ago
sixcooler
94db054aff
memory-leak-fix: the DocListSearcher fires a query in its constructor
and it is highly recommended to close every SolrRequest.
Every request which is not closed leaves behind a Searcher with its caches,
which cannot be garbage-collected.
11 years ago
reger
26bb1e37b7
implement core selection in SolrServlet
- making initcore() obsolete
11 years ago
Michael Peter Christen
5592ea57f0
hack to remove compiler warnings about deprecated classes. It would be
better to remove the deprecated usage, but to do this the Solr core must
adopt the latest Apache HTTP core changes as well; this is not our
fault.
11 years ago
orbiter
037cd0a57c
using the BinaryResponseWriter which is supported within the YaCy solr
servlet since YaCy 1.63. This is much more performant for the client
than using the XMLResponseWriter because parsing of XML data is very CPU
intensive. Older YaCy peers are still requested using the
XMLResponseWriter but the majority of YaCy peers already respond with
the binary writer. This makes remote searches much faster and less CPU
intensive.
11 years ago
reger
5c4a3d1c01
Merge origin/master into jetty
11 years ago
reger
8da75a4b0c
fix contentType definition for Solr HTML response writer
from xml to html
(hint: value is currently not used, but is in SolrServlet)
11 years ago
Michael Peter Christen
1f0bfa8fec
added test to Base64Order (runs successfully!)
11 years ago
reger
f111f30ace
Merge origin/master into jetty
11 years ago
Michael Peter Christen
219d5934a4
fixed termination bug in Solr Connector
11 years ago
Michael Peter Christen
9d5895f643
enhanced and fixed postprocessing
11 years ago
Michael Peter Christen
f86fe90eda
enhanced mass storage speed to remote solr servers
11 years ago
Michael Peter Christen
6ed9821209
fixed several problems in solr connectors
11 years ago
Michael Peter Christen
191fd3d7e7
added an optimization option to HandleSet mass data storage structure
11 years ago
Michael Peter Christen
94b565ea0d
fixed keepalive min value
11 years ago
Michael Peter Christen
24a052ecb9
removed debug code for existsByIds
11 years ago
Michael Peter Christen
1a4a69c226
set more loggers to 'final static'
11 years ago
orbiter
b085cb522b
replaced old existsByIds for embedded Solr with obviously much faster
new selection method (including still existing debug code to test that
this is in fact better)
11 years ago
reger
066a1ecf0a
add highlight queryparams to solrservlet if missing
- modify query params in Solr parameter map (instead of querystring)
11 years ago
Michael Peter Christen
899e7e92b0
added debug code
11 years ago
reger
4684330505
Merge origin/master into jetty
Conflicts:
source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
11 years ago
reger
1437c45383
merge rc1/master
11 years ago
Michael Peter Christen
81bb50118e
found and fixed a huge memory leak in Solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the Solr-internal cache and flushes it completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
11 years ago
reger
b85f702f22
add AccessTracker logging to SolrServlet
11 years ago
reger
de1f02420b
implement HtmlResponseWriter in SolrServlet (and RSS / OpenSearch response writers) as in the YaCy select servlet.
- set content type of HTML/GrepHTML-ResponseWriter to "text/html"
- set a content type for GSAsearchServlet
11 years ago
Michael Peter Christen
b2c329929f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
60187a4ec2
fix in html parser
11 years ago
Michael Peter Christen
e1c1e57877
less overhead calling exist() with only one hash
11 years ago
reger
3d5d366f1c
fix html header in Solr HTMLResponseWriter
- move 1st body content after </head> tag
- add closing <span> tag
11 years ago
Michael Peter Christen
5a02d650ee
avoid cloning
11 years ago
reger
b38de92a16
Merge origin/master into jetty
11 years ago
Michael Peter Christen
cc39667399
Speed enhancements and less CPU usage during Solr searches when using
the embedded Solr (the default). This was obtained by circumventing the SolrJ
search encapsulation and implementing direct index access
methods to Solr.
The effect will be seen not only during search; it also has a
strong effect on suggestions (much more) and lowers CPU usage during
index distribution (which needs many search requests)
11 years ago
reger
f017066197
Merge origin/master into jetty
11 years ago
Michael Peter Christen
9bb7eab389
hacks to prevent storage of data longer than necessary during search and
some speed enhancements. This should reduce the memory usage during
heavy-load search a bit.
11 years ago
Michael Peter Christen
1a8783147b
enhanced computation of number of solr documents.
11 years ago
Michael Peter Christen
4948c39e48
added concurrency for mass crawl check
11 years ago
Michael Peter Christen
1b4fa2947d
- fixed a problem which occurred when a document was not recognized with
the right content domain (i.e. identifying that it is an image, text
etc.) because it used the file extension and not an existing mime type
assignment.
- fixed the new setting that images shall be loaded for a better image
search.
- both fixes together now make it possible to crawl
commons.wikimedia.org which makes use of 'funny' document names (i.e.
ending with .jpg while the document is html)
11 years ago
Michael Peter Christen
74d0256e93
enhanced postprocessing: fixed bugs, enable proper postprocessing also
without the harvestingkey, remove crawl profiles after postprocessing,
speed-up for clickdepth computation.
11 years ago
reger
a44eede8b8
merge rc1/master
11 years ago
sixcooler
d9a02ed277
NPE fix for my last commit
11 years ago
sixcooler
61f627eb85
fix for ssl-connections from proxy-usage staying in close-wait-state
+ some extra 'close' in HttpClient
11 years ago
Michael Peter Christen
1b61bd40ed
- Added new solr field url_file_name_tokens_t which stores the file name
tokens. This can be used to enhance the ranking.
- Added also a rating_i field as basis for later usage.
- enhanced the tokenization process.
11 years ago
sixcooler
d536092fe4
fix false filling of the NAME_CACHE_MISS DNS cache in case of a timeout,
e.g. caused by massive requests when crawling from a file
11 years ago
Michael Peter Christen
ef31d0f279
fix for rss reader, see http://bugs.yacy.net/view.php?id=294
11 years ago
reger
c7c706fd9f
merge with rc1/master
12 years ago