Michael Peter Christen
84167adb49
removed unused anomichttpd code after migration to jetty
11 years ago
Michael Peter Christen
b461a27abb
fixed the SolrServlet
11 years ago
Michael Peter Christen
7603e879dc
Merge branch 'master' into HEAD
Conflicts:
.classpath
source/net/yacy/cora/federate/solr/SolrServlet.java
11 years ago
Michael Peter Christen
25250405f1
solr servlet preparation for join with jetty branch
11 years ago
Michael Peter Christen
2f16770681
migrated to solr 4.6.0
11 years ago
Michael Peter Christen
57f0f71ac6
added patch to allow binary response writer
11 years ago
orbiter
937273d4e3
added parsing of metadata to surrogate reading:
a dublin core record inside of surrogate input files may now contain
tokens within the namespace 'md' (short for: metadata). The token names
must be valid within the namespace of the solr field names. All
md-tokens inside of surrogate files then overwrite values within solr
documents before they are written to the solr index. This makes it
possible to assign collection names to each surrogate entry and also
ranking information can be added. Please see the example file.
11 years ago
reger
18497f6475
remove unused init parameter from DefaultServlet
- remove "RelativeResourceBase" parameter
11 years ago
orbiter
4de3fefdb5
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
7e346e1d79
using stringbuilder in query construction
11 years ago
reger
c84c313fe1
Merge origin/master into jetty
11 years ago
Michael Peter Christen
2702d9e56b
- added a SolrQueryResponse2SolrDocumentList method which is able to
work around the unfolding process in Solr's BinaryResponseWriter.
This was a huge performance bottleneck in the embedded solr connector
and the problem is actually on the Solr side, but we now have a workaround.
- This made it possible to abstract a high-performance index access
method which is implemented as method getDocumentListByParams. That
method is also implemented in the SolrServerConnector and provides a
very efficient access to a solr index if the index is embedded.
- a popular use of the document list retrieval is a result count which
can now also make use of the new method, via getDocumentCountByParams.
- enhanced the Error cache which now does not store error documents
within the ram cache if the document is also written to solr. When
documents are retrieved from the cache, they are partly read from the
ram cache and if not existent there, from the Solr index.
11 years ago
Michael Peter Christen
74466d731a
use pre-compiled patterns in ymark
11 years ago
Michael Peter Christen
34633044b4
made pattern computation static
11 years ago
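The two pattern commits above describe a common Java optimization. A minimal sketch of the idea follows; the class name `TagSplitter` and the tag-splitting use case are assumptions made for illustration and are not taken from the ymark code:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCy: shows a pattern compiled once
// as a static constant instead of on every call.
final class TagSplitter {
    // Compiled a single time when the class is loaded.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a comma-separated tag string, ignoring whitespace around commas.
    static String[] splitTags(String tags) {
        return COMMA.split(tags.trim());
    }

    public static void main(String[] args) {
        for (String tag : splitTags("yacy, solr ,jetty")) {
            System.out.println(tag);
        }
    }
}
```

Calling `String.split(regex)` with a non-trivial expression compiles the pattern on each invocation, so a static `Pattern` is usually preferable in hot code paths.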
Michael Peter Christen
ef7ddbc933
added date parser caches to prevent re-calculation of costly date
parsing
11 years ago
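A date parser cache of the kind mentioned above could look roughly like the following. This is only a sketch under assumptions: the class `CachedDateParser`, the use of `java.time.Instant`, and the unbounded map are illustrative choices and do not reflect the actual YaCy implementation:

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical memoizing parser, not the YaCy code.
final class CachedDateParser {
    // Maps the raw date string to its parsed value.
    // Assumption: the set of distinct strings is small enough to keep unbounded.
    private static final Map<String, Instant> CACHE = new ConcurrentHashMap<>();

    // Counts real parse operations, only to make the effect observable.
    static final AtomicInteger PARSE_CALLS = new AtomicInteger();

    static Instant parse(String iso) {
        return CACHE.computeIfAbsent(iso, s -> {
            PARSE_CALLS.incrementAndGet();
            return Instant.parse(s); // the costly step, done once per string
        });
    }

    public static void main(String[] args) {
        parse("2013-12-01T10:15:30Z");
        parse("2013-12-01T10:15:30Z");
        System.out.println("parse operations: " + PARSE_CALLS.get());
    }
}
```

A production cache would normally need a size limit or eviction policy so that it does not itself become a memory leak.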
Michael Peter Christen
552ef9f18e
fix for bad ErrorCache.exists test (bug from latest commit)
11 years ago
Michael Peter Christen
09412ea3a4
counting search requests in solr interface
11 years ago
Michael Peter Christen
303f5694ba
avoid usage of existsByQuery. If a document can be loaded by the ID
before testing other fields from the existsByQuery request, then a
document cache fills and queries after that one can be avoided.
11 years ago
reger
b43bbd3cc4
join DefaultServlet and Jetty8 implementation
- removing Jetty 8 specific dependencies
11 years ago
reger
089c5007ee
move conditionalHeader to DefaultServlet
- by removing Jetty specific implementation detail
11 years ago
Michael Peter Christen
79771c60c0
IPv6 fixes
11 years ago
reger
92d9c56f9f
Merge origin/master into jetty
11 years ago
Michael Peter Christen
78eac85161
better calibration of caches and queue maximum sizes
11 years ago
Michael Peter Christen
c8af19bd37
removed unnecessary check which causes an NPE when searching with empty
search string
11 years ago
Michael Peter Christen
e3c2f09de9
- reduce computation in case that specific postprocessing fields are not
selected
- de-select citation rank computation
11 years ago
Michael Peter Christen
cfa08024c7
removed optimization before postprocessing because that may cause a
time-out which causes postprocessing to fail.
11 years ago
Michael Peter Christen
6f3a923691
fixed urlmask which was not able to combine several constraints
11 years ago
Michael Peter Christen
9a27bf6e82
removed filter computation in Protocol class for remote searches because
that is already done in the QueryParams class
11 years ago
Michael Peter Christen
f1b5db2c45
- performance graph does not show peer ping in memory monitor any more
- after a forced GC, the PerformanceMemory view switches to automatic
update by default
11 years ago
Michael Peter Christen
a125904a1c
fixed an NPE in surrogate processing
11 years ago
Michael Peter Christen
0db8e34625
enhanced webgraph processing
11 years ago
reger
ac067b5236
clean-up Jetty handler classes
11 years ago
reger
b75e92aac3
add read queryparameter in gsaservlet
11 years ago
reger
1e94719084
fix NPE on mime detection of unknown file extension
11 years ago
reger
effea4bca0
Merge origin/master into jetty
...
Conflicts:
source/net/yacy/cora/federate/solr/SolrServlet.java
11 years ago
sixcooler
2c2ebb0d92
tried some hardening so that no Solr searchers are left open
11 years ago
Michael Peter Christen
a16534cb0a
tried to fix timeout and connection-lost problems when using an outside
solr.
11 years ago
Michael Peter Christen
c3dcbdc8d5
try to recover from an OOM during citation index reading and fail-over
to second solr core in case of unrecoverable OOM.
11 years ago
Michael Peter Christen
9932c441c8
fixed a problem with Date fields parsing Solr results if a remote Solr
is attached.
11 years ago
sixcooler
94db054aff
memory-leak fix: the DocListSearcher fires a query in its constructor
and it is highly recommended to close every SolrRequest.
Every request which is not closed leaves a Searcher with its caches, and
it cannot be garbage-collected.
11 years ago
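The rule stated in this commit (close every request so its searcher can be released) maps naturally onto try-with-resources. The sketch below uses a hypothetical `Request` class and a counter of open instances in place of the real Solr types, so it illustrates the pattern only and not Solr's API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for a request that pins a searcher while open.
final class SearcherLeakDemo {
    // Number of requests currently open (and thus holding resources).
    static final AtomicInteger OPEN = new AtomicInteger();

    static final class Request implements AutoCloseable {
        Request() { OPEN.incrementAndGet(); }      // acquiring pins a searcher
        int count() { return 42; }                 // placeholder for a real query
        @Override public void close() { OPEN.decrementAndGet(); } // releases it
    }

    // The request is closed even if count() throws.
    static int countDocs() {
        try (Request req = new Request()) {
            return req.count();
        }
    }

    public static void main(String[] args) {
        System.out.println(countDocs() + " docs, open requests: " + OPEN.get());
    }
}
```

When a constructor already performs work, as described for the DocListSearcher, the object should be assigned to the resource variable immediately so that no exception path can skip `close()`.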
reger
26bb1e37b7
implement core selection in SolrServlet
- making initcore() obsolete
11 years ago
Michael Peter Christen
ae55d69ef6
include/exclude size NPE fix (recently added)
11 years ago
Michael Peter Christen
2c39b65409
fixes for searches containing stopwords. The fix was done using a
reconstruction of the search word set access method to prevent words
from being deleted from the sets from outside of the QueryGoal class.
11 years ago
Michael Peter Christen
5592ea57f0
hack to remove compiler warnings about deprecated classes. It would be
better to remove the deprecated usage but to do this the Solr core must
adopt the latest apache http core changes as well .. this is not our
fault.
11 years ago
orbiter
037cd0a57c
using the BinaryResponseWriter which is supported within the YaCy solr
servlet since YaCy 1.63. This is much more performant for the client
than using the XMLResponseWriter because parsing of XML data is very CPU
intensive. Older YaCy peers are still requested using the
XMLResponseWriter but the majority of YaCy peers already respond with
the binary writer. This makes remote searches much faster and less CPU
intensive.
11 years ago
orbiter
61409788eb
less word hash computations (removing some overhead because of MD5
calcs) using the clear word in a normalized form.
11 years ago
reger
f23471c471
add check to prevent index entries containing url_file_ext_s with ";jsession=xyz"
note: check could be implemented in MultiProtocolURL (but at this time I could not assess the possible implications)
11 years ago
reger
5c4a3d1c01
Merge origin/master into jetty
11 years ago
reger
444a9ae674
remove unused options and attributes from DefaultServlet
cleanup obsolete class files
11 years ago
reger
8da75a4b0c
fix contentType definition for Solr html response writer
from xml to html
(hint: value is currently not used, but is in SolrServlet)
11 years ago
Michael Peter Christen
ccf2f4e43b
refactoring of seed attributes (introduced more constants)
11 years ago
Michael Peter Christen
1f0bfa8fec
added test to Base64Order (runs successfully!)
11 years ago
orbiter
b7f1e5af51
added new servlet which generates the same file as the principal peers
upload to a bootstrap position
you can call it either with
http://localhost:8090/yacy/seedlist.html
or to generate json (or jsonp) with
http://localhost:8090/yacy/seedlist.json
http://localhost:8090/yacy/seedlist.json?callback=seedlist
11 years ago
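The `callback` parameter in the last URL is the usual JSONP convention: the JSON body is wrapped in a call to the named function. A rough sketch of that wrapping follows. The class `JsonpWrapper`, the identifier check, and the exact output format are assumptions; the real servlet may render its response differently:

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating JSON vs. JSONP output.
final class JsonpWrapper {
    // Only plain JavaScript identifiers are accepted as callback names,
    // so that the parameter cannot inject arbitrary script.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // Returns plain JSON when no callback is given, otherwise callback(json);
    static String render(String json, String callback) {
        if (callback == null || callback.isEmpty()) {
            return json;
        }
        if (!IDENTIFIER.matcher(callback).matches()) {
            throw new IllegalArgumentException("invalid callback name");
        }
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        System.out.println(render("{\"peers\":[]}", null));
        System.out.println(render("{\"peers\":[]}", "seedlist"));
    }
}
```

Validating the callback name matters because the value comes straight from the query string and is echoed into an executable response.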
orbiter
3e552550d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
c2d720cdaf
purge a lucene cache - possible memory leak fix
11 years ago
reger
e4f49fb175
for searchresults with empty title use filename as title
- to not store a title in index which isn't extracted from source
the empty-title check is only added to the ResultEntry class
11 years ago
reger
b1dc9a6f52
- disable Jetty servlet defaultUseCache (prevent double caching)
- include short memory status check for class cache in DefaultServlet
- remove obsolete Resource interface for Jetty8YaCyDefaultServlet
11 years ago
reger
f111f30ace
Merge origin/master into jetty
11 years ago
reger
94293176a3
use writeOptionHeaders with ServletResponse parameter only
11 years ago
orbiter
ff86cb683f
fixed some XSS bugs reported by Marius from http://ctf365.com/
11 years ago
orbiter
da33ee0d77
also extended timeout for webgraph postprocessing
11 years ago
orbiter
74f9e40747
extended timeout during postprocessing of 30 minutes.
11 years ago
orbiter
19a051bec8
more monitoring for postprocessing and enhanced layout in Crawler
monitor page
11 years ago
Michael Peter Christen
9cf9727685
fix for wrong counter
11 years ago
Michael Peter Christen
fceac8cffd
more monitoring for postprocessing
11 years ago
Michael Peter Christen
6842783761
fixed and enhanced postprocessing
11 years ago
Michael Peter Christen
219d5934a4
fixed termination bug in Solr Connector
11 years ago
Michael Peter Christen
bf1bdd52a6
prevent requesting of 0-facets (which actually exist)
11 years ago
Michael Peter Christen
9d5895f643
enhanced and fixed postprocessing
11 years ago
Michael Peter Christen
f86fe90eda
enhanced mass storage speed to remote solr servers
11 years ago
Michael Peter Christen
6ed9821209
fixed several problems in solr connectors
11 years ago
Michael Peter Christen
191fd3d7e7
added an optimization option to HandleSet mass data storage structure
11 years ago
Michael Peter Christen
94b565ea0d
fixed keepalive min value
11 years ago
reger
b26787dc2d
- DefaultServlet: remove static gzip option
YaCy doesn't use pre-gzip'ed static html pages
- ProxyServlet: remove unneeded procedure
- Server init: skip one overlapping servlet context
11 years ago
Michael Peter Christen
24a052ecb9
removed debug code for existsByIds
11 years ago
Michael Peter Christen
087df05e24
added option to Config_Network_p.html to enable remote search while
DHT-Receive is switched off.
11 years ago
Michael Peter Christen
1a4a69c226
set more loggers to 'final static'
11 years ago
Michael Peter Christen
c60947360d
logger should be static
11 years ago
Michael Peter Christen
69b8d61c47
fix for search requests in GSA interface which contain 'funny'
characters (like ':' etc.)
11 years ago
orbiter
b085cb522b
replaced old existsByIds for embedded Solr with obviously much faster
new selection method (including still existing debug code to test that
this is in fact better)
11 years ago
reger
b29d262e70
implement Jetty8HttpServerImpl.generateSocketAddress
(code 1:1 copied from serverCore)
11 years ago
orbiter
4234b0ed6c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
909bbb49d8
added (partly commented) test code for url rewrite methods .. to be
completed
11 years ago
reger
066a1ecf0a
add highlight queryparams to solrservlet if missing
- modify query params in Solr parameter map (instead of querystring)
11 years ago
Michael Peter Christen
899e7e92b0
added debug code
11 years ago
reger
4684330505
Merge origin/master into jetty
Conflicts:
source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
11 years ago
reger
1437c45383
merge rc1/master
11 years ago
Michael Peter Christen
87a956e881
calculating and showing the number of files and the average size of a
file in the HTCACHE in ConfigHTCache_p.html
11 years ago
Michael Peter Christen
acc1f8a749
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
81d9e23532
fixed another memory leak in the PDF parser:
the class org.apache.pdfbox.pdmodel.font.PDFont occupies 8MB of space
which cannot be cleaned if PDFont.clearResources is called.
The attempt to clean the class cache therefore causes that the class is
loaded and this cache is initialized with some rubbish. I tried to
prevent to instantiate this class by usage of a hacked findLoadedClass
call to the SystemClassLoader (which is protected ...).
Now, without using the PDF parser at all, 8MB of RAM space is not
occupied, however, when the first PDF arrives this space will be taken
and never given back to GC.
WAKE UP YOU LAZY PDFBOX HACKER AND FIX THIS SHIT!
11 years ago
Michael Peter Christen
c152d996e6
reduced footprint of BookmarksDB which can take quite a lot of memory if
the number of bookmarks is high (i.e. > 2000 URLs)
11 years ago
Michael Peter Christen
81bb50118e
found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes them completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
11 years ago
reger
7b17cdf6dd
add content_type:image/* to image search
- see numerous idx entries with content_type image without url_file_ext_s (for various reasons) which should be included in result
- try it yourself with following sample query
/solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type
also addresses possible URLs without an extension or with a deviating one.
11 years ago
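The sample query above contains spaces, brackets and colons, so it has to be URL-encoded before it is sent programmatically. The following sketch builds such a request path; the class `ImageQueryUrl` is hypothetical, and only the parameter names `q`, `defType` and `fl` are taken from the commit message:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that encodes the sample image query for use in a URL.
final class ImageQueryUrl {
    // Builds the select path with an encoded query and field list.
    static String build(String query, String fields) {
        try {
            return "/solr/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&defType=edismax&fl=" + URLEncoder.encode(fields, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build(
                "content_type:image/* AND -url_file_ext_s:[* TO *]",
                "sku,url_file_ext_s,content_type"));
    }
}
```

`URLEncoder` produces form encoding, so spaces become `+`, which Solr accepts in query strings.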
reger
082c9a98c1
move writeHeaders from Jetty8 servlet to YaCyDefaultServlet
- after removing Jetty server dependency (of Response using HttpServletResponse only)
11 years ago
sixcooler
987f410011
URL-export: add query and fix for ClassCastException
11 years ago
Michael Peter Christen
a8253ca49c
added missing unicode transformation in href link contents during
parsing
11 years ago
Michael Peter Christen
0cf9e9580b
added clickdepth and CR computation debug code to verify that the
process is complete
11 years ago
reger
b85f702f22
add AccessTracker logging to SolrServlet
11 years ago
reger
de1f02420b
implement HtmlResponseWriter to solrServlet (and rss / opensearch response writer) as in yacy select servlet.
- set contenttype of HTML/GrepHTML-ResponseWriter to "text/html"
- set a contenttype to GSAsearchServlet
11 years ago
Michael Peter Christen
234a974955
load images only if their parser flag is activated
11 years ago