orbiter
252c525709
fixed feed api servlet and enhanced RSSReader class
11 years ago
orbiter
d38c3c14d8
fix for CGI test
11 years ago
Michael Peter Christen
31902f54df
fix for NPE which happens within solr code at MultiMapSolrParams.java,
line 52 in case that the array arr.length == 0
11 years ago
Michael Peter Christen
f13df9dbb6
migration to solr 4.4.0
11 years ago
Michael Peter Christen
58fe986cca
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
cf12835f20
replaced the single-text description solr field with a multi-value
description_txt text field
11 years ago
sixcooler
7d53ac86a3
fix for Blacklist (-Administration)
11 years ago
reger
f2d99053ed
Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception)
(occurred during testing while working on q=store:[* TO *])
11 years ago
reger
92d3f71b16
htmlParser: closes input stream -> changed it to leave it open for a reset (used by AugmentParser - even if this is practically not used),
note: stream.close is done by caller (Textparser.parseSource)
- removed unnecessary reset in AugmentParser
- added stream.mark in tdfatripleimpl. to make stream.reset work here
11 years ago
orbiter
87cfeaa4f3
fix for npe
11 years ago
orbiter
268a36aaff
emergency fix for crawler: this will otherwise cause loss of complete
crawl queue if latency of remote system is too low
11 years ago
orbiter
d05e0c5368
wait a bit longer before doing the first peer ping
11 years ago
orbiter
b8f57f7703
don't be noisy when doing background tasks that may be allowed to fail
11 years ago
Roland Haeder
0343f0668c
Fix for NPE:
E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in
serverInstantThread.job, thread
'net.yacy.search.Switchboard.cleanupJob': null; target exception: null
java.lang.NullPointerException
at
net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116)
at
net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897)
at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107)
at
net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)
Conflicts:
source/net/yacy/search/schema/CollectionConfiguration.java
11 years ago
Roland Haeder
b58ca8622d
Some cleanups:
- added SKINS_PATH_DEFAULT in the same way as LISTS_PATH_DEFAULT was added
- Added 'final' keyword to a string
11 years ago
Roland Haeder
7263bb82fb
Fix for NPE on shutdown:
java.lang.NullPointerException
at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732)
at net.yacy.search.Switchboard.access00(Switchboard.java:207)
at net.yacy.search.Switchboard.run(Switchboard.java:3049)
11 years ago
Roland Haeder
13433d41a1
Log this exception better
Conflicts:
source/net/yacy/kelondro/blob/Tables.java
11 years ago
orbiter
080d80c9de
do not write an empty failreason in case that there is no fail. Because
of the lazy instantiation rule this value was not actually written, but
if lazy instantiation is switched on, then this causes that all crawl
starts delete all crawl-start-hosts completely because this looks for
filled error reasons.
11 years ago
Michael Peter Christen
4c242f9af9
always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
11 years ago
Michael Peter Christen
61e015268b
fix in forced deletion: forced commit needed
11 years ago
Michael Peter Christen
83e2921b39
new test case for http://bugs.yacy.net/view.php?id=141
11 years ago
Michael Peter Christen
304aacb2cc
fix for http://bugs.yacy.net/view.php?id=267
11 years ago
Michael Peter Christen
c3b2301b2f
fix for http://bugs.yacy.net/view.php?id=268
11 years ago
reger
aa1a1f1d2c
- small adjustment to make sure genericParser is tried last
-- for some documents genericParser grabs document instead of specific available parser due to unordered pick of 1st to try parser
(like .ps .rdf files and other)
- remove redundant file extension registration
11 years ago
orbiter
3e901dcb06
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
f50b596e0b
do not run dht distribution if system load is over 2.5
11 years ago
orbiter
056b42f5aa
- added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index to that front-end
11 years ago
orbiter
6fb2811e68
fixes for problems with remote solr and non-activated webgraph index
11 years ago
sixcooler
af740f3058
changed optimization to a segment-size of index-size/5.000.000
+ one if not idle
+ one (and force) if postprocessing
11 years ago
Michael Peter Christen
336f86394c
replaced StringBuffer with StringBuilder
11 years ago
Michael Peter Christen
aeac2fb763
replaced more containsKey() -> get() usages by a simple get(), followed
by a test for NULL. This should increase the application speed and
reduce the lookup time for the affected methods by 50%
11 years ago
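The containsKey() -> get() replacement described in aeac2fb763 can be sketched as follows. This is only an illustration of the general pattern, not code from the commit: the class and method names are hypothetical, and the rewrite is equivalent only under the assumption that the map never stores null values.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookupSketch {

    // Before: two hash lookups for the same key (containsKey, then get).
    static int lengthTwoLookups(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return map.get(key).length();
        }
        return -1;
    }

    // After: one hash lookup, followed by a null test.
    // Assumption: the map never holds null values, otherwise a stored
    // null would be indistinguishable from a missing key.
    static int lengthOneLookup(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value == null) {
            return -1;
        }
        return value.length();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("host", "yacy.net");
        System.out.println(lengthTwoLookups(map, "host"));
        System.out.println(lengthOneLookup(map, "host"));
        System.out.println(lengthOneLookup(map, "missing"));
    }
}
```

Both methods return the same result for present and missing keys; the second one simply avoids hashing and probing the map twice.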
orbiter
5364c4dcc9
delayed first peer-ping to send the first ping out after the http got
up; if the ping comes before the http is up, it cannot be recognized as
senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266
11 years ago
orbiter
e24016e30a
added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
11 years ago
orbiter
c124037f19
removed forced non-soft commits to prevent index fragmentation
11 years ago
Michael Peter Christen
31483c47e1
fixed problem with remote luke requests
11 years ago
Michael Peter Christen
c15aa758dc
removed failreason_t removal patch because that causes too much
confusion using an external solr. to clean up the index after a schema
change, use the index cleaner function from the online servlet
11 years ago
reger
2b7a38640a
extend content type detection on file extension for .tif .tiff .htm
11 years ago
Michael Peter Christen
ac1aad5064
added a getSegmentCount method and use it to disable optimize if wanted
current segment count is below optimization level
11 years ago
Michael Peter Christen
36035e0a0a
- used reger's LukeRequest to generalize the index info in
SolrServerConnector
- used the LukeRequest in SolrServerConnector to replace the index size
method by a getNumDocs request to a LukeRequest result
11 years ago
Michael Peter Christen
39fceb5ccf
fix for NPE & bug #264
11 years ago
Michael Peter Christen
735a66eff3
enhancements to crawler
11 years ago
Roland Haeder
be0ff6018f
Removed trailing spaces + some more final
11 years ago
Roland Haeder
aaedc0405d
Fixes and avoidance of catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
CME=ConcurrentModificationException)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an
already synced HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call
Conflicts:
source/net/yacy/repository/LoaderDispatcher.java
11 years ago
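The two CME-avoidance techniques named in aaedc0405d can be sketched roughly like this. The class and method names are hypothetical and this is not the code from ConnectionInfo; it only illustrates that a ConcurrentHashMap tolerates modification during iteration, and that a set wrapped by Collections.synchronizedSet still needs an explicit synchronized block around iteration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CmeSketch {

    // ConcurrentHashMap iterators are weakly consistent: removing entries
    // while iterating does not throw ConcurrentModificationException.
    static int pruneEvenKeys(Map<Integer, String> concurrentMap) {
        for (Integer key : concurrentMap.keySet()) {
            if (key % 2 == 0) {
                concurrentMap.remove(key);
            }
        }
        return concurrentMap.size();
    }

    // A synchronized wrapper only guards single calls; iteration must be
    // wrapped in a synchronized block on the wrapper itself.
    static int countSafely(Set<String> synchronizedSet) {
        int count = 0;
        synchronized (synchronizedSet) {
            for (String ignored : synchronizedSet) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "entry-" + i);
        }
        System.out.println(pruneEvenKeys(map));

        Set<String> set = Collections.synchronizedSet(new HashSet<>());
        set.add("a");
        set.add("b");
        System.out.println(countSafely(set));
    }
}
```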
Roland Haeder
841a28ae76
Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage
Conflicts:
source/net/yacy/search/Switchboard.java
11 years ago
Felix Ableitner
03044589dd
Fixed (?i) appearing in entries, fixed multiple equal lines in file.
11 years ago
Michael Peter Christen
89c0aa0e74
added collection_sxt to error documents
11 years ago
Michael Peter Christen
0df5195cb0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
1fd006cc56
fixes using the embedded connector
11 years ago
orbiter
d0dc86cf3d
logging of deadlocks (if any) during cleanup process
11 years ago
Michael Peter Christen
c6a6f159e8
fix for crawl stack domain counter
12 years ago
Michael Peter Christen
93d1bac140
do a more frequent optimization, reduces IO after optimization
12 years ago
orbiter
b71d13a014
added load and deadlock detector in Memory util
12 years ago
orbiter
290e24564b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
5533fc8e01
fix for bug 260
12 years ago
Michael Peter Christen
b79471ee67
grr
12 years ago
Michael Peter Christen
a79f288ac1
automatically running optimize on solr if user/search is idle for some
time
12 years ago
orbiter
a9c8046c87
do a light optimization at the end of a crawl postprocessing
12 years ago
orbiter
a548354c71
replaced type of solr schema object sku of text_en_splitting_tight by
string
12 years ago
orbiter
2f1ec8d4a2
npe fix
12 years ago
Michael Peter Christen
bcc623a843
refactoring of load_delay: this is a matter of client identification
12 years ago
orbiter
0d0b3a30f5
activate api actions after postprocessing of crawls
12 years ago
orbiter
3978c5ca5d
fix for http://bugs.yacy.net/view.php?id=255
12 years ago
orbiter
2be456e7fb
added a postprocessing field into api/status_p.xml to show if the
postprocessing task is running at that time (status: busy) or not
(status:idle)
12 years ago
orbiter
dac88561ae
minimum access time has a tight connection to ClientIdentification,
therefore it is defined there.
12 years ago
Michael Peter Christen
9a29ab469e
another patch to prevent CLOSE_WAIT status on solr connections
12 years ago
Michael Peter Christen
5091d627bc
fixed parsing of peer flags
12 years ago
Michael Peter Christen
87e9052081
added Connection:close to all http requests in our http client to
prevent CLOSE_WAIT states (as seen in lsof)
12 years ago
Michael Peter Christen
5c6946dd5f
replaced usage of log4j by ConcurrentLog where possible
12 years ago
Michael Peter Christen
5878c1d599
- refactoring of log to ConcurrentLog:
JDK-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to JDK logging that puts log entries on a
concurrent message queue and logs the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
12 years ago
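The queue-based logging idea from 5878c1d599 can be sketched as below. This is a minimal illustration under stated assumptions, not the actual ConcurrentLog class: all names are hypothetical, the "separate process" is modelled as a single worker thread, and an in-memory list stands in for the real (possibly blocking) logger call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuedLogSketch {

    // Sentinel compared by identity; tells the worker to stop.
    private static final String POISON = new String("POISON");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = Collections.synchronizedList(new ArrayList<>());
    private final Thread worker;

    public QueuedLogSketch() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();
                    if (message == POISON) {
                        break;
                    }
                    // Stand-in for the real logger call, which may block.
                    sink.add(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Callers only enqueue; they never wait for the logger itself.
    public void info(String message) {
        queue.add(message);
    }

    // Stops the worker after all queued messages were written.
    public List<String> shutdownAndDrain() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(sink);
    }

    public static void main(String[] args) {
        QueuedLogSketch log = new QueuedLogSketch();
        log.info("first");
        log.info("second");
        System.out.println(log.shutdownAndDrain());
    }
}
```

Because a single worker drains the queue, messages from one producer are written in the order they were enqueued, while producers return immediately after the enqueue.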
orbiter
f4f6551c66
better handling of time-out at solrj in case that a commit is done in a
fail-over case during add
12 years ago
Michael Peter Christen
07261fe274
Merge remote-tracking branch 'nutomics/blacklist_structure'
12 years ago
Michael Peter Christen
dea71851d2
- better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
12 years ago
Michael Peter Christen
a34e137e27
fix for citation index generation in case that entry.referrerhash() is
null. This is especially the case if ftp sites are crawled
12 years ago
Michael Peter Christen
a2c8116a8f
accept (but ignore) a '+' sign in front of search words
12 years ago
orbiter
9f0cc9b401
enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
12 years ago
sixcooler
308d73f855
do not use remote proxy if not switched on - regardless of the proto
12 years ago
sixcooler
69906b1d2e
Revert "do not use remote proxy if not switched on - regardless of the proto"
This reverts commit 20f452d228.
12 years ago
sixcooler
20f452d228
do not use remote proxy if not switched on - regardless of the proto
12 years ago
sixcooler
9551720d5c
re-enable saved setting for proxy-crawl-profile
12 years ago
sixcooler
d5d8936f9d
For indexes that are changing rapidly in NRT situations, fcs (stands for
Field Cache per Segment) may be a better choice than the default fc.
(saves memory)
see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
12 years ago
Felix Ableitner
44f8fcf62e
Changed class structure of Blacklist.
12 years ago
Michael Peter Christen
57ffdfad4c
added a crawl option to obey html-meta-robots-noindex. This is on by
default.
12 years ago
Michael Peter Christen
5a5d411ec0
new robots_i attribute fields
12 years ago
Michael Peter Christen
fa08bd9d5a
hack to prevent long waiting times in crawler
12 years ago
Michael Peter Christen
f1c5338210
preparation for greedy crawl profiles and refactoring
12 years ago
Michael Peter Christen
e6f361f474
adding the canonical tag to crawl queues
12 years ago
reger
a6bf44212e
bugfix: location (lat/lon) meta data retrieval (Double.NaN check)
12 years ago
Michael Peter Christen
203921006a
redesign of citation index storage
12 years ago
reger
83763ee4a4
jpeg parser: extract GPS location from meta data
12 years ago
Michael Peter Christen
32aa1d4569
removed unused option for queries
12 years ago
Michael Peter Christen
9d291764d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sixcooler
e5abccdfe4
added optimize-option
12 years ago
Michael Peter Christen
64140f35cd
fix for solr requests if no query part is given (prevent npe)
12 years ago
Michael Peter Christen
8caaf6203a
fixed false multiple-generation of remote facet search which
caused high cpu usage on remote side.
12 years ago
Michael Peter Christen
823ae4d6a7
added url_protocol_s to error documents
12 years ago
Michael Peter Christen
660a196989
refactoring
12 years ago
Michael Peter Christen
c4538d8d91
added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib
12 years ago
reger
3760e2616b
bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments
12 years ago
Michael Peter Christen
9a6fcdf597
npe fix
12 years ago
Michael Peter Christen
16d1d744fa
added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which no longer has the file name as the
last part of the path list.
The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
12 years ago