Michael Peter Christen
9a29ab469e
another patch to prevent CLOSE_WAIT status on solr connections
12 years ago
Michael Peter Christen
5091d627bc
fixed parsing of peer flags
12 years ago
Michael Peter Christen
87e9052081
added Connection:close to all http requests in our http client to
prevent CLOSE_WAIT states (as seen in lsof)
12 years ago
orbiter
2a19a60074
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sixcooler
bff8c753c6
re-insert this file - was deleted by mistake
+ correct another case-typo
12 years ago
orbiter
e609ec388a
metager whitelist update
12 years ago
Michael Peter Christen
5c6946dd5f
replaced usage of log4j by ConcurrentLog where possible
12 years ago
Michael Peter Christen
5878c1d599
- refactoring of log to ConcurrentLog:
JDK-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to JDK logging that puts log entries on a
concurrent message queue and logs the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
12 years ago
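The queue-based logging idea described in the commit above can be sketched roughly as follows. This is a minimal illustration only, not the actual ConcurrentLog code: the class name `QueueLogSketch`, its methods, and the use of a single daemon thread with a poison-pill record are all assumptions made for this example. Callers only enqueue a `LogRecord`; one worker drains the queue and hands each record to a regular `java.util.logging.Logger`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch of queue-based logging; not YaCy's ConcurrentLog.
final class QueueLogSketch {
    // Sentinel record that tells the worker to stop (compared by identity).
    private static final LogRecord POISON = new LogRecord(Level.OFF, "");

    private final BlockingQueue<LogRecord> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    QueueLogSketch(final Logger target) {
        // A single worker writes records one by one, so calling threads
        // never wait inside Logger.log themselves.
        this.worker = new Thread(() -> {
            try {
                LogRecord record;
                while ((record = queue.take()) != POISON) {
                    target.log(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "queue-log-sketch");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Enqueue only; the unbounded queue means this call does not block.
    void log(final Level level, final String message) {
        queue.add(new LogRecord(level, message));
    }

    // Let the worker drain everything queued so far, then wait for it to stop.
    void close() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because one worker consumes a FIFO queue, records reach the target logger in the order they were enqueued. An unbounded queue trades memory for non-blocking callers; a bounded queue would be an alternative if back-pressure is preferred.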
Michael Peter Christen
6d5533c9cd
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
c79f687110
enhanced the network scanner: find more hosts automatically by removal
of common subdomains before application of protocol-specific prefix
12 years ago
orbiter
f4f6551c66
better handling of time-out at solrj in case that a commit is done in a
fail-over case during add
12 years ago
orbiter
b4677d1cad
fix for bug #252
the naming of the servlet was wrong, the bug may not be present on
systems where upper/lowercase matching is lazy (windows)
12 years ago
Michael Peter Christen
2716dfc46c
increase crawler speed by reduction of the busysleep time
12 years ago
Michael Peter Christen
07261fe274
Merge remote-tracking branch 'nutomics/blacklist_structure'
12 years ago
Michael Peter Christen
dea71851d2
- better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
12 years ago
Michael Peter Christen
a34e137e27
fix for citation index generation in case that entry.referrerhash() is
null. This is especially the case if ftp sites are crawled
12 years ago
Michael Peter Christen
a2c8116a8f
accept (but ignore) a '+' sign in front of search words
12 years ago
orbiter
9f0cc9b401
enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
12 years ago
orbiter
d8354a389c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Lotus
6e120e90fe
do not cut text on submit buttons
12 years ago
orbiter
f8c28efd66
fix for rssTerminal coloring
12 years ago
sixcooler
308d73f855
do not use remote proxy if not switched on - regardless of the proto
12 years ago
sixcooler
69906b1d2e
Revert "do not use remote proxy if not switched on - regardless of the proto"
This reverts commit 20f452d228.
12 years ago
sixcooler
20f452d228
do not use remote proxy if not switched on - regardless of the proto
12 years ago
sixcooler
9551720d5c
re-enable saved setting for proxy-crawl-profile
12 years ago
sixcooler
d5d8936f9d
For indexes that are changing rapidly in NRT situations, fcs (stands for
Field Cache per Segment) may be a better choice than the default fc.
(saves memory)
see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
12 years ago
Felix Ableitner
44f8fcf62e
Changed class structure of Blacklist.
12 years ago
Michael Peter Christen
3054a6d4b9
added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
12 years ago
Michael Peter Christen
78af998f8f
Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'
12 years ago
Michael Peter Christen
57ffdfad4c
added a crawl option to obey html-meta-robots-noindex. This is on by
default.
12 years ago
Felix Ableitner
fd90fcc4e0
Fixes #196.
12 years ago
Michael Peter Christen
5a5d411ec0
new robots_i attribute fields
12 years ago
Michael Peter Christen
fa08bd9d5a
hack to prevent long waiting times in crawler
12 years ago
Michael Peter Christen
f1c5338210
preparation for greedy crawl profiles and refactoring
12 years ago
Michael Peter Christen
e6f361f474
adding the canonical tag to crawl queues
12 years ago
orbiter
40c5ee47c1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
ae23a0badb
updated copyright message; included LGPL for 'cora' and a warranty
warning.
12 years ago
reger
a6bf44212e
bugfix: location (lat/lon) meta data retrieval (Double.NaN check)
12 years ago
Michael Peter Christen
203921006a
redesign of citation index storage
12 years ago
orbiter
7c6ccc426c
set crawlingQ to true by default because most webpages are dynamic and
crawlingQ should only be switched off in case of crawler traps
12 years ago
Lotus
5de4267a9d
windows installer: update to latest jre
12 years ago
reger
83763ee4a4
jpeg parser: extract GPS location from meta data
12 years ago
Michael Peter Christen
e92b9275ce
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
56cdcfa2fa
fixed greedy learning mode - global is not a search attribute in
searchitems
12 years ago
Michael Peter Christen
32aa1d4569
removed unused option for queries
12 years ago
Michael Peter Christen
0c5bed7e2c
added configuration option for greedy learning function to ConfigPortal
servlet
12 years ago
sixcooler
5d1f619f07
possibly helpful closing of solr-requests
12 years ago
Michael Peter Christen
9d291764d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sixcooler
e5abccdfe4
added optimize-option
12 years ago
Michael Peter Christen
8ea6ddf636
removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
12 years ago