Michael Peter Christen
4540174fe0
memory hacks
13 years ago
Michael Peter Christen
b4409cc803
small redesign of blob column index and usage
13 years ago
Michael Peter Christen
d5c1f2746e
performance hack
13 years ago
Michael Peter Christen
803963aebd
performance hack: better space grow in CharBuffer (speeds up html
parser)
13 years ago
Michael Peter Christen
8b0920b0b5
tried to fix the ipv6 problem as reported in bug
...
but this did not solve all problems because a bug in the apache http
client prevented that it worked. Thread dump:
Caused by: java.lang.NumberFormatException: For input string: "1450:400c:c01:0:0:0:69"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.http.client.utils.URIUtils.extractHost(URIUtils.java:310)
at org.apache.http.impl.client.AbstractHttpClient.determineTarget(AbstractHttpClient.java:764)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at net.yacy.cora.protocol.http.HTTPClient.execute(HTTPClient.java:597)
at net.yacy.cora.protocol.http.HTTPClient.getContentBytes(HTTPClient.java:558)
at net.yacy.cora.protocol.http.HTTPClient.GETbytes(HTTPClient.java:341)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:131)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:74)
at net.yacy.repository.LoaderDispatcher.loadInternal(LoaderDispatcher.java:274)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:164)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:150)
at net.yacy.repository.LoaderDispatcher.loadDocument(LoaderDispatcher.java:355)
at getpageinfo_p.respond(getpageinfo_p.java:97)
13 years ago
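A minimal sketch of the failure mode in the trace above, not the actual Apache HttpClient or YaCy code: splitting an unbracketed IPv6 literal at its first colon and parsing the remainder as a port number throws the same kind of NumberFormatException. The class and method names and the example address are hypothetical; wrapping the literal in square brackets is the standard URI form and is shown here only as one possible workaround.

```java
import java.net.URI;

public class Ipv6HostSketch {

    // Naive host:port split at the first colon (an assumption about the
    // failure mode); it breaks on unbracketed IPv6 literals.
    static int naivePort(String authority) {
        int colon = authority.indexOf(':');
        if (colon < 0) return -1; // no port given
        return Integer.parseInt(authority.substring(colon + 1));
    }

    // Wrap a bare IPv6 literal (two or more colons, no leading bracket)
    // in square brackets so a URI parser can separate host from port.
    static String bracketIfIpv6(String host) {
        boolean bareIpv6 = host.indexOf(':') != host.lastIndexOf(':')
                && !host.startsWith("[");
        return bareIpv6 ? "[" + host + "]" : host;
    }

    public static void main(String[] args) {
        String host = "2001:db8:0:0:0:0:0:69"; // hypothetical example address
        try {
            naivePort(host);
        } catch (NumberFormatException e) {
            System.out.println("naive split failed: " + e.getMessage());
        }
        // With brackets, java.net.URI parses host and port separately.
        URI uri = URI.create("http://" + bracketIfIpv6(host) + ":8090/");
        System.out.println(uri.getHost() + " port " + uri.getPort());
    }
}
```

Host names and IPv4 addresses contain at most one colon (before the port), so bracketIfIpv6 leaves them unchanged.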
Michael Peter Christen
e2f8f263e8
changed storage of search words: keep order
13 years ago
Michael Peter Christen
ed39ef2890
changed generation of protocol information
13 years ago
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
13 years ago
Michael Peter Christen
ffb72249ea
added missing apicat.sh
13 years ago
Michael Peter Christen
c166eb68b6
fixes in solr schema file
13 years ago
Michael Peter Christen
2e5cd6a1b2
fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen
8bee1472c9
there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen
5e18f54a8c
added shell script to get a servlet. this is the same as apicall.sh but it prints the result to stdout
13 years ago
Michael Peter Christen
3cd6dcd352
do not add new solr fields as activated fields
13 years ago
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
13 years ago
Michael Peter Christen
9727015213
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
7e728867e5
added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
13 years ago
david
f077b11d38
Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git
13 years ago
Lotus
29675d9766
more label on search options (usability)
13 years ago
Michael Peter Christen
355ecf330f
reduced target file size to 64mb
13 years ago
Michael Peter Christen
b4bc1e2875
remote search does not do snippet generation
13 years ago
Lotus
335a776351
xss hardening on Status.html
13 years ago
Michael Peter Christen
10ae6d94a1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
2ea585d616
fix for host navigator
13 years ago
Michael Peter Christen
2f6dde92e2
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
c560a582ac
fix for single-word vocabulary lines
13 years ago
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
13 years ago
Michael Peter Christen
329e3eebcf
added example vocabularies and explanation how to use them
13 years ago
Michael Peter Christen
046d7de95b
Merge remote branch 'reger/master'
13 years ago
reger
a95f645a61
Bugfix class repository.LoaderDispatcher: fixed download file limit of 10000
line 355: final Response response = this.load(request, cachePolicy, 10000, true);
13 years ago
Michael Peter Christen
32adad7dd5
show less navigation by default
13 years ago
Michael Peter Christen
ef78f22ee1
performance hack
13 years ago
Michael Peter Christen
41536eb4a2
performance hack
13 years ago
Michael Peter Christen
88b86afc89
no DoS protection for intranet mode
13 years ago
Michael Peter Christen
0f443ac755
automatic switching off of navigation that is not useful
13 years ago
Michael Peter Christen
852ce43d99
better rules for default open/close of navigation objects
13 years ago
Michael Peter Christen
f91487fc50
added delete-button for host navigation
13 years ago
Michael Peter Christen
e8d24fd802
author navigator can be switched off
13 years ago
Michael Peter Christen
558ab7bd4e
made the protocol navigator reversible
13 years ago
Michael Peter Christen
96cb75f1d4
made the filetype navigator be able to deselect the search constraint
13 years ago
Michael Peter Christen
9ebcae2fbc
enhanced url parser to understand urls with &amp; instead of & in post urls
13 years ago
Michael Peter Christen
30891d026f
added a remove-navigation for vocabularies
13 years ago
Michael Peter Christen
1f4f60654a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/document/parser/pdfParser.java
13 years ago
Michael Peter Christen
d5ead5314d
changed navigation links: now using checkboxes.
This looks better and makes negative checkboxes (ones that remove
the navigation) possible. These are not yet implemented (coming
next)
13 years ago
reger
32104360ce
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by
time out.
13 years ago
Michael Peter Christen
696ee5fc16
removed pdf from default parser deny list
13 years ago
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
13 years ago
Michael Peter Christen
33a71a61fa
Merge commit 'b60e2e952102c3eae40ab98c892a8c7d1b478345'
13 years ago
reger
b60e2e9521
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by time out.
13 years ago
Michael Peter Christen
a02fdf8625
better error messages
13 years ago