Michael Peter Christen
e3aa05b9dd
added creation of subpath pattern when crawl start is 'from file'
13 years ago
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
13 years ago
orbiter
28b30231c3
fix for url matcher of multiple amp& in an url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650
13 years ago
Roland 'Quix0r' Haeder
aef9dd0350
- removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
13 years ago
orbiter
c7afa8bc48
using SwitchboardConstants for solr attributes
13 years ago
sixcooler
a99ef68422
bump to httpclient-4.2.1
13 years ago
orbiter
c6d8950651
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter
5f3b8dc040
fix for RSS reader
13 years ago
orbiter
62202e2d71
refactoring of query attribute variable names for better consistency
with (next) stored query words
13 years ago
Michael Peter Christen
2160f9a819
Release 1.04
13 years ago
Michael Peter Christen
1addbc792c
use less memory for md5 cache
13 years ago
Michael Peter Christen
f32de94723
more logging
13 years ago
Michael Peter Christen
d09d9f2364
filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
13 years ago
Michael Peter Christen
434ee90c59
added classification for control file types which shall not be loaded
but placed onto the noload-queue
13 years ago
Michael Peter Christen
1517a3b7b9
added webm mime-type
13 years ago
Michael Peter Christen
a90bcb48f6
added webm
13 years ago
Michael Peter Christen
801972fe6f
fix for url camel case parser and sentence reader
13 years ago
Michael Peter Christen
fbc1a2030d
fix for sitemap importer: can now also import very large sitemaps within
small memory configurations
13 years ago
Michael Peter Christen
92731e5287
fix for sevenzip parser
13 years ago
Michael Peter Christen
45641b0c23
catch and log a warning in RasterPlotter
13 years ago
Michael Peter Christen
8efc1c1078
- fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
13 years ago
Michael Peter Christen
c3db015410
prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
13 years ago
Michael Peter Christen
91f14ea38e
fix to solr configuration (case where the external solr was not online)
13 years ago
sixcooler
2c5b68d932
more abstraction of error message
13 years ago
Michael Peter Christen
9758c521ab
abstraction of error message
13 years ago
Michael Peter Christen
ef0d09f103
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
b1e7c11fba
fix for pattern matcher in html parser
13 years ago
Michael Peter Christen
8a6edc0031
fix for solr shutdown
13 years ago
Michael Peter Christen
b8bcc06283
fix for urls beginning with "//"
13 years ago
sixcooler
9b6e4e46ca
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430
13 years ago
Michael Peter Christen
b0c408788b
made class methods static where possible
13 years ago
Michael Peter Christen
5bd3c90907
- removed unnecessary semicolons
- added default case for switch
13 years ago
Michael Peter Christen
132afaf687
removed inaccessible code
13 years ago
Michael Peter Christen
7c1ba99755
removed more unused method parameters
13 years ago
Michael Peter Christen
83701a1b4c
removed unused ImageReference package
13 years ago
Michael Peter Christen
0301aba1e9
removed unused method parameters
13 years ago
Michael Peter Christen
241dd8410a
removed snippet pattern filter - it was not used
13 years ago
Michael Peter Christen
d3964253ae
- added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
13 years ago
Michael Peter Christen
ea10766bfd
cleaned unnecessary nested code
13 years ago
Michael Peter Christen
1481037820
replaced non-generic array with collection
13 years ago
Michael Peter Christen
4de50fe808
adding more principal peers for bootstrapping
13 years ago
orbiter
fc0f9543fe
More SentenceReader cleanup
13 years ago
orbiter
586bb0eb6a
Simplified SentenceReader (no more Reader inside..)
13 years ago
orbiter
7f851d62a7
replaced HashARC with SizeLimited Objects which are less costly
13 years ago
orbiter
d4291ac1f3
more tolerance when creating solr document
13 years ago
orbiter
78fc3cf8f8
refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used an overly complex input stream, generated from sources and UTF8
byte conversions. The BufferedReader added significant overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls
were applied until a buffered Reader from an input stream was possible.
These superfluous conversions have been removed.
The best source for the SentenceReader is a String. Therefore the
production of Strings has been forced inside the Document class.
13 years ago
orbiter
bb8dcb4911
automatically adapt size of word cache to available memory
13 years ago
Michael Peter Christen
ad09b786bf
clean up parser data
13 years ago
Michael Peter Christen
276a66a793
Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that they can easily cause an OOM just by the number of links.
The question of whether 1000 links is sufficient or too weak must
be answered by testing this feature.
13 years ago
Michael Peter Christen
613b45f604
- better data structures in secondary search
- fixed a big memory leak in secondary search
13 years ago