Michael Peter Christen
b5d78ba156
reduced number of solr queries during crawling
10 years ago
Michael Peter Christen
5326970d6c
enhanced solr queries for single document extraction
10 years ago
Michael Peter Christen
525575bd97
added debugging of filter queries in thread dump thread names
10 years ago
Michael Peter Christen
f319ef268f
testing filter queries instead of queries to retrieve documents by id
10 years ago
Michael Peter Christen
fd87fa1613
removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen
f2b476e08b
don't do a double check to solr for failed documents if they are not
written to solr
11 years ago
Michael Peter Christen
06ab72d1af
enhanced crawler host round-robin strategy
11 years ago
orbiter
dab9a0786a
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
51bf5c85b0
Renamed the transmission cloud to buffer in dispatcher since the name
'cloud' was a bad idea. Changed also the accumulation process for peer
targets so that every dht chunk is not assigned the set of redundant
targets but they are assigned to redundant targets individually. This
enhances the granularity of the target accumulation and should enhance
the efficiency of the process. Finally the dht protocol client was
enriched with the ability to remove the 'accept remote index' flag from
peers or remove peers completely if they do not answer at all.
11 years ago
reger
7057e0b3e2
catch input file not found in Mediawiki import
11 years ago
Michael Peter Christen
a694b6a8fc
another fix for unique field computation
11 years ago
Michael Peter Christen
fb3dd56b02
fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen
b0d941626f
fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger
d9472d043a
cleanup older unused classes
11 years ago
reger
665e12f88e
move startup time from old serverCore to switchboard (most used here)
to make servercore eventually obsolete.
11 years ago
reger
336425912a
remove unused localSearchThread from SearchEvent
11 years ago
reger
32bd2a61c1
add local ip to AbstractRemoteHandler local hostname cache
11 years ago
Michael Peter Christen
f3a6b6e21e
fix for bad URL decoding
11 years ago
Michael Peter Christen
1092e798a5
fixed double content postprocessing
11 years ago
Michael Peter Christen
aee5b108e5
added linkScraperParser, a parser which ignores the text like the
generic parser but extracts links like the htmlParser. This should be
used for ASCII documents without known text format annotation like
source code files or json documents. Probably also good for xml files
without known schema.
11 years ago
Michael Peter Christen
f384fd624b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
2b8cc5832c
fix seek error for 0 file size records file
by adding an extra check for file size = 0 in cleanlast()
- (http://mantis.tokeek.de/view.php?id=411 )
11 years ago
reger
1f2eba977d
add test case for Records (used in HostBalancer)
- simulating seek error (http://mantis.tokeek.de/view.php?id=411 )
11 years ago
reger
2ba394333f
fix Crawler HostQueue release of stackfile
- close stackfile inputstream at end of ChunkIterator
This should solve the startup delay while unfinished crawl jobs exist (and maybe also the "too many open files" situation)
11 years ago
reger
40133ba2d0
fix NPE in Condenser,
discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"
11 years ago
reger
e94efd4d7c
update to JUnit 4.11
- fix build.xml -> parserTest error on Windows due to javac encoding
11 years ago
reger
3b77e41f1a
adding test for HostQueue crawl stack
- simulating problem with zero length stack file (but not fixing it)
- adding test data clean to maven pom
11 years ago
reger
ba5a59a28d
make search results also available as atom feed via /yacysearch.atom
- fix logo in rss feed
11 years ago
orbiter
59160984cc
timeline performance update
11 years ago
orbiter
54bea96e67
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen
15b2fad6a2
reverted latest change for reindexing because that works actually only
for internal Solr indexes. This is mainly caused by the fact that an
external Solr may also be a SolrCloud, which does not support LukeRequests,
which are needed to request the old Schema.
11 years ago
Michael Peter Christen
841cc77391
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
e09218129c
remove check for local solr. This check was made during a time when Solr
was optional and another alternative metadata store was available. Since
that store is now removed, Solr is always available (internally or
externally)
11 years ago
orbiter
2073e69034
fix for long periods in timeline
11 years ago
reger
1f94df29e7
fix NPE in solr rss where snippet contains only the title text
and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping
(still open item description may be double as dc: tag and rss.description tag)
11 years ago
Michael Peter Christen
09dcdb9b19
update to solr 4.9.0
11 years ago
Michael Peter Christen
282b53db42
update of commons-io and slf4j-api (as preparation for Solr 4.9.0)
11 years ago
Michael Peter Christen
1cd4b2e8be
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
8c52f0651b
refactoring of AccessTracker events & timeline fix
11 years ago
reger
431a5f9c4e
added test case for TextSnippet,
removed obsolete/unused parameter and reference to MediaSnippet
11 years ago
Michael Peter Christen
5b94a257ce
no timeout for large reference collections
11 years ago
Michael Peter Christen
f5b817bac4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
cb2c17d236
extract author and keywords in .doc and .ppt parser
11 years ago
reger
a5707cd2eb
enable proper Author navigator
- author facet is based on omitted author_sxt field
- adjust to make author nav available if the author field exists, but keep using author_sxt to construct the facet (why!?)
- add check for querymodifier author in searchevent
11 years ago
Michael Peter Christen
1b279d7a7e
fixed external link
11 years ago
Michael Peter Christen
74206a10c7
refactoring
11 years ago
orbiter
fec673c9d1
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
4a66af716d
added apkParser stub (work in progress)
11 years ago
orbiter
c59da9fe7a
added access tracker log reader stub
11 years ago
reger
2d67f29244
adjust mergeDocument after parsing to
- preserve charset and languages
- fix merge of author
11 years ago