Michael Peter Christen
bf1b6b93e7
do not write CR values to webgraph if no CR values are computed
11 years ago
Michael Peter Christen
e039e78210
small bugfixes
11 years ago
Michael Peter Christen
32a2ff925c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
d07cdd8c3b
added SolrCloud access mode and configuration
11 years ago
Michael Peter Christen
8514bffc22
enhanced postprocessing status report
11 years ago
reger
b24572f304
fix GSA filter query assignment
- use more parameter constants
11 years ago
Michael Peter Christen
b5fc2b63ea
removed exist() retrieval functions from error cache and replaced it
with metadata retrieval from connectors directly. This should cause
better usage of the cache. Automatically increase the metadata cache if
more memory is available.
11 years ago
Michael Peter Christen
62c72360ee
cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
double-calling of solr
11 years ago
Michael Peter Christen
dd5cdfe212
reverted filter query hack, it did not work
11 years ago
Michael Peter Christen
b5d78ba156
reduced number of solr queries during crawling
11 years ago
Michael Peter Christen
5326970d6c
enhanced solr queries for single document extraction
11 years ago
Michael Peter Christen
525575bd97
added debugging of filter queries in thread dump thread names
11 years ago
Michael Peter Christen
f319ef268f
testing filter queries instead of queries to retrieve documents by id
11 years ago
Michael Peter Christen
fd87fa1613
removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen
f2b476e08b
don't do a double check to solr for failed documents if they are not
written to solr
11 years ago
Michael Peter Christen
06ab72d1af
enhanced crawler host round-robin strategy
11 years ago
orbiter
dab9a0786a
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
51bf5c85b0
Renamed the transmission cloud to buffer in dispatcher since the name
'cloud' was a bad idea. Changed also the accumulation process for peer
targets so that every dht chunk is not assigned the set of redundant
targets but they are assigned to redundant targets individually. This
enhances the granularity of the target accumulation and should enhance
the efficiency of the process. Finally the dht protocol client was
enriched with the ability to remove the 'accept remote index' flag from
peers or remove peers completely if they do not answer at all.
11 years ago
Michael Peter Christen
a694b6a8fc
another fix for unique field computation
11 years ago
Michael Peter Christen
fb3dd56b02
fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen
b0d941626f
fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger
d9472d043a
cleanup older unused classes
11 years ago
reger
665e12f88e
move startup time from old serverCore to switchboard (most used here)
to make servercore eventually obsolete.
11 years ago
reger
336425912a
remove unused localSearchThread from SearchEvent
11 years ago
reger
32bd2a61c1
add local ip to AbstractRemoteHandler local hostname cache
11 years ago
Michael Peter Christen
f3a6b6e21e
fix for bad URL decoding
11 years ago
Michael Peter Christen
1092e798a5
fixed double content postprocessing
11 years ago
Michael Peter Christen
aee5b108e5
added linkScraperParser, a parser which ignores the text like the
generic parser but extracts links like the htmlParser. This should be
used for ASCII documents without known text format annotation like
source code files or json documents. Probably also good for xml files
without known schema.
11 years ago
reger
2b8cc5832c
fix seek error for 0 file size records file
by adding an extra check for file size = 0 in cleanlast()
- (http://mantis.tokeek.de/view.php?id=411)
11 years ago
reger
2ba394333f
fix Crawler HostQueue release of stackfile
- close stackfile inputstream at end of ChunkIterator
This should solve the startup delay when unfinished crawl jobs exist (and possibly also the "too many open files" situation)
11 years ago
reger
40133ba2d0
fix NPE in Condenser,
discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"
11 years ago
orbiter
59160984cc
timeline performance update
11 years ago
orbiter
54bea96e67
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen
841cc77391
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
e09218129c
remove check for local solr. This check was made during a time when Solr
...
was optional and another alternative metadata store was available. Since
that store is now removed, Solr is always available (internally or
externally)
11 years ago
orbiter
2073e69034
fix for long periods in timeline
11 years ago
reger
1f94df29e7
fix NPE in solr rss where snippet contains only the title text
and adjusted the XSLT for Solr snippets (&hl=true) to decode the XML-encoded HTML <b> tag by adding disable-output-escaping
(still open: the item description may be duplicated as dc: tag and rss.description tag)
11 years ago
Michael Peter Christen
09dcdb9b19
update to solr 4.9.0
11 years ago
Michael Peter Christen
1cd4b2e8be
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
8c52f0651b
refactoring of AccessTracker events & timeline fix
11 years ago
reger
431a5f9c4e
added test case for TextSnippet,
removed obsolete/unused parameter and reference to MediaSnippet
11 years ago
Michael Peter Christen
5b94a257ce
no timeout for large reference collections
11 years ago
Michael Peter Christen
f5b817bac4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
cb2c17d236
extract author and keywords in .doc and .ppt parsers
11 years ago
reger
a5707cd2eb
enable proper Author navigator
- author facet is based on omitted author_sxt field
- adjust to make author nav available when the author field exists, but keep using author_sxt to construct the facet (why!?)
- add check for querymodifier author in searchevent
11 years ago
Michael Peter Christen
74206a10c7
refactoring
11 years ago
orbiter
fec673c9d1
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter
4a66af716d
added apkParser stub (work in progress)
11 years ago
orbiter
c59da9fe7a
added access tracker log reader stub
11 years ago
reger
2d67f29244
adjust mergeDocument after parsing to
- preserve charset and languages
- fix merge of author
11 years ago