Commit Graph

42 Commits (596b5dfa5936b25b605c42807730c29a1d08cd15)

Author SHA1 Message Date
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
10 years ago
Michael Peter Christen 48fbfa60c1 bugfix to inbound/outbound identification
11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
Michael Peter Christen bcc623a843 refactoring of load_delay: this is a matter of client identification
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
reger f301336adf fix: no results with configuration citation reference index switched off
12 years ago
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 8219a445f3 refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen e377092198 fix to xml output format
13 years ago
Michael Christen 41be98dc9d extended webstructure api to show together with incoming links also
13 years ago
Michael Christen 71649a1296 added an api to retrieve the new citation.index with the
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter 39f409a7bb performance hacks
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter a29a11e526 added evaluation of incoming links in webstructure api
16 years ago
orbiter 7ba078daa1 - added fast site-operator
16 years ago
orbiter bd409fb7ba added web structure analysis for a special domain that can be requested from the api.
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago
orbiter b423d0a036 moved all servlets from htroot/xml to htroot/api
16 years ago