Commit Graph

111 Commits (ca10f0afcabf9374c8f188d930f45a6783389597)

Author SHA1 Message Date
luccioman c726154a59 Fixed removal of URLs from the delegatedURL remote crawl stack
6 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
Michael Peter Christen 9bf0d7ecb9 added a new collection type 'dht' to all documents from the peer-to-peer
10 years ago
reger 727dfb5875 refactore URIMetadataNode to further unify interaction with index
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
reger 227c42bc96 eleminate obsolete URIMetaDataRow class
11 years ago
Michael Peter Christen 2602be8d1e - removed ZURL data structure; removed also the ZURL data file
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 89c0aa0e74 added collection_sxt to error documents
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 0c1a018bbd removed 'later' tactic because it used too much RAM, reduced number of
12 years ago
Michael Peter Christen 2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen 0cab06c47c refactoring
12 years ago
Michael Peter Christen 18f989dfb1 - refactoring (load -> getMetadata)
12 years ago
orbiter e816b88b55 changed behaviour of metadata storage: in case that any solr is
12 years ago
Michael Peter Christen f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
12 years ago
orbiter d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
12 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Roland 'Quix0r' Haeder edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
orbiter a7df70221e refactoring
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 e7552bd719 *) cleaning up the code a little bit
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
lotus 5f661d9f5d fix receive of crawl receipt in private peer mode
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago