91 Commits (14385057c28dd25fe74c0037630be43e763f10da)

Author SHA1 Message Date
Marc Nause 809b4e1fd9 Team added support for URLs with unicode characters in host part to
11 years ago
reger 58ecf5e4dd add to blacklist button in CrawlResults
11 years ago
Michael Peter Christen 030d0776ff Enhanced crawl start for very, very large crawl lists (i.e. > 5000)
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
orbiter f425b2c61c re-try to fetch url after a soft commit
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen b24d1d18e4 removed synchronization and concurrency in Fulltext class, concurrent
12 years ago
Michael Peter Christen 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
12 years ago
Michael Peter Christen b6de1f42dc Full redesign of solr connection architecture. This was done to support
12 years ago
Michael Peter Christen 0fe7b6fd3b migrated the index export methods from the old metadata to solr. Now
12 years ago
Michael Peter Christen 5fd3b93661 added deletion of hosts during crawl start if deleteold option was given
12 years ago
Michael Peter Christen d481abd087 added the visualization of error-urls to host browser
12 years ago
orbiter 354ef8000d - added 'deleteold' option to crawler which causes that documents are
12 years ago
Michael Peter Christen 43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 872f83ebe0 refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
orbiter 563d584420 removed more dependencies in cora from kelondro
12 years ago
Michael Peter Christen d8425e6809 added collections to crawl monitor
12 years ago
Michael Peter Christen 0cab06c47c refactoring
12 years ago
Michael Peter Christen 18f989dfb1 - refactoring (load -> getMetadata)
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 8b974905ee changed log-in text for all servlets with authentication:
13 years ago
Michael Peter Christen c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
orbiter e22f8497c9 - tested the ARC methods
13 years ago
orbiter a7df70221e refactoring
13 years ago
orbiter 9c131adeb6 show IP of crawled host and country in CrawlResults
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 5b579e21a3 code cleanup
14 years ago
low012 2861d0888a *) simplified code; *) fixed potential NumberFormatExceptions
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 6f4f957e50 *) cleaning up the code a little bit
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter 37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 06ff0c5b06 fixes for metadata retrieval and presentation
15 years ago
orbiter fc5efcc05a enhanced and fixed OAI-PMH import
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago