Commit Graph

221 Commits (a0b84e4defd673677c9ca15bab893945cf15af0b)

Author SHA1 Message Date
Michael Peter Christen 6983dff334 explain crawl denial when not switched to intranet mode 11 years ago
Michael Peter Christen d8beafba3a fix for values in CrawlProfileEditor table and xml; now the full profile 11 years ago
Michael Peter Christen ec95dfa2e6 fixed crawl profile xml result which did not show the correct crawl 11 years ago
Michael Peter Christen 9b1958e8ca more IPv6 bugfixes 11 years ago
Michael Peter Christen e1bc768f9d more IPv6 bugfixes 11 years ago
reger fb1fcc2b03 handle noarchive tag, skip writing page to cache 11 years ago
Michael Peter Christen 6491270b3a large IPv6 redesign of peer ping methods! 11 years ago
Michael Peter Christen 67cd4c37bd activated the new apk parser which was already ready but not included in 11 years ago
Michael Peter Christen 025516f682 fix for crawl limit for number of pages fail 11 years ago
orbiter 3ac31614a3 added option to reverse-sort YaCy tables (internal API change only) 11 years ago
Michael Peter Christen bf18a39d0e replaced warning with info 11 years ago
Michael Peter Christen ebd0be2cea fixes and speed updates for search process 11 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references 11 years ago
orbiter 4ae7aead28 addon to latest fix 11 years ago
Michael Peter Christen eca9380e3d bugfix for crawler double-check: if an url is redirected, the 11 years ago
Michael Peter Christen 9ac0c93f17 fix for subpath crawl filter 11 years ago
Michael Peter Christen 66106bdaf0 fix for crawler attribute maxdompages 11 years ago
Michael Peter Christen 49d91b94c3 NPE fix in crawler 11 years ago
Michael Peter Christen c465b791af typo 11 years ago
Michael Peter Christen 3c23b89823 less logging 11 years ago
Michael Peter Christen 1609763be5 toString fix 11 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail 11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser. 11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks 11 years ago
orbiter e9163e7e10 fix for malformed hostpath names in crawl balancer 11 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a 11 years ago
orbiter 4b06adb751 fix for file urls 11 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be 11 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid 11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow" 11 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it 11 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid 11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling 11 years ago
Michael Peter Christen 06ab72d1af enhanced crawler host round-robin strategy 11 years ago
Michael Peter Christen 49886fab08 enhanced debugging 11 years ago
Michael Peter Christen b893c42a0f bugfix for image search 11 years ago
Michael Peter Christen 74c249288a added a push api to make it possible to upload files directly without 11 years ago
Michael Peter Christen ba6ffddefc refactoring 11 years ago
reger 92d1604a31 Crawler hostbalancer does not delete finished queue files, 11 years ago
orbiter d7d38f9135 made number of open files in crawler configurable and increased default 11 years ago
reger ca5437dd50 fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149 11 years ago
orbiter 97983ba89f fixed generics warnings for generic array instantiation that appeared 11 years ago
reger 1600414450 fix NPE on continuing crawls after YaCy restart 11 years ago
Michael Peter Christen c1c1be8f02 fix for slow crawling and better logging in balancer 11 years ago
Michael Peter Christen 3acf416335 NPE fix 11 years ago
orbiter 2f63bd0261 enhanced Host Balancer strategy: fair round robin 11 years ago
Michael Peter Christen 8b32dd5f9e special strategy for balancer: do not remove targets with zero wait time 11 years ago
Michael Peter Christen 9c6228d948 fix for deadlocks in crawler 11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents 11 years ago