Commit Graph

366 Commits (56264dcc17cb7a5cc0e2a3ca3f6a9c49bc81365f)

Author  SHA1  Message  Date
orbiter  56264dcc17  - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls  14 years ago
orbiter  a563b05b60  enhanced crawler:  14 years ago
orbiter  c36da90261  added a very fast ftp file list generator to site crawler:  14 years ago
orbiter  4e2c14efbb  fixed bugs in parser and ftp client  14 years ago
orbiter  fffb91447a  fixed crawl queue delete function  14 years ago
orbiter  b769cce433  - added a catch-all parser for all documents that cannot be parsed: they will be contributed with their document url for the search index only  14 years ago
f1ori  741a87a3e9  * make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers)  14 years ago
f1ori  dca9e16f51  * don't index pages, which redirect, twice  14 years ago
orbiter  09badc697b  - low-memory patch for crawler  14 years ago
orbiter  93c535d111  fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113  14 years ago
orbiter  4c72885cba  added a sitemap entry parser and loader for sitemaps  14 years ago
f1ori  def4253555  * add option to network definition to provide a domainlist (syntax like in blacklists)  14 years ago
f1ori  7d8de34778  * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)  14 years ago
orbiter  e3e3b49d52  - enhanced main release recognition  14 years ago
orbiter  ca738ac924  - added a tag cloud to search results (using the topics)  14 years ago
orbiter  e4d561971e  added more score cluster options and made score cluster usage more transparent  14 years ago
orbiter  e8f90201a5  fix for scheduling of rss feeds  14 years ago
orbiter  6a166c2040  patches for bad proxy behaviour  14 years ago
orbiter  45b1ab3d07  custom + generic skins:  14 years ago
orbiter  0d363a94d7  more performance hacks  14 years ago
orbiter  091dd3f6ec  - enhanced intranet search speed  14 years ago
orbiter  aacf572a26  - enhancements for search speed  14 years ago
orbiter  2c549ae341  fixed a number of small bugs:  14 years ago
orbiter  f6eebb6f99  replaced auto-dom filter with easy-to-understand Site Link-List crawler option  14 years ago
orbiter  d2fd93135c  - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed  14 years ago
orbiter  48c0d508ac  fixes for crawling of smb links (file length not always available)  14 years ago
orbiter  461a2a6ec7  enhanced remote crawling:  14 years ago
orbiter  5870b13f3a  - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart  14 years ago
sixcooler  17eebd4ef8  counting crawler traffic again:  14 years ago
orbiter  348dece62f  redesign of the SortStack and SortStore classes:  14 years ago
orbiter  114bdd8ba7  fixed old sitemap importer which was not able to parse urls containing post elements  14 years ago
orbiter  5fe828fa06  - replaced pdfbox and fontbox version 1.1.0 with 1.2.1  14 years ago
orbiter  22047ffad5  enhanced computation speed of many replaceAll string operations  14 years ago
orbiter  9d080f387e  change in handling of the all-visible home path for storage in YaCy:  14 years ago
orbiter  65eaf30f77  redesign of crawl profiles data structure. target will be:  14 years ago
orbiter  104318d58a  - added nice colors to feed indexing state messages  14 years ago
orbiter  0f276dd63f  - MapHeap now implements Map<byte[], Map<String, String>>  14 years ago
orbiter  c60d0282fd  more abstraction for tables stored in heaps:  14 years ago
orbiter  3197ca42ed  preparations to move the HTCache into cora:  14 years ago
orbiter  844f158686  - removed dependencies in header framework:  14 years ago
orbiter  5e7081cd19  refactoring towards a unified loading mechanism for MultiProtocolURIs  14 years ago
orbiter  caece04f26  removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)  14 years ago
orbiter  90531f78ff  refactoring of the cora package to get subpackages for http and ftp (smb to come)  14 years ago
sixcooler  661867923a  ... migrating to HttpComponents-Client-4.x ...  14 years ago
orbiter  7aa860c505  - more logging  14 years ago
orbiter  70dd26ec95  added the new crawl scheduling function to the crawl start menu:  14 years ago
orbiter  7fdb17bb96  redirect uncaught exceptions to logging + small other changes  14 years ago
orbiter  87b1684211  additional double-check in balancer  14 years ago
orbiter  0d81731e88  fixed crawler bug caused by NPE in logging  14 years ago
orbiter  a82a93f2fc  - better url double check in crawler  14 years ago