Commit Graph

399 Commits (c17d102bd829746091451f070e6ec4ca060eaaea)

Author SHA1 Message Date
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
orbiter 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
f1ori efcf37a953 * show info in log, if robots.txt is rejected due to wrong mime-type
14 years ago
orbiter b1a8d0c020 enhancements to web cache and less strict caching rules
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
orbiter 2b5f8585bf performance hack for Balancer and ip address parsing
14 years ago
orbiter 8f11d3a5bb redesigned the ScoreMap classes:
14 years ago
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter 1214615185 fix for 'invisible entry', see http://forum.yacy-websuche.de/viewtopic.php?p=22133#p22133
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 1110d16af9 performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
14 years ago
low012 ce012e11aa *) deleted LogStatistics since the page did not work anymore and it seemed to be obsolete, tell me if you miss it and I will add it again
14 years ago
low012 c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
lotus b1484299b2 same units for memory observer configuration (MiB)
14 years ago
orbiter 0769f4caa6 added search suggestions for interactive search: is only shown if there are no search results
14 years ago
f1ori e4aabaa1c3 * fix negative filelength for files >2G
14 years ago
f1ori ee3cef91e8 * fix filesize in ftp crawls
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter e88c428008 fix to ftp loader
14 years ago
orbiter 9b25a33fd9 - fixed numerous bugs
14 years ago
orbiter 7bdb13bf7f more fixes to smb crawling: better file names
14 years ago
orbiter 94c48500cc several fixes
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter fffb91447a fixed crawl queue delete function
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contributed with their document url for the search index only
14 years ago
f1ori 741a87a3e9 * make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers)
14 years ago
f1ori dca9e16f51 * don't index pages, which redirect, twice
14 years ago
orbiter 09badc697b - low-memory patch for crawler
14 years ago
orbiter 93c535d111 fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
14 years ago
orbiter 4c72885cba added a sitemap entry parser and loader for sitemaps
14 years ago
f1ori def4253555 * add option to network definition to provide a domainlist (syntax like in blacklists)
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter e3e3b49d52 - enhanced main release recognition
14 years ago
orbiter ca738ac924 - added a tag cloud to search results (using the topics)
14 years ago
orbiter e4d561971e added more score cluster options and made score cluster usage more transparent
14 years ago
orbiter e8f90201a5 fix for scheduling of rss feeds
14 years ago