Commit Graph

60 Commits (e02bfbde569117645c730968d6fa4e4c1ba33b41)

Author SHA1 Message Date
orbiter 1c007188ad bugfixes in html parser
14 years ago
orbiter 231074bf0a fixed a parsing bug by reverting SVN 7766
14 years ago
orbiter 5dd2efc9a2 - bugfixes in html parser
14 years ago
orbiter 51cf697acd refactoring: moved all score-related classes to new ranking package
14 years ago
orbiter 299af4943c added another memory protection hack
14 years ago
orbiter b06faab9d3 do not allocate a StringBuilder object in case that there is not enough memory for that
14 years ago
orbiter bda3eec0ff added parsing of canonical link element to html parser
14 years ago
orbiter 9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts)
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter e28bd0d038 fix for some possible causes of memory leaks
14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks
14 years ago
orbiter 021840e5ba removed (almost) deadlocks and unnecessary CPU load
14 years ago
orbiter 4e8fa03514 added more attributes to html evaluation
14 years ago
orbiter 528da7c9ea removed unused class and added license header for new class
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
orbiter 0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case that all index feeder have terminated.
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter a92d80a545 performance enhancements using an alternative to a insensitive collator (a complex string compare):
14 years ago
orbiter e717bf74ba more logging, more care about OOMs
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 88773e4daa changed the default port from 8080 to 8090
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter 9b25a33fd9 - fixed numerous bugs
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
f1ori a025b1da89 * fix bug when browsing local filesystem (e. g. repository) with yacy
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter b8aee6d402 performance hacks for better search performance
15 years ago
orbiter 24502fe3de performance hacks
15 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds!
15 years ago
orbiter 5924a0d851 - enhanced concurrency in database index access for multicore
15 years ago
orbiter 60e71876ad - more abstraction (HashMap -> Map)
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
15 years ago
orbiter 4cd5418963 removed finalize methods because of a hint in
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter e0da0a84b0 performance fix in http parser
15 years ago
orbiter 82f76e1296 removed log line
15 years ago
orbiter 0f8004f9da enhanced html parser to recognize a href tags inside header tags
15 years ago
orbiter 54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter a37878b7d5 url parser regex performance hack
15 years ago