Commit Graph

70 Commits (4540174fe04f45fcd0dc4fd7e1ee43b3ce94ae86)

Author SHA1 Message Date
Michael Peter Christen 4540174fe0 memory hacks
13 years ago
Michael Peter Christen b7bb84c0bb set a limit to CharBuffer object size to fight against bad/too large
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
Michael Christen 1f4afb4dc0 performance hacks
13 years ago
Michael Christen 9cd469e6d6 added pull request from als plus an NPE fix
13 years ago
Al Sutton 3f9b9f953f Added close() to ensure buffer close actions are invoked
13 years ago
Al Sutton d73c84f9a0 Allow initial buffer size definition in TransformWriter, and use available() method to set it in htmlParser. In this situation a ByteArrayInputStream is used so the available() method gives a good size estimation and avoids the buffer needing to be continually grown
13 years ago
Al Sutton 8993cac4d8 Initial performance improvements
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
low012 277b454a62 *) added comments
13 years ago
orbiter 1c007188ad bugfixes in html parser
13 years ago
orbiter 231074bf0a fixed a parsing bug by reverting SVN 7766
13 years ago
orbiter 5dd2efc9a2 - bugfixes in html parser
13 years ago
orbiter 51cf697acd refactoring: moved all score-related classes to new ranking package
13 years ago
orbiter 299af4943c added another memory protection hack
14 years ago
orbiter b06faab9d3 do not allocate a StringBuilder object in case that there is not enough memory for that
14 years ago
orbiter bda3eec0ff added parsing of canonical link element to html parser
14 years ago
orbiter 9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts)
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter e28bd0d038 fix for some possible causes of memory leaks
14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks
14 years ago
orbiter 021840e5ba removed (almost) deadlocks and unnecessary CPU load
14 years ago
orbiter 4e8fa03514 added more attributes to html evaluation
14 years ago
orbiter 528da7c9ea removed unused class and added license header for new class
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
orbiter 0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case that all index feeders have terminated.
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter a92d80a545 performance enhancements using an alternative to an insensitive collator (a complex string compare):
14 years ago
orbiter e717bf74ba more logging, more care about OOMs
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 88773e4daa changed the default port from 8080 to 8090
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter 9b25a33fd9 - fixed numerous bugs
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
f1ori a025b1da89 * fix bug when browsing local filesystem (e.g. repository) with yacy
14 years ago
f1ori 7d8de34778 * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter b8aee6d402 performance hacks for better search performance
14 years ago
orbiter 24502fe3de performance hacks
14 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds!
14 years ago
orbiter 5924a0d851 - enhanced concurrency in database index access for multicore
14 years ago
orbiter 60e71876ad - more abstraction (HashMap -> Map)
15 years ago