Commit Graph

202 Commits (a65ecffef6d6a18d49723d34128572519a6dabc7)

Author SHA1 Message Date
orbiter 9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts) 14 years ago
orbiter f667b9c289 enhanced identificator: using AtomicInteger for counter 14 years ago
orbiter 115abc8917 - more attributes for search progress bar 14 years ago
orbiter 77fe69395d added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html 14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks 14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
orbiter e28bd0d038 fix for some possible causes of memory leaks 14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation 14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks 14 years ago
orbiter 205cc75157 abstraction of surrogate main element (xmlns:geo was missing for wiki extracts) 14 years ago
orbiter 021840e5ba removed (almost) deadlocks and unnecessary CPU load 14 years ago
orbiter 9248a4eef4 reduce the effect of 'image search finds generated HTML pages as images' 14 years ago
orbiter 76f2817e00 a fix for the snippet computation and hopefully better snippets 14 years ago
orbiter deda54d684 - relaxed matching of string-search (this is now case-insensitive) 14 years ago
orbiter 15e3a57b4e removed unused functions in condenser 14 years ago
orbiter e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting 14 years ago
orbiter 4e8fa03514 added more attributes to html evaluation 14 years ago
orbiter 528da7c9ea removed unused class and added license header for new class 14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures 14 years ago
orbiter d8e934c085 better abstraction of http client identification 14 years ago
orbiter b77b8cac0c - enhanced html parser: recognizes many more details in the content 14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url) 14 years ago
orbiter 958ff4778e enhanced location search: 14 years ago
orbiter c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments 14 years ago
orbiter b788182954 some enhancements to scoring speed 14 years ago
orbiter 01690eab86 fix for mediawiki importer and wikicode parser 14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks 14 years ago
orbiter 564184909a enhanced the surrogate parser: better reading of UTF-8 characters 14 years ago
orbiter 156cf02703 - added an index constraint 'has location' to the condenser 14 years ago
orbiter 0430a94eaa the location search now shows only locations that are attached as metadata to web pages, not re-evaluated locations 14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser 14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed 14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case all index feeders have terminated. 14 years ago
orbiter a50f28e6e7 - fixed missing save operation for peer name change 14 years ago
orbiter 1989ebc24b removed more warnings 14 years ago
orbiter 8f11d3a5bb redesigned the ScoreMap classes: 14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion 14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding 14 years ago
lotus cb6d307bba adding extension for parser 14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage 14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations 14 years ago
low012 3b40b98256 *) set SVN properties 14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'. 14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management 14 years ago
orbiter f8d0454c53 small bug fixes and experiments with search speed enhancement 14 years ago
orbiter 5e186e0122 continuing the fight against deadlocks during time formatting: better caching. 14 years ago
orbiter a92d80a545 performance enhancements using an alternative to an insensitive collator (a complex string compare): 14 years ago
orbiter e717bf74ba more logging, more care about OOMs 14 years ago
orbiter 5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the maximum number of httpc connections was also increased. 14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain 14 years ago