Commit Graph

186 Commits (fcd4b03892c99b24052425b17d67b5e22bdef09e)

Author SHA1 Message Date
orbiter 8e10b82280 small fix for solr export
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter 528da7c9ea removed unused class and added license header for new class
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
sixcooler 4eb9c1e7c3 not setting userAgent from Constructor as default for following calls
14 years ago
orbiter d8e934c085 better abstraction of http client identification
14 years ago
sixcooler a3e707283d not using HTTPConnector anymore
14 years ago
orbiter 9f1f47ec67 added some comments to explain the isLocal patch
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
sixcooler 8d63f3b70f just cosmetics - keeping my baby clean :-)
14 years ago
orbiter e402622584 removed httpclient-3.1 (this was added with last commit which was a mistake)
14 years ago
orbiter 19fd13d3bc Added federated index storage to solr.
14 years ago
orbiter c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments
14 years ago
orbiter b788182954 some enhancements to scoring speed
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
cominch 9ac02caf00 different initialization of empty variables in alternative constructor. This leads to wrong interpretation of user credentials, resulting in unnecessary "@" in front of host, and different urlhash values.
14 years ago
orbiter 57ce1fb491 reverted synchronization from SVN 7641
14 years ago
orbiter 7c8e764201 removed synchronization again...
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter cb6f709a16 - enhancements in surrogate reading
14 years ago
orbiter 564184909a enhanced the surrogate parser: better reading of UTF-8 characters
14 years ago
orbiter 156cf02703 - added an index constraint 'has location' to the condenser
14 years ago
orbiter 41b8d7f655 fix for url normalization (no backpath resolving in post parameters)
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case that all index feeders have terminated.
14 years ago
orbiter 2b5f8585bf performance hack for Balancer and ip address parsing
14 years ago
orbiter a6935e7dc8 fix for active dns resolving: do not resolve in case that the dns server is not available (offline mode)
14 years ago
orbiter 61acf55da4 avoided using synchronized(this) for the hash computation so that the lock on the object cannot be (accidentally) stolen by another thread; the synchronization now uses the protocol object instead. Also made the protocol object final.
14 years ago
orbiter 1989ebc24b removed more warnings
14 years ago
orbiter b62b79675b removed type cast warnings
14 years ago
orbiter 8f11d3a5bb redesigned the ScoreMap classes:
14 years ago
orbiter a564230c48 more enhancements against blocked threads that occurred in seed age evaluation (blocks httpd in some cases)
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 7138f4036b less synchronization, better thread dump tool
14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management
14 years ago
sixcooler 65bcc60808 stupid me: revert placement of closing connection which caused unclosed connections
14 years ago
sixcooler e3d75d6cd5 Not storing external header in an Header-Array and reduce a loop for its conversion.
14 years ago
orbiter 42d90664f3 - fixed a memory leak in the httpc.post method (no finish)
14 years ago
orbiter 38dce547c0 better concurrency (less locking on date formatting), more logging and minor bug fixes
14 years ago
orbiter 89d337841c more logging for OOMs
14 years ago
orbiter 5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
14 years ago
orbiter dec24244cf added convenience class to generate UTF StringBody objects with a default UTF8 charset.
14 years ago