Commit Graph

793 Commits (42425c800362e36bb59e501adbc214dd405a8b67)

Author SHA1 Message Date
orbiter 299af4943c added another memory protection hack
14 years ago
orbiter 1f300217f8 more protection for the cleanup thread
14 years ago
orbiter d13103a0a7 changed how the index cache is flushed: do not flush when a put is made, because that could cause many put calls to wait on synchronization for a long time while a dump or a merge is performed. Instead, a watchdog thread performs the dump, so puts can no longer block, which is good when a put happens during search result preparation.
14 years ago
orbiter b06faab9d3 do not allocate a StringBuilder object if there is not enough memory for it
14 years ago
orbiter 6a6f27eaf3 do not sort arrays again if arrays are already sorted
14 years ago
orbiter 3d043ce9d6 - refactoring
14 years ago
orbiter 48b78e9ff4 disabling concurrency in new sort since it does not work correctly yet
14 years ago
orbiter 62ac73a108 fixed bugs and deadlocks in core database indexing structures:
14 years ago
orbiter 1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again held a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer, but that was almost never used. This commit changes an interface of the Row class so that it is now necessary to specify whether a copy is always required. Fortunately, the copy is needed only in very rare cases. This change should therefore cause much less memory allocation; it is expected that this happens especially in search situations.
14 years ago
orbiter bb8e3f8523 code cleanup
14 years ago
orbiter 11dc653de3 added a visualization of peer pings to the performance graphic
14 years ago
orbiter 3a191cdf14 because newbies were scared by the memory consumption shown in the performance graph, and because of arguments about high memory consumption based on poor knowledge of Java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is unchanged.
14 years ago
orbiter 52d799e7c8 fix for solr auth
14 years ago
orbiter 9eb8e9acd9 no error message about missing browser in headless environments
14 years ago
orbiter d3c89b90ce temporarily adding the old httpclient-3.1 again because the solrj classes need it. Should be removed as soon as solrj supports httpclient-4
14 years ago
orbiter bd99969758 fixed bad query
14 years ago
orbiter 768c59740c - replaced solrj 3.1 with solrj 3.3
14 years ago
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
orbiter 6d2e252bcf fix for:
14 years ago
orbiter 2d4bb139d3 - added counting of links with noindex tag for solr index
14 years ago
orbiter 892caccdca added default configuration in ConfigurationSet in case of new values
14 years ago
orbiter bda3eec0ff added parsing of canonical link element to html parser
14 years ago
orbiter b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
14 years ago
orbiter b666a929e7 fixed Semaphore handling in case of interruptions
14 years ago
orbiter de7a054d77 added a parser for files like the new solr.key.list
14 years ago
orbiter 267290a821 removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere, which then causes the cache never to be flushed, so the peer dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
14 years ago
orbiter d8072d1866 added more info to DNS cache in /PerformanceMemory_p.html
14 years ago
orbiter f803da8aae code cleanup
14 years ago
orbiter 84c9658644 added a file type navigator
14 years ago
orbiter 31283ecd07 - added a search option to filter only specific network protocols, e.g. get only results from ftp servers. Just add '/ftp' to your search.
14 years ago
orbiter 7db208c992 performance hacks: more pre-allocated StringBuilder
14 years ago
orbiter 07e89a7ae5 added @Deprecated
14 years ago
orbiter 9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts)
14 years ago
orbiter 996f0a8764 disabled assert in Base64Order which eats away too much performance during testing with -l
14 years ago
orbiter f667b9c289 enhanced identificator: using AtomicInteger for counter
14 years ago
orbiter 16327d1cbe unwrapping of call depth (one call less for UTF8.String)
14 years ago
orbiter f30d36b101 enhanced template engine
14 years ago
orbiter aa6c32d753 enhanced UTCDiffString
14 years ago
f1ori f87865a50b always shutdown log, fixes zombie processes in init stop script
14 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
orbiter 77fe69395d added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
14 years ago
sixcooler df1725ef43 re-enable POST over proxy, which didn't work since update to httpcore-4.1.1
14 years ago
orbiter 2683162ec5 - added more options to access grid picture, web structure picture and network graphics
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter fe0c08455b more concurrency (enhancement) hacks
14 years ago
orbiter 87082f407e less String object creation during search
14 years ago
orbiter 3c2b994bd6 write access/load time to solr index
14 years ago
orbiter a36fda991e hack to increase speed of url hash computation
14 years ago
orbiter dbea40d536 - changed snippet fetch strategy logic: do not check if entry is in cache. This should reduce IO load on the HTCACHE, which is a showstopper during a large number of search requests
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 746e3c3b06 Replaced a widely-used Properties object in the httpd with HashMap<String, Object>, which, unlike Properties, is not synchronized
14 years ago
orbiter e28bd0d038 fix for some possible causes of memory leaks
14 years ago
orbiter 09ba6814c0 - non-blocking word hash computation with dynamic digest object generation (this was important!)
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter bd55dcee50 - commented out experimental distributed ranking loading
14 years ago
orbiter 98c4d25185 fix for endless loop in FTP crawling, see http://bugs.yacy.net/view.php?id=32
14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks
14 years ago
orbiter b45701d20f this is a re-implementation of the YaCy Block Rank feature
14 years ago
orbiter d27a0a67ff fix in log initialization according to hint from Dominic
14 years ago
orbiter 205cc75157 abstraction of surrogate main element (xmlns:geo was missing for wiki extracts)
14 years ago
orbiter 021840e5ba removed (almost) deadlocks and unnecessary CPU load
14 years ago
orbiter 123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
14 years ago
orbiter 5c981762c6 added bigrange option for network scan
14 years ago
orbiter bade61696f speed-up of network port scanner
14 years ago
orbiter 1d8b0f74f4 one more fix for SVN 7713
14 years ago
orbiter 0960261769 fix for svn 7713
14 years ago
orbiter 5b579e21a3 code cleanup
14 years ago
orbiter 039126cfaf better handling of on/off switched solr indexing
14 years ago
orbiter dc54915df4 fix for very bad compare
14 years ago
orbiter 9248a4eef4 reduce the effect of 'Bildersuche findet generierte HTML-Seiten als Bilder' (image search finds generated HTML pages as images)
14 years ago
orbiter 76f2817e00 a fix for the snippet computation and hopefully better snippets
14 years ago
orbiter deda54d684 - relaxed matching of string-search (this is now case-insensitive)
14 years ago
orbiter 15e3a57b4e removed unused functions in condenser
14 years ago
orbiter 6e42d4de88 - added full-String search function: find things that match exactly what is quoted in the query
14 years ago
orbiter 8e10b82280 small fix for solr export
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting
14 years ago
orbiter 4e8fa03514 added more attributes to html evaluation
14 years ago
orbiter 528da7c9ea removed unused class and added license header for new class
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
sixcooler 4eb9c1e7c3 not setting userAgent from Constructor as default for following calls
14 years ago
orbiter d8e934c085 better abstraction of http client identification
14 years ago
sixcooler a3e707283d not using HTTPConnector anymore
14 years ago
orbiter 9f1f47ec67 added some comments to explain the isLocal patch
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
sixcooler 8d63f3b70f just cosmetics - keeping my baby clean :-)
14 years ago
orbiter e402622584 removed httpclient-3.1 (this was added with last commit which was a mistake)
14 years ago
orbiter 19fd13d3bc Added federated index storage to solr.
14 years ago
orbiter c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments
14 years ago
orbiter b788182954 some enhancements to scoring speed
14 years ago
orbiter 01690eab86 fix for mediawiki importer and wikicode parser
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
cominch 9ac02caf00 different initialization of empty variables in alternative constructor. This leads to wrong interpretation of user credentials, resulting in unnecessary "@" in front of host, and different urlhash values.
14 years ago
orbiter 57ce1fb491 reverted synchronization from SVN 7641
14 years ago
orbiter 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
14 years ago
orbiter 7c8e764201 removed synchronization again...
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter cb6f709a16 - enhancements in surrogate reading
14 years ago
low012 1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
14 years ago
orbiter 564184909a enhanced the surrogate parser: better reading of UTF-8 characters
14 years ago
orbiter 156cf02703 - added an index constraint 'has location' to the condenser
14 years ago
orbiter 41b8d7f655 fix for url normalization (no backpath resolving in post parameters)
14 years ago
orbiter 0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
14 years ago
orbiter 8412f8787d fix for http://bugs.yacy.net/view.php?id=8
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
lotus cbf87fe72f write PID to yacy.running
14 years ago
orbiter b1a8d0c020 enhancements to web cache and less strict caching rules
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
f1ori df71776929 * fix bug #7
14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case all index feeders have terminated.
14 years ago
orbiter a50f28e6e7 - fixed missing save operation for peer name change
14 years ago
orbiter 2b5f8585bf performance hack for Balancer and ip address parsing
14 years ago
orbiter b1d133b69f another enhancement to the ThreadDump function: better multiple dumps and filtering out of uninteresting dump parts
14 years ago
orbiter a35d513bd8 fix for not-deleted .gap and .idx files
14 years ago
orbiter a6935e7dc8 fix for active DNS resolving: do not resolve if the DNS server is not available (offline mode)
14 years ago
orbiter 859c99886c fix for multiple thread dump
14 years ago
orbiter 61acf55da4 avoided using a synchronized(this) for the hash computation to prevent the lock on the object from being (accidentally) stolen by another thread, and replaced this synchronization with one on the protocol object. Also made the protocol object final.
14 years ago
orbiter c2a968c23f fix for bug in formatting in ThreadDump
14 years ago
orbiter 078ecacf61 avoid synchronization in DigestURI hash requests
14 years ago
orbiter 1989ebc24b removed more warnings
14 years ago
orbiter 0324de1467 removed debug line
14 years ago
orbiter 1aba7869bf patch for Windows: do not use the thread lock feature from previous commit if used on Windows
14 years ago
orbiter 0a11727374 added new feature for Thread dump:
14 years ago
orbiter b62b79675b removed type cast warnings
14 years ago
orbiter a07a1a8b1e removed type cast warnings
14 years ago
orbiter 8edaccfedf removed unused variables
14 years ago
orbiter e6c3507b17 disabled some of the previous changes (did not work in openjdk)
14 years ago
orbiter f9e5c21083 update to thread dump logs
14 years ago
orbiter 8f11d3a5bb redesigned the ScoreMap classes:
14 years ago
orbiter a564230c48 more enhancements against blocked threads that occurred in seed age evaluation (blocks httpd in some cases)
14 years ago
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
lotus cb6d307bba adding extension for parser
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
orbiter 96bb33ed9b added default size to StringBuffer in logger (and it is not possible to replace the StringBuffer with a StringBuilder...)
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter 619b561a4a enhanced secondary search: index abstracts decompression is now much faster and does not cause strong CPU load after several searches with more than one word
14 years ago
low012 bf27a72d53 *) set SVN properties
14 years ago
low012 b649ce2dd7 *) minor changes
14 years ago
orbiter 70a996a06c reverted SVN 7557 because these classes are called using reflection. The class declaration is in the log configuration. Without these classes you get errors during runtime and a non-formatted log output, i.e.:
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
low012 9d366ee9d7 *) removed unused code (I assume that most of the code was really dead, but if you need any of the classes, tell me and I will put it back in.)
14 years ago
orbiter 7138f4036b less synchronization, better thread dump tool
14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management
14 years ago
orbiter c2c5b12882 - even less memory for circle tool
14 years ago