Commit Graph

5141 Commits (4d7ae76017d6faf8f2f467bd5b2658b4786c5055)

Author SHA1 Message Date
orbiter 2cba860693 - fix for wrong entries in the NOLOAD indexing queue (these caused URLs to be indexed only by their URL, without being loaded)
13 years ago
orbiter 2842ce30d6 added synchronization in ReferenceContainer and logging for shrinking
13 years ago
orbiter cec3836e73 added reference limitation to IndexControlRWIs_p.html servlet
13 years ago
sixcooler ecb4986b38 refactored stuff from last commit to ReferenceContainer
13 years ago
sixcooler f7c4abfdd7 limit references per blob & term to the 100.000 youngest
13 years ago
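The 100,000-youngest limit from f7c4abfdd7 can be illustrated with a bounded min-heap. This is a minimal sketch under stated assumptions, not the ReferenceContainer code: `Ref`, `urlHash`, `lastModified` and `keepYoungest` are hypothetical names, and "youngest" is taken to mean the largest last-modified timestamp. Because the heap never holds more than `limit` entries, memory stays bounded even for terms with very many references.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the `limit` youngest references for one term.
// Ref and keepYoungest are illustrative names, not YaCy's ReferenceContainer API.
final class YoungestReferences {

    /** A reference to a document, with the time it was last modified. */
    static final class Ref {
        final String urlHash;
        final long lastModified;

        Ref(String urlHash, long lastModified) {
            this.urlHash = urlHash;
            this.lastModified = lastModified;
        }
    }

    /** Returns at most `limit` references with the newest lastModified, youngest first. */
    static List<Ref> keepYoungest(Iterable<Ref> refs, int limit) {
        List<Ref> out = new ArrayList<>();
        if (limit <= 0) return out;
        // Min-heap on lastModified: the root is the oldest reference kept so far.
        PriorityQueue<Ref> heap =
                new PriorityQueue<>(Comparator.comparingLong((Ref r) -> r.lastModified));
        for (Ref r : refs) {
            if (heap.size() < limit) {
                heap.add(r);
            } else if (r.lastModified > heap.peek().lastModified) {
                heap.poll(); // evict the oldest kept reference
                heap.add(r);
            }
        }
        out.addAll(heap);
        out.sort(Comparator.comparingLong((Ref r) -> r.lastModified).reversed());
        return out;
    }
}
```

With the limit from the commit this would be called as `keepYoungest(refs, 100_000)`; when timestamps are equal, the reference seen first is kept.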
orbiter 28f5b79deb added a fast mass-deletion method
13 years ago
orbiter a70dbce41c added another file tool class to yacy-cora
13 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if that is also enabled.
13 years ago
orbiter e02bfbde56 fix for solr url
13 years ago
f1ori 41e146116a fixes size of document in case the server doesn't give the size in the header
13 years ago
orbiter 580beb12a5 reverting SVN 7863; the synchronization was needed: without it, the same hosts are looked up repeatedly in DNS
13 years ago
orbiter 44d6416e2d ensure termination of shrink()
13 years ago
orbiter 52230a6864 replaced catching of Exception with Throwable, which catches also Errors
13 years ago
orbiter 877eaf6bcb switched off logging of org.apache.http which was suddenly switched on by default (??)
13 years ago
orbiter e1a3d609aa moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy could not be shut down while an index merge was running during the shutdown phase.
13 years ago
sixcooler 2cf61a40ce fixed a bug from 7856, where Snippet returned an error by mistake when Metadata was found
13 years ago
orbiter 610b01e1c3 - added an 'add every media object linked in an html document as a new document' feature to the html parser. As a result, every image, app, video or audio file linked in an html file is added as a document, so parsing a single html document may insert a number of documents into the search index.
13 years ago
orbiter 3da21c4266 protection against starting of a (second) yacy peer while another one is already running on the same port
13 years ago
orbiter b5252ef91f added new word recommendation library in DictionaryLoader_p.html
13 years ago
orbiter 1c007188ad bugfixes in html parser
13 years ago
orbiter 231074bf0a fixed a parsing bug by reverting SVN 7766
13 years ago
low012 30a8a2f76b *) replacing one ugly hack with an extended ugly hack ;-)
13 years ago
low012 95379ce0b1 *) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253)
13 years ago
low012 24e76a7b69 *) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
13 years ago
sixcooler d40a177c05 Generation Memory Strategy fine tuning
13 years ago
sixcooler 839f407fe4 Generation Memory Strategy fine tuning:
13 years ago
orbiter 3e6767d66c limitation of reference evaluation (protection against crawler pits)
13 years ago
orbiter a5541751a8 - added memory computation to termlist_p.xml
13 years ago
orbiter 45e497a9bd fix for term iteration
13 years ago
orbiter 5dd2efc9a2 - bugfixes in html parser
13 years ago
orbiter 2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
13 years ago
orbiter 75df87832c refactoring/better naming of methods and classes
13 years ago
orbiter 9f9f634de2 fix in search
13 years ago
sixcooler 5f8a5ca32d - not doing merge-jobs while short on Memory
13 years ago
orbiter 965fabfb87 enhanced sorting speed (affects all DB operations)
13 years ago
orbiter 41a8ee4569 added iterable implementation in KeyList
13 years ago
orbiter 22d69a6368 refactoring in cora: added sorting package
13 years ago
orbiter 51cf697acd refactoring: moved all score-related classes to new ranking package
13 years ago
orbiter a0d5e7b6e6 added new score comparator
13 years ago
sixcooler 169236c6d9 almost completely reverted the changes to this class from 7880 and 7882
13 years ago
sixcooler 4fec99115b Implementation of strategies for controlling memory resources.
13 years ago
sixcooler 63a375b801 do not look at external DTDs, because this makes the reader stay forever(?) on faulty DTD locations
13 years ago
orbiter 2c58af6874 - added a short memory status simulation mode
13 years ago
orbiter c64faf41e2 addon to svn 7880
13 years ago
sixcooler 7b7a196243 ignore cookies in httpclient per default
13 years ago
sixcooler 06408a9428 since many POST requests arrive gzip-encoded, they report a content length of -1
13 years ago
sixcooler 411ed159f8 do some extra sleep while running low on memory
13 years ago
sixcooler 9ab0ba41e2 using GzipDecompressingEntity from httpclient instead of our own
13 years ago
sixcooler 07f5954570 try better handling of corrupt blobs
13 years ago
orbiter f970670a7c - bugfix in ServerScannerList
13 years ago
orbiter 8e03b8ee8b better integration of server list in interactive search
13 years ago
orbiter 0a3ab7da1b do not sort the same array concurrently
13 years ago
orbiter 594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
13 years ago
sixcooler eb14111200 encapsulate potentially expensive objects in TextSnippet so that the GC can collect them as soon as possible
13 years ago
orbiter 0d33cf352b removed synchronization in DNS resolve (this solves a problem when loading snippets; concurrent DNS requests also caused deadlocks in the past, but that was many years ago and we will give it another try)
13 years ago
orbiter e3fc1efbef performance hack and ensuring termination in serverAccessTracker. cause:
13 years ago
orbiter 44d74f8f89 performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
13 years ago
sixcooler 5cd07d7f84 free resources early when deleting an index reference because search verification failed (aka Switchboard.cleanupJob)
13 years ago
sixcooler a311596881 finishing up my commits (7855-7858) which could be helpful for
13 years ago
sixcooler 9170a434ed throwing an exception again in FileUtils.copy(reader, writer)
13 years ago
sixcooler c0caca57e3 stopping the thread for fetching search results if running short on memory
13 years ago
sixcooler ce248cc8dd fewer byte arrays of response content, fewer byte-array <-> stream conversions
13 years ago
sixcooler 59b767eebd stop loading via http at a defined maximum number of bytes, even if the size is unknown before loading
13 years ago
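A minimal sketch of a byte cap that still works when the length is not known in advance (for example the gzip-encoded POST requests reporting a content length of -1 in 06408a9428). `BoundedReader` and `readAtMost` are hypothetical names, and aborting with an `IOException` instead of truncating the content is an assumption about the intended behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: read a response body but stop at maxBytes,
// whether or not the server announced a length (-1 means "unknown").
final class BoundedReader {

    static byte[] readAtMost(InputStream in, long declaredLength, int maxBytes) throws IOException {
        // Fail fast if the announced size is already too large.
        if (declaredLength > maxBytes) {
            throw new IOException("declared length " + declaredLength + " exceeds limit " + maxBytes);
        }
        int initial = declaredLength > 0 ? (int) declaredLength : 8192;
        ByteArrayOutputStream out = new ByteArrayOutputStream(initial);
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            // Unknown (or wrong) length: abort as soon as the cap is crossed,
            // without reading the rest of the stream.
            if (total > maxBytes) {
                throw new IOException("content exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Checking the running total on every read, rather than trusting the header, is what lets the cap apply to chunked or compressed responses as well.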
sixcooler 916d79111e Runtime.maxMemory() DOES change @ runtime:
14 years ago
f1ori 3a5fa73008 * revert parts of previous commit, because it breaks the trickle-feature
14 years ago
f1ori 6e79675ff3 * use gzip-encoding in more cases
14 years ago
orbiter 299af4943c added another memory protection hack
14 years ago
orbiter 1f300217f8 more protection for the cleanup thread
14 years ago
orbiter d13103a0a7 changed how the index cache is flushed: it is no longer flushed when a put is made, because that could cause many put calls to block on synchronization for a long time while a dump or a merge is performed. Instead, a watchdog thread performs the dump, so puts can no longer block; this is good when a put happens during search result preparation.
14 years ago
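The flushing scheme from d13103a0a7 can be sketched as follows, assuming a simple term-to-count map in place of the real index cache. `WatchdogFlushedCache`, `flushIfNeeded` and the in-memory `dumps` list (standing in for files written to disk) are hypothetical; the point is only that `put` touches a concurrent map and never takes the dump lock, while a scheduled daemon thread performs the dump.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: put() never flushes; a watchdog thread dumps the RAM cache.
// The "dumps" list stands in for dump files written to disk.
final class WatchdogFlushedCache {
    private final ConcurrentHashMap<String, Integer> ram = new ConcurrentHashMap<>();
    private final List<Map<String, Integer>> dumps = new ArrayList<>();
    private final Object dumpLock = new Object();
    private final int maxEntries;
    private ScheduledExecutorService watchdog;

    WatchdogFlushedCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Only touches the concurrent map, so it never waits for a running dump. */
    void put(String term, int count) {
        ram.merge(term, count, Integer::sum);
    }

    /** Run by the watchdog: dump the RAM cache once it has grown to maxEntries. */
    boolean flushIfNeeded() {
        if (ram.size() < maxEntries) return false;
        synchronized (dumpLock) {
            Map<String, Integer> snapshot = new HashMap<>();
            for (String term : ram.keySet()) {
                Integer v = ram.remove(term); // a concurrent put simply re-creates the entry
                if (v != null) snapshot.put(term, v);
            }
            dumps.add(snapshot);
        }
        return true;
    }

    void startWatchdog(long periodMillis) {
        watchdog = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-watchdog");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });
        watchdog.scheduleWithFixedDelay(this::flushIfNeeded, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stopWatchdog() {
        if (watchdog != null) watchdog.shutdownNow();
    }

    int ramSize() { return ram.size(); }

    int dumpCount() {
        synchronized (dumpLock) { return dumps.size(); }
    }

    int dumpedValue(int dump, String term) {
        synchronized (dumpLock) { return dumps.get(dump).getOrDefault(term, 0); }
    }
}
```

In a test, `flushIfNeeded()` can be called directly instead of starting the watchdog, which keeps the behaviour deterministic.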
orbiter b06faab9d3 do not allocate a StringBuilder object if there is not enough memory for it
14 years ago
orbiter 6a6f27eaf3 do not sort arrays again if arrays are already sorted
14 years ago
orbiter 3d043ce9d6 - refactoring
14 years ago
orbiter 48b78e9ff4 disabling concurrency in new sort since it does not work correctly yet
14 years ago
orbiter 62ac73a108 fixed bugs and deadlocks in core database indexing structures:
14 years ago
sixcooler aff875baef smaller ping-entry @ ProfilingGraph
14 years ago
orbiter 1912d0cccc changed handling of RowSet element retrieval: until now, all elements were copied from the underlying byte[] arrays into a new Entry object, which held a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer, but it was almost never used. This commit changes an interface of the Row class so that callers must now state whether a copy is always required. Fortunately, a copy is needed only in very rare cases, so this change should cause much less memory allocation, especially during searches.
14 years ago
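The difference between copying a row and referring into the shared `byte[]`, as described in 1912d0cccc, can be sketched like this. `RowTable`, `Entry` and the `forceCopy` flag are hypothetical names rather than the actual Row/RowSet interface; the sketch assumes fixed-width rows stored back to back in one array.

```java
import java.util.Arrays;

// Hypothetical sketch: fixed-width rows stored back to back in one shared byte[].
// RowTable, Entry and forceCopy are illustrative, not YaCy's Row/RowSet API.
final class RowTable {
    private final byte[] chunk;  // all rows, back to back
    private final int rowWidth;  // bytes per row

    RowTable(byte[] chunk, int rowWidth) {
        if (rowWidth <= 0 || chunk.length % rowWidth != 0) {
            throw new IllegalArgumentException("chunk length must be a multiple of rowWidth");
        }
        this.chunk = chunk;
        this.rowWidth = rowWidth;
    }

    int size() { return chunk.length / rowWidth; }

    /** A row: either a window into the shared array or a private copy. */
    static final class Entry {
        final byte[] bytes;
        final int offset;
        final int length;

        Entry(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        byte byteAt(int i) { return bytes[offset + i]; }

        boolean sharesStorageWith(byte[] other) { return bytes == other; }
    }

    /** Returns row i; allocates a new array only when the caller demands a copy. */
    Entry get(int i, boolean forceCopy) {
        if (i < 0 || i >= size()) throw new IndexOutOfBoundsException("row " + i);
        int start = i * rowWidth;
        if (forceCopy) {
            return new Entry(Arrays.copyOfRange(chunk, start, start + rowWidth), 0, rowWidth);
        }
        return new Entry(chunk, start, rowWidth); // no copy of the row bytes
    }
}
```

A view avoids the allocation but also sees later writes to the backing array; presumably the rare cases that still need a copy are those where an entry must stay valid after the underlying storage changes.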
orbiter bb8e3f8523 code cleanup
14 years ago
orbiter be15874be1 added request line in http which can support better debugging
14 years ago
orbiter 11dc653de3 added a visualization of peer pings to the performance graphic
14 years ago
orbiter 3a191cdf14 because newcomers are alarmed by the memory consumption shown in the performance graph, and complaints about high memory consumption stem from a poor understanding of Java garbage collection, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen, unchanged, on the Performance page.
14 years ago
cominch 09bb7a390c do not replace malformed or invalid URLs in urlproxy
14 years ago
orbiter 52d799e7c8 fix for solr auth
14 years ago
orbiter 9eb8e9acd9 no error message about missing browser in headless environments
14 years ago
orbiter d3c89b90ce temporarily adding the old httpclient-3.1 again because the solrj classes need it. It should be removed as soon as solrj supports httpclient-4
14 years ago
orbiter bd99969758 fixed bad query
14 years ago
orbiter 768c59740c - replaced solrj 3.1 with solrj 3.3
14 years ago
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
orbiter 6d2e252bcf fix for:
14 years ago
orbiter 719777b2a7 replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6
14 years ago
orbiter 2d4bb139d3 - added counting of links with noindex tag for solr index
14 years ago
orbiter 892caccdca added default configuration in ConfigurationSet in case of new values
14 years ago
orbiter bda3eec0ff added parsing of canonical link element to html parser
14 years ago
orbiter b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
14 years ago
orbiter b666a929e7 fixed Semaphore handling in case of interruptions
14 years ago
orbiter de7a054d77 added parser for files such as the new solr.key.list
14 years ago
f1ori a17351dcfe * navigation bar for filetype constraints
14 years ago
f1ori 96957375cc * fix url proxy for relative links and chromium
14 years ago
orbiter 9ebc75db4b fix for channel authorization
14 years ago
orbiter 267290a821 removed the semaphores from the cache dump process because I believe some of the semaphores may get lost somewhere, which would mean that the cache is never flushed and the peer dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
14 years ago
orbiter 6d9e5865ee faster appearance of search result page (but complete search time is the same)
14 years ago