Commit Graph

3687 Commits (fd31a3616a094d8a4356da183f724f662230ecbc)

Author SHA1 Message Date
orbiter 138422990a - removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
16 years ago
orbiter 1b9e532c87 some concurrency for wikipedia dump reader
16 years ago
lotus 25d2160288 small fix
16 years ago
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
16 years ago
orbiter 0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
16 years ago
orbiter 5195c94838 two patches for performance enhancements of the index handover process from documents to the index cache:
16 years ago
orbiter 9416f5c26f more speed test cases: kelondro provides map functions that are more than 20% faster than standard java classes and use less than half of the memory of java classes:
16 years ago
orbiter b53790abb1 more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code
16 years ago
orbiter 8ffb9889e1 some fixes and performance hacks
16 years ago
orbiter dfb96ecb72 more fixes
16 years ago
orbiter 1b8d346b4c fixes in connection with transition to byte[] hashes
16 years ago
f1ori 0b0a46d35a * fix transferRWI as suggested by celle (thanks!)
16 years ago
orbiter 996572de95 quickfix
16 years ago
orbiter 380ed2dac0 performance and debugging additions
16 years ago
lotus 635b0a9da7 code-split
16 years ago
orbiter fa3adbbfc6 added domain checks to surrogate reader and RWI transfer receiver to prevent spamming using surrogates
16 years ago
f1ori 76af84d732 * add custom comparator to ScoreCluster for byte[]
16 years ago
lotus ab0030d7a7 allow dht-out for remote-crawl processing peers on default settings
16 years ago
low012 d1116c049f *) added new method "contains()" to Blacklist interface
16 years ago
f1ori 08445e42f0 * don't throw exception, in case of bad charset in http-header
16 years ago
f1ori 2f860a2564 * convert byte[] hashes to string for log output
16 years ago
f1ori d93a2a6552 * ignore whitespaces so you can copy&paste signatures better
16 years ago
orbiter fbcbcc5bdb export of yacy document objects as dublin core record in xml
16 years ago
orbiter d7cbf4cdd4 more performance hacks: less overhead in word hash computation
16 years ago
orbiter 29e96c1a60 bugfixes and performance hacks
16 years ago
orbiter 4e97a31009 corrections in dublin core syntax
16 years ago
f1ori 44daec7936 * introduce signatures to autoupdate
16 years ago
orbiter 538e375901 replaced old caching method for computed word hashes with a better method. The word hash computation is a new performance bottleneck (after the IO bottleneck was removed with the IndexCell data structure) and a better caching for word hashes was necessary.
16 years ago
orbiter 9e853e1977 partly reverting SVN 5818: identical comparator required for join operator
16 years ago
orbiter e16c25ddf7 (peak-) performance hacks
16 years ago
orbiter 63cd152969 fixes
16 years ago
orbiter 7dfe7e7cc6 fixed some problems with surrogate reader. This is now ready for testing.
16 years ago
orbiter 3a1364ed5c removed example lines from SurrogateReader sources; added additional example file
16 years ago
orbiter 9050a3c4c5 alpha version of surrogate reading and indexing.
16 years ago
orbiter b15b059c0d fix for latest commit
16 years ago
orbiter c8624903c6 full redesign of index access data model:
16 years ago
f1ori dd6b5005ff * fix missing charset handling in getpageinfo_p
16 years ago
orbiter bd5f4c78d8 - added default profile for surrogate indexing
16 years ago
orbiter ad78e3a59f - less lines in rssTerminal
16 years ago
orbiter bc80dc913a added new surrogate reader (surrogates are parsed documents in batches)
16 years ago
orbiter 12d81e98eb - fixed bad search results when searching for empty string
16 years ago
orbiter 8a24350036 - fix for join method with new generalized RWI data structure (caused by latest commit)
16 years ago
orbiter e58320a507 added more info in log for debugging
16 years ago
orbiter 89ec3acb3e - full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
16 years ago
borg-0300 7a48090fcf - fix for "uk" language
16 years ago
orbiter dc2af61bc9 allow up to 50 results from remote peers
16 years ago
orbiter c0e8ed5461 fixed problem with not http client
16 years ago
orbiter 8862a2fed0 ups
16 years ago
orbiter de68948bc5 better handling of free memory computation and emergency cache flush for index cell
16 years ago
f1ori fcb77c3140 * added .im (Isle of Man) to TLD-list
16 years ago
orbiter b81c7467d8 protection against too many files in RICELL in case of massive emergency dumps caused by low memory
16 years ago
orbiter d4d87d90c4 - extended experimental wikipedia dump parser
16 years ago
orbiter c3aff2521e fix for NPE
16 years ago
orbiter 57c00dd8c9 fix for bad filtering of common http error
16 years ago
orbiter 14361f1ca4 added log message for index generation in HeapReader
16 years ago
orbiter c08f9b36a4 refactoring of wiki parser.
16 years ago
orbiter 44e01afa5b - refactoring
16 years ago
orbiter 82fb60a720 increased memory limit for emergency cache flush
16 years ago
low012 9180617dd9 *) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon.
16 years ago
lotus 596e6215dc fix in case of white space in path name
16 years ago
orbiter b887f4a116 keep more free mem
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter ab656687d7 more strict BLOB initialization .. may also help to save some ram
16 years ago
orbiter 5b138ada16 fixes to web structure reference collection and url construction
16 years ago
orbiter a29a11e526 added evaluation of incoming links in webstructure api
16 years ago
orbiter f6691411b5 - migration of files from SplitTable (which are used for the URL-DB) to a different file name format.
16 years ago
shostakovich 1f37cc6107 Robots.txt is now reused after one day. See forum-topic:
16 years ago
orbiter f21a8c9e9c a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second.
16 years ago
orbiter 7ba078daa1 - added fast site-operator
16 years ago
orbiter b4126432bc hardening of index dump write process
16 years ago
orbiter 9bfb2641db - removed deprecated threads
16 years ago
orbiter 293290c317 fix for bad assert in last commit
16 years ago
orbiter bd409fb7ba added web structure analysis for a special domain that can be requested from the api.
16 years ago
orbiter b6c2167143 - patch for bad web structure dumps
16 years ago
orbiter 0139988c04 - added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up.
16 years ago
orbiter 3621aa96ab - added a memory protection for the IndexCell migration
16 years ago
orbiter 568e8f1741 fix in unmountBLOB
16 years ago
orbiter 9da69d6b68 - better selection of files to be merged
16 years ago
orbiter d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards.
16 years ago
orbiter 029495e64d fixed bug introduced in SVN 5756 in EcoTable.put()
16 years ago
orbiter 587838bd09 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter d2e2420a68 - added another file selection method for index cell merge
16 years ago
orbiter 96eaecda3e - added migration class to go from index collections to the index cell data structure.
16 years ago
orbiter 0f0b4aec75 better index cell merge logic
16 years ago
orbiter 832fef670f migration of urls-files into subdirectory METADATA
16 years ago
orbiter fa07234d4e fix for clear method: now deletes files
16 years ago
lulabad df87e4dbf6 missing count of sent Index and URLs
16 years ago
borg-0300 c450e3746b svn attributes added
16 years ago
orbiter 37f892b988 added new concurrent merger class for IndexCell RWI data
16 years ago
borg-0300 8c494afcfe svn attributes added
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago
orbiter 0926310461 another performance hack
16 years ago
orbiter ebe5d69d14 performance hacks
16 years ago
orbiter 61f9dbf0cc - fixed a display problem in watch crawler
16 years ago
orbiter b3f75e48fa - enhanced balancer: auto-solving of waiting-deadlocks
16 years ago
orbiter 9a90ea05e0 added a merge operation for IndexCell data structures
16 years ago
orbiter d99ff745aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378
16 years ago
orbiter 0c3ab291c4 fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354
16 years ago
orbiter a9cea419ef Integration of the new index data structure IndexCell
16 years ago
borg-0300 fd0976c0a7 refactoring
16 years ago
orbiter 83792d9233 more refactoring
16 years ago
borg-0300 ce79239322 "typo"
16 years ago
borg-0300 cdbdc731c5 small updates: unescape, isCGI
16 years ago
orbiter 474aac65af more refactoring
16 years ago
orbiter 209f25f5f5 refactoring to integrate indexCell data structures
16 years ago
borg-0300 359a238acf faster isCGI()
16 years ago
borg-0300 f75628e53b some corrections
16 years ago
orbiter b7138e5fcb even more efficient comparator calls (less System.arraycopy for primary keys)
16 years ago
orbiter 65784eb656 - more efficient comparator calls
16 years ago
orbiter 44874cb550 added a deleteOnExit for blob file deletion in case that a deletion is not successful.
16 years ago
orbiter 66f78d67e0 bad idea. Concurrency in index management will be done differently
16 years ago
orbiter 7dff1cba62 removed option to use different primary keys in kelondro tables
16 years ago
orbiter 7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter d49238a637 more performance hacks: better default values for scaling, less memory usage
16 years ago
orbiter 39644dc14e performance hacks to compare methods in database core
16 years ago
orbiter e2e7949feb replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table.
16 years ago
orbiter f6d989aa04 added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy.
16 years ago
borg-0300 0a2fabeef3 static TMPDIR
16 years ago
lotus 9f7e62e900 refactoring
16 years ago
lotus f35dc11dc4 allow crawl start from pages with script tags
16 years ago
orbiter 6958eff196 removed unnecessary exceptions, extended testing in IntegerHandleIndex
16 years ago
orbiter 13c666adef performance hack to ObjectIndex put() method:
16 years ago
orbiter 1f1be1518c added stub for another performance hack: concurrent indexes
16 years ago
orbiter 3e4c28e188 enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!)
16 years ago
orbiter 84e37387a2 fix for last commit and more testing stub
16 years ago
orbiter ca006c506d stub for performance enhancements for RowSet (no functional change yet)
16 years ago
orbiter d988204875 better shutdown of tools
16 years ago
orbiter 100247bdda added also an export and delete-feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the url database, start the following:
16 years ago
hermens 8c60d6d117 In DHT selection delete only those references that were actually selected
16 years ago
orbiter 60078cf322 added next tool for url analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS
16 years ago
orbiter b1ddc4a83f do not merge collections if ram == false
16 years ago
orbiter dbdd10da84 better logging and startup behaviour for referenceHash computation
16 years ago
orbiter d64836c34f added statistical analysis of URL reference
16 years ago
orbiter 3b28daab40 code-beautification (to be consistent with external documentation paper)
16 years ago
orbiter 485c9406e5 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249
16 years ago
orbiter 858f800a07 more logging in httpd to detect shutdown cause. See also:
16 years ago
orbiter b80db04667 - refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
16 years ago
lotus 8ee946bf1d show upnp status
16 years ago
orbiter 16f5c6a85e fixed merge method initialization in ReferenceContainer
16 years ago
orbiter d7a493b4f5 added experimental timeline api
16 years ago
orbiter efcd95dc37 simplification of (internal) query process / refactoring
16 years ago
orbiter f1b712c29a small corrections to image loading methods in result presentation
16 years ago
orbiter d4b56d5819 added more asserts to BLOBHeap.flushBuffer() to fix the problem described in
16 years ago
f1ori c545fcb9fa * add class to handle keys and signatures
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter 6ffc6e3389 more refactoring of indexer and kelondro classes;
16 years ago
orbiter 404bc21da9 simplification of (internal) query process / refactoring
16 years ago
orbiter 76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
16 years ago
orbiter 2df57b1fd1 refactoring of index collection class
16 years ago
lotus 39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
16 years ago
orbiter d1d9fbae5c enabling the URLAnalysis to operate on multiple input files, just use a wild card when calling the class from the command line
16 years ago
orbiter c728879ab8 fixes to yacyURL - more exceptions in case that urls are strange
16 years ago
orbiter 7542336ae5 performance enhancement to yacyURL: omit second processing of resolveBackpath. This method is already applied during initialization of the object and was called a second time when the url was exported.
16 years ago
orbiter 7ea53fe47b added another url list transformation option:
16 years ago
orbiter e521e81148 bugfix in yacyURL (for latest performance hack)
16 years ago
orbiter 54625360f7 performance update
16 years ago
orbiter d884c4718a added gzip support for URLAnalysis:
16 years ago
orbiter 46632f4385 performance update to yacyURL
16 years ago
orbiter cf9b74e6e3 added another method to process url lists: extract hosts only
16 years ago
orbiter 89d8e824ed memory protection for URLAnalysis
16 years ago
orbiter 0f6fa804ff performance update to URLAnalysis
16 years ago
orbiter 8444357291 added new row iterator in kelondro tables files that enumerates rows
16 years ago
orbiter e8f5f2f612 added tool to analyse url strings
16 years ago
lotus 6117e083e5 option to customize tray label (tooltip) with tray.label
16 years ago
orbiter b8c3803bfc don't panic when canceling server sessions
16 years ago
orbiter de714783b1 - added host, path, filename to search result
16 years ago
lotus 9519d84372 changed "dooble" variable to "browserintegration" to be less specific
16 years ago
lotus 8429083972 adjusted tray for dooble:
16 years ago
orbiter c852d2d70e - reject too old seeds
16 years ago
orbiter aca973e2d9 catch more exceptions
16 years ago
orbiter 9559bc23fd automatic clean-up of dead connections
16 years ago
hermens 02dfd6183b Fix logging in serverCore
16 years ago
hermens d30456e2c8 Fix logging in serverCore
16 years ago
orbiter 4f9dae2571 remove reference in crawl entries
16 years ago
orbiter 1ba4301920 automated interruption of dead incoming connections, if they are there for more than one minute
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter 5d3983faae the soLinger parameter was wrong.
16 years ago
orbiter 62505bb3cb more bugfixes as recommended by findbugs
16 years ago
orbiter 6b450d09ca some fixes recommended by findbugs
16 years ago
orbiter 4db80065ac select more
16 years ago
orbiter 94c42691d8 - reject less transmissions as transmission receiver
16 years ago
orbiter f887fc159f try to reduce the large number of unclosed incoming connections
16 years ago
orbiter e04a0e05c3 fix for last commit
16 years ago
orbiter a9ad863686 second part of 'doubles' fix - better handling of doubles in RAMIndex. More logging.
16 years ago
orbiter 59427064fb first part of 'doubles' fix (not fully ready yet)
16 years ago
orbiter 26978b2a25 - better memory protection in kelondro caches: computation of needed memory for cache grow
16 years ago
lotus e9e2fff47a better scaling on performance graph
16 years ago
lotus 4aad461100 added UPnP support
16 years ago
orbiter 99b9788e54 fix for possible 100% CPU caused by concurrent access of HashMap
16 years ago
orbiter be0c492ae5 fix for memory leak bug in new dht transmissions
16 years ago
hermens 2173865f92 Prevent race condition when switching timezones.
16 years ago
orbiter 40d9849aa4 - better control of chunk size in dht selection
16 years ago
orbiter 30a1de41b3 disabled the BufferedIOChunks, because I consider it as broken.
16 years ago
orbiter 411f2212f2 more memory leak fixing hacks
16 years ago
orbiter 985d421f91 found and fixed some memory leaks
16 years ago
orbiter 333489420b - fix for NPE when loading the cytag image
16 years ago
orbiter 6a32193916 - refactoring of cache naming in web index cache (no more dht semantics there)
16 years ago
orbiter 6c627dbdff update to the server core
16 years ago
orbiter 5393f356aa fix for termination problem
16 years ago