Commit Graph

4175 Commits (ee52634daad75af907cb75cb1db0817faf3f8f12)

Author SHA1 Message Date
orbiter c3a4aee255 some redesign with a possible fix for the ReferenceContainerCache.
15 years ago
orbiter aca8a78eb8 fix for shutdown of DocumentIndex objects
15 years ago
orbiter 23ab6fbca4 - navigation appears at the correct position when opengeodb-results are also presented after a search
15 years ago
orbiter 4db34eea73 fix for OOM problem in kelondro Cache
15 years ago
orbiter 8ea1d7ab59 fix for wrong assert condition in search abstract generation
15 years ago
orbiter fbd77bd77c git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6328 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter 54c7cbf1d9 - fast result for local search in case fewer than 10 hits exist
15 years ago
orbiter 28d4b921b6 different approach for file search
15 years ago
orbiter f99f86c5c5 added concurrency to file indexing class
15 years ago
orbiter 902d16cf6c fixes to parser
15 years ago
orbiter 4a1c852435 fix in usage of RAM copy for Table objects and some cosmetics in asserts.
15 years ago
lotus dce450e2e0 possible fix for "hung" doc-documents
15 years ago
orbiter e627f75415 one more fix to badwords and stopwords
15 years ago
orbiter 721b88efbd - fixed a problem loading blacklists with new yacycore.jar
15 years ago
orbiter 80d5005044 fixed seed upload methods - replaced reflection with direct instantiation
15 years ago
orbiter 68465c37af added a convenience class to add files into a YaCy index
15 years ago
orbiter 2e41e10ffd - updates to yacyVersion parser (remove old targets)
15 years ago
orbiter 27d00285aa - added a new file reader cache that may serve as full-file-copy of blob database files. This is not yet used
15 years ago
orbiter fd6b9cb7dc refactoring of IO access classes
15 years ago
orbiter d64569aa39 return only recommendations of words that have a greater count than the original word
15 years ago
orbiter 604c37927f used comparator for did-you-mean that uses index sizes for comparison, but:
15 years ago
orbiter a58d9cae7d - show location name in geolocalization search result
15 years ago
orbiter 573d03c7d7 added configuration to enable ram table copy
15 years ago
orbiter 3be54e1891 fix to rule when to use a ram table copy
15 years ago
orbiter 700218846c disabled or removed sleep calls
15 years ago
orbiter 342c5d0fd4 fixed city name detection: finds now also substrings of city names
15 years ago
orbiter 18aa0609ca fix for caching of word hash computation
15 years ago
orbiter a10a6cce45 patch for http://forum.yacy-websuche.de/viewtopic.php?p=17289#p17289
15 years ago
low012 53bbdfd19a *) setting SVN keywords
15 years ago
low012 25f6145934 *) preventing null pointer exception in case an empty search word or only one character is entered or all search words are removed by filters
15 years ago
low012 248f3fd9b5 *) cleaned up code for better readability
15 years ago
orbiter eaddf2d464 - corrected layout of map preview
15 years ago
hermens 4b83875abd Small fixes for the heapCacheIterator in ReferenceContainerCache:
15 years ago
orbiter fd668f531b fixed map layout
15 years ago
orbiter 2740d9dd79 added integration of osm maps for search
15 years ago
orbiter af3a696fc4 added a fast-fail concept in search processes. The search now has better control if all the remote searches may bring any result. If all processes are finished, then all search tasks fail fast.
15 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
15 years ago
orbiter 44579fa06d - fixed a problem loading images through yacy's document loader,
15 years ago
orbiter 67eddaec4b changed way to integrate dictionary files:
15 years ago
orbiter d656a94f55 fix for bad paths in dictionary processing
15 years ago
orbiter 3b9aaf9e9f - inserted new library tests inside DidYouMean
15 years ago
orbiter 8c35ffe34c fixes to the dymlib
15 years ago
orbiter bfa273bcc1 added a library provider which holds libraries in static objects,
15 years ago
orbiter 1762a7bcd6 - moved DidYouMean to the data package
15 years ago
orbiter bf8ed00e9e removed debugging code
15 years ago
orbiter ead48c4b25 fix for preparation of search result pages with offset > 10:
15 years ago
orbiter 39a311d608 better care not to lose the merge/dump thread
15 years ago
orbiter 10d3e856b5 better concurrency, less blocking & performance hacks
15 years ago
orbiter 1a9cfd8718 some performance hacks (CPU only, not IO)
15 years ago
orbiter 92407009b2 cleanup
15 years ago
orbiter 0ba1beaf56 separated rwi constraint evaluation from rwi ranking and added concurrency
15 years ago
orbiter ce7924d712 better concurrency for rwi entry parsing during search processing
15 years ago
orbiter b0637600d5 enhanced url constraint computation: better position of constraint check during retrieval process
15 years ago
orbiter 61748285c3 more refactoring of search
15 years ago
orbiter 323a8e733d removed unused classes
15 years ago
orbiter 72e5407115 refactoring of snippet cache
15 years ago
orbiter 0e471ba33b - fixed a bug in fast digest computation
15 years ago
low012 93b2622503 *) repaired and added IM online status indicators
15 years ago
orbiter e7736d9c8d more refactoring: made all variables in SearchEvent private
15 years ago
orbiter 4b92d0b9b7 patch for possible problems with normalization of '/' in urls. This applies in rare cases when '/' appears in post-properties
15 years ago
orbiter d8ca6e6bf1 more refactoring for search
15 years ago
orbiter fe4a4e3f6b added missing class
15 years ago
orbiter 72ac5bd80f refactoring of search process.
15 years ago
hermens c4d0e22a77 Further speed-up of concurrent DHT-receive
15 years ago
hermens 2fbc0696bf Fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2334
15 years ago
f1ori d515bc11e2 added ooxmlparser
15 years ago
orbiter d9744b1b5d replaced old caching strategy control class with lightweight simplearc
15 years ago
orbiter 8e56c2ace6 fix for fixes from this afternoon
15 years ago
orbiter cf739edc2e fix for possible deadlock, see
15 years ago
orbiter 6354b5e447 removed possible deadlock, see
15 years ago
orbiter 5cc17ccf8a a better caching with less overhead and more appropriate
15 years ago
orbiter 92edd24e70 fixed problem with switching of networks
16 years ago
orbiter 0575f12838 fix for deadlock
16 years ago
orbiter fbfdaf063d - patch to omit IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys
16 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 634a01a9a4 replaced wget-requests with caching requests
16 years ago
orbiter c6c97f23ad - added cache usage properties to crawl start
16 years ago
orbiter c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
f1ori ba2e6de538 fix empty version string again
16 years ago
orbiter 51534df0cb fix for possible synchronization problem
16 years ago
orbiter 4da9042e8a code simplification
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
f1ori 67da20647f * add new odf parser based on sax-xml-parser
16 years ago
f1ori 6d0e6d591b * ops, fix compiler error :(
16 years ago
f1ori 3e5beb1654 * fix for empty version in seedlist
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 597393db3b changed default visibility of classes/objects in upnp lib
16 years ago
orbiter eea4c17ef2 removed rpm parser
16 years ago
orbiter b332dfad67 - inserted request object into response object which carries this now instead of generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 8103ccec4c removed compiler warnings in imported classes
16 years ago
lotus 52e371b8f7 suppress warnings for upnplib code
16 years ago
lotus 477807e0e6 * updated jxpath to latest v1.3
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
orbiter a564df3984 update to mime types in parsers and httpd.mime
16 years ago
orbiter 43c8defd79 enhanced parser with more extension + mime attributes
16 years ago
orbiter aee35bff6f replaced StringBuffer with StringBuilder in tar lib
16 years ago
orbiter 49bbb9bd45 replaced tar library with integrated apache ant tar lib
16 years ago
orbiter f987fc6b4a added tar classes from apache ant tools
16 years ago
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
orbiter 50cf80056f removed jmimemagic library
16 years ago
orbiter 3f113f38a8 removed unused imports
16 years ago
lotus 9f083bb6b2 check filetype before loading (no more mp4 loading)
16 years ago
f1ori 076ae02c44 * added pl and py to extensions accepted by htmlParser
16 years ago
f1ori d5e51cfd09 * workaround for non-working build property replacements
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
f1ori 8931c8d6b4 improvements to debian package:
16 years ago
low012 fc1dc38b55 *) added spaces to make sure that no words are concatenated by accident
16 years ago
low012 f242e7d7bc *) using Apache POI library to parse Word documents now
16 years ago
orbiter caedd72400 - enhanced logging and exception details for parsers
16 years ago
orbiter 4b74ad0a46 fixed setting of parser configuration servlets
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
lotus e15d27bc63 avoiding double/wrong parser errors
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
16 years ago
orbiter 8ca1f5d400 - some work to integrate the html parser the same way as the other parsers are integrated (not finished)
16 years ago
low012 1ee109761f *) added changes which were lost
16 years ago
orbiter 499723891d removed all non-http daemons; they had not been used and may be a potential security risk.
16 years ago
orbiter 0e8647d62f refactoring of search classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
low012 8041e91f56 *) Ooops!
16 years ago
low012 69551ff3d9 *) added several MIME types (derived from http://filext.com/), some of them might be rather uncommon
16 years ago
low012 11dfb2d54f minor changes:
16 years ago
orbiter 77d2a3782c removed strange debugging strings
16 years ago
lotus 4320f69574 universal handling for crashed parsers
16 years ago
orbiter 024744245c small refactoring to prepare for new queues
16 years ago
orbiter 16efcd0366 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2252&hilit=&p=16389#p16389
16 years ago
f1ori 0f3246e90a * fix debian package
16 years ago
f1ori 8544cfd5a6 * remove separate build-files for parsers
16 years ago
orbiter 24cb6d68bc - renamed Stack to RecordStack to avoid name confusion with new classes
16 years ago
orbiter 995da28c73 all stack/heap files that had been stored in DATA/PLASMA are now stored in the network-specific QUEUES path
16 years ago
orbiter aac89bf8ca trying to avoid "exceeding limit" message of server
16 years ago
f1ori 48d78166ed * fix double copy of libraries
16 years ago
lotus 7f868ca3c2 resource observer: support for yacyroot\DATA on an NTFS hardlink (Windows)
16 years ago
orbiter 409538e17a code cleanup and code simplification
16 years ago
orbiter 160031758d fix for problem with initializer
16 years ago
orbiter 302a02cec8 moved all libraries from libx to lib
16 years ago
orbiter 1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 93dfb51fd4 problems with code style
16 years ago
orbiter adf01c676e reduce lookup time when merging a large number of BLOBs
16 years ago
orbiter 9a674d8047 - After the removal of the Tree class some code simplifications are possible. This affects mostly the Records class, which can be refactored and the result of the refactoring results in a reduced number of classes.
16 years ago
orbiter c5122d6836 completed migration of BLOBTree to BLOBHeaps:
16 years ago
orbiter d1083a6913 maybe we have less problems with open connections to the server if we don't do BF forced sleeps (just a test)
16 years ago
low012 ebe6c823ac *) changed svn properties again (hopefully doing it right this time)
16 years ago
low012 a80ac3a415 *) fixed wrong parser descriptions
16 years ago
low012 457b6c0d6d *) updated Apache POI library to be able to parse Visio files
16 years ago
apfelmaennchen a10c8022d1 DidYouMean:
16 years ago
f1ori 7eb3bff5b3 * workaround for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2220&hilit=#p16128
16 years ago
orbiter 99fa265e1d fix for search bug caused by tenant patch
16 years ago
orbiter 79875782af be a bit more lazy when removing domain navigation entries
16 years ago
orbiter 57af311627 fix for wrong urls in navigator when a tenant is used
16 years ago
lotus 76b96337e2 just some chatty code
16 years ago
low012 91785d895c *) minor changes in comments
16 years ago
orbiter bdda140c02 fix for json output (no doublequotes any more, doublequote quoting did not work)
16 years ago
orbiter 2f84736120 ignore signature files that cannot be downloaded because of failed encoding
16 years ago
orbiter 041d9c253e some refactoring and more error-awareness in LogalizeHandler
16 years ago
orbiter 6b307d6d59 more tolerance for corrupted index entries in exported row sets
16 years ago
orbiter 33aafa9b4b better logging when writing merged dumps
16 years ago
lotus db70badcf0 possibility to set remote host on upnp device
16 years ago
orbiter 4d29e90708 uaeh
16 years ago
orbiter 3c3e6499ae added more logging for merge operation
16 years ago
orbiter 15180fc95e - patch for future computation in SplitTable
16 years ago
orbiter 9a5ec20b3c avoid merge during startup
16 years ago
lotus bf6b92343c try to avoid stuck pdf parser
16 years ago
lotus c695c7f512 try to remove hung swf parser from queue
16 years ago
orbiter fc69a76197 update to web structure picture:
16 years ago
orbiter ae015e8e98 refactoring of blob package classes
16 years ago
orbiter 8b8877c233 moved image collector
16 years ago
orbiter be1c7ddc64 refactoring of search classes -- moved Ranking Profile to search package
16 years ago
orbiter fd31a3616a - more logging in server process
16 years ago
orbiter 5a7fd6b4c8 just some comment lines
16 years ago
orbiter 31f60a3b3e when doing searches, also apply an online caution to DHT transmission and stop transmissions during heavy load caused by searching. This omits the many requests to the URL database that are needed for DHT transfer and it avoids collisions with URL retrieval needed for search results.
16 years ago
orbiter 17dc6d4be5 small fix for new Logger
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
lotus aec3e7995a autoconfig.pac can be used to browse .yacy-domains only
16 years ago
orbiter bc6dd8194b refactoring: moved search query class to new search package
16 years ago
orbiter a4805defdd added stub for new search process
16 years ago
orbiter b8e738a7be a collection of
16 years ago
apfelmaennchen 39779e4796 DidYouMean: as I moved to only 8 consumer and 4 producer threads, I removed poison pills as it does not make sense anymore - threads are interrupted directly. Having a consumer thread per test case just didn't make sense either (see svn 6070) due to the massive overhead.
16 years ago
apfelmaennchen c3c4dd0933 DidYouMean - changed to much simpler LinkedBlockingQueue
16 years ago
apfelmaennchen 01ac1b5d7e - blocking queue implementation of DidYouMean
16 years ago
orbiter b8bb1bb364 join with a timeout does not stop the corresponding thread after the time-out. It only stops the waiting. Here we additionally need a signal to the thread to stop after we have finished waiting.
16 years ago
orbiter b69f22e9ca mistake in last commit: computation of loops in ReversingTwoConsecutiveLetters
16 years ago
orbiter 3130334932 - start first with threads that run more loops
16 years ago
apfelmaennchen 6cde7ebf16 DidYouMean
16 years ago
orbiter f348190566 tried to insert a database dump import method to the phpBB3 import function. Reason: imports of large database dumps cannot be handled with phpMyAdmin and this should be an easy way to get the database dumps into a mySQL database where they can be exported again with the phpBB3 content integration adapter. Completion or removal of this function stub will follow before next main release.
16 years ago
orbiter 945777aa80 replaced rwi term counting method by one that computes the maximum of the blobs that contribute to the RWI. An addition of the blob sizes is incorrect and does not reflect the real size. Truncating the size computation to the maximum of all blobs is also incorrect, but not as wrong as the sum of all blob sizes, which double-counts many rwi entries.
16 years ago
orbiter 7c4d1d471c hand-over of more specific object
16 years ago
apfelmaennchen 09acfa66d1 - improved "did you mean"
16 years ago
apfelmaennchen da6ce37f7b - fixed encoding problem
16 years ago
apfelmaennchen 54a48b4184 - added "did you mean" to search page
16 years ago
orbiter 550312ac85 added new command script to do an auto-update from command line. This will make it easy to do mass-auto-updates in private yacy clusters
16 years ago
orbiter 0fc1168554 - reduced time-out for socket-connection communication from 20 seconds to 5 seconds. This is a test to find out if the time-out was a cause for problems in metager environments
16 years ago
orbiter 28b86385cd patch for bad behaving swf parser
16 years ago
orbiter d58b395993 fix for http://forum.yacy-websuche.de/viewtopic.php?p=15693#p15693
16 years ago
orbiter 733385cdd7 enhanced database access times by removal of unnecessary synchronization.
16 years ago
orbiter 398e210fef removed synchronization in logging that causes deadlocks in high-performance environments
16 years ago
orbiter db3a06dd81 removed cookie handling in httpc:
16 years ago
orbiter 1c54ae4a63 some small changes in HandleMap Testing
16 years ago
orbiter 2c5554c912 small enhancements in search result computation speed
16 years ago
orbiter e0b3984805 added navigation keys for site and author facets to remote search interface
16 years ago
orbiter 27fa6a66ad - completed the author navigation
16 years ago
orbiter a9a8b8d161 - added display of author navigation (usage of that navigator not yet implemented)
16 years ago
orbiter c879783008 added steering of navigator computation:
16 years ago
orbiter c079b18ee7 - refactoring of IntegerHandleIndex and LongHandleIndex: both classes had been merged into the new HandleMap class, which handles (key<byte[]>,n-byte-long) pairs with arbitrary key and value length. This will be useful to get a memory-enhanced/minimized database table indexing.
16 years ago
orbiter bead0006da replaced tmp file extensions by prt
16 years ago
orbiter 3189f9cd39 fixed problem with DCEntry initialization
16 years ago
orbiter a704d82280 patch for problem with digest
16 years ago
orbiter 3029ef6eb3 fixed a bug that was recently inserted which caused that no idx and gap files were written.
16 years ago
orbiter b6e274f211 omit most of forced crawl delays by using a separate delay table which flushes delayed URLs at the correct time
16 years ago
orbiter d50be59088 - added an automatic re-construction of the domain stack after 10 minutes. This then adds urls to the domain stack that were left over because of stack size limitations when the domain stack was last created
16 years ago
orbiter 5fdba0fa51 - fixed a not working selection rule in balancer
16 years ago
orbiter f5602404d5 another speed boost for the balancer
16 years ago
orbiter 95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
16 years ago
orbiter c062385552 fix for http://forum.yacy-websuche.de/viewtopic.php?p=15555#p15555
16 years ago
orbiter 42ae40b9f6 some bugfixes to database close() methods
16 years ago
orbiter a0c53abbe1 - wait until local results are computed during search, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2167&hilit=&p=15521#p15521
16 years ago
orbiter 9bfd22f65d fix for http://forum.yacy-websuche.de/viewtopic.php?p=15523#p15523
16 years ago
orbiter 1c77db670f re-designed response format for navigation:
16 years ago
orbiter 15fad767c0 some refactoring of topic generation
16 years ago
orbiter cc49aedf12 - fixed problem with remote search NPE
16 years ago
f1ori 9e18abc2ac * fix charset detection, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2137
16 years ago
orbiter c38c852090 modified access method to get index entries out of an array of BLOBs:
16 years ago
orbiter ab06a6edd2 renamed topwords to topics and enhanced computation methods of topics
16 years ago
orbiter a5d481eab1 enhanced navigation
16 years ago
orbiter 7639ec2f38 - fixed letter case bug for dc record creation
16 years ago
orbiter 4522c13ee7 added option for a table prefix when importing phpbb3
16 years ago
orbiter 1c69d9b8b6 more refactoring of the index classes
16 years ago
orbiter 3d5f2ff544 - added new servlets to support search portal administrators for the integration of yacy search fields in their web pages
16 years ago
orbiter 4d4315687f fix for problem with concurrency in host navigator, bug reported by wsb
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
lotus d813fd26ed reset sent/received counters on index delete
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 876746602d catch problems of file hash computation, see also:
16 years ago
orbiter fec6f9054f some refactoring of search methods
16 years ago
orbiter 3d4b826ca5 migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
16 years ago
orbiter 4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
16 years ago
orbiter d8284046b0 enhanced speed of site navigation computation
16 years ago
orbiter c72a5cf326 added stub for PHPBB3 extraction code using direct access to mySQL
16 years ago
orbiter e735d3a69f fix for http://forum.yacy-websuche.de/viewtopic.php?p=15175#p15175
16 years ago
orbiter 63a0255166 - refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
16 years ago
orbiter f246928c20 first attempt to add 'real' Navigation to yacy search results: host navigation
16 years ago
orbiter 54b9e99c01 - more information about peer tags
16 years ago
orbiter 26a46b5521 increased default maximum file size for database files to 2GB
16 years ago
orbiter addecdb18c simplified code, removed one unused method in all implementing classes
16 years ago
borg-0300 47fce9020c small change (Orbiter's wish)
16 years ago