Commit Graph

1012 Commits (49ffedfd8b80ae1ac8d720cfb280bf8ecaa19c66)

Author SHA1 Message Date
orbiter 50f2578c55 - some bugfixing and code cleanup
18 years ago
orbiter bdf4c7c51e added missing files for last commit
18 years ago
orbiter a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
18 years ago
octoate 1c4076da8a First version of the MS Powerpoint parser based on Apache POI
18 years ago
theli 5b75d64d7d *) bugfix for last commit
18 years ago
theli 71ed104bc7 *) adding additional rpm mimetype (used by packman)
18 years ago
orbiter 6396f5971e bugfixes and migration attempt toward new kelondroFlex db
18 years ago
hermens 48f81acc0e reverse SVN 2744, it is not needed
18 years ago
hermens 1da9aece12 Repair DNS prefetch during cacheScan
18 years ago
theli 22649408ad *) Better errorhandling for charset encoding problem during content parsing
18 years ago
theli a9c7e3f061 *) Bugfix for NoSuchElementException
18 years ago
orbiter c8f3a7d363 added snippet-url re-indexing
18 years ago
low012 2cfd4633ac *) even better handling of searchwords in snippets, words can consist of letters and numbers now
18 years ago
orbiter e17fea7015 files in htcache are now stored in different hash/tree subdirectories
18 years ago
low012 2d3b7251a4 *) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
18 years ago
orbiter 25ae3d3161 generalized definition of hexhash
18 years ago
orbiter f0d747c723 removed deprecated method
18 years ago
orbiter 5ff77612ac bugfix for old WORDS storage method
18 years ago
orbiter 0f10bdde22 more generic cache methods
18 years ago
hermens 6557112d8f small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
18 years ago
hermens 440c6ee657 Implement alternative htcache layout
18 years ago
orbiter fd61209797 lines inside tags without punctuation are extended by a single dot.
18 years ago
orbiter 1969522dc1 removed lowercase of snippets (and other things):
18 years ago
orbiter 43614f1b36 bugfix in collection index. the index for collections was not created correctly
18 years ago
orbiter db294687ea enhanced logging
18 years ago
theli a9a0f51303 *) suppressing InterruptedException errormessage
18 years ago
theli 1d4fb680ce *) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB
18 years ago
theli 1586d57187 *) odtParser: better handling of large files
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
orbiter 630a955674 read snippets from cache in case they are not provided in RAM
18 years ago
orbiter dbc2e039bb added time-out option parameter to call hierarchy
18 years ago
orbiter 00746ca232 identified and fixed search performance problem caused by
18 years ago
orbiter 310f1c41cd added option to see ranking scores in surftipps
18 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
low012 f8ac694e51 *) fixed a bug where searchword in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998)
18 years ago
orbiter df1629b05a - code cleanup
18 years ago
theli b73efd5565 *) missing changes needed because of last commit
18 years ago
orbiter 2463e5624a 'quick' release 0.47
18 years ago
theli 625c2ce6b1 *) bugfix for snippet fetching problem if content but not http header is available in cache
18 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
18 years ago
hermens 3f5a4153a0 Make Peers more receptive to transferred indexes
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 1dc12d6659 *) Bugfix for shutdown problem caused by cacheScan thread
18 years ago
borg-0300 42173462f5 rename cutUrlText to shortenURLString;
18 years ago
theli 26dfbb7499 *) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correctly in the GUI
18 years ago
theli cf6acff2c2 *) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
18 years ago
theli 5c6251bced *) some improvements for extended html document charset support
18 years ago
orbiter f453c14b5d removed unreachable catch blocks and unused imports
18 years ago
theli ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
orbiter f644a1c3a7 better evaluation of index abstracts
18 years ago
allo 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611
18 years ago
theli 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli c5d3020941 *) better errorhandling for last commit
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
orbiter 26ab1fa885 fixed null pointer exception
18 years ago
theli b0e8ff6eda *) some TODO markers for UTF-8 problem
18 years ago
orbiter 41e27b85b7 fix for crawler condition
18 years ago
theli 9ecf7f0da2 *) some TODO markers for UTF-8 problem
18 years ago
orbiter c89d8142bb replaced old 'kCache' by a full-controlled cache
18 years ago
orbiter 6e2907135a bugfixes for remote search server part
18 years ago
orbiter cf9884e22b first attempt to implement a secondary search
18 years ago
orbiter b251076e64 avoid ConcurrentModificationException
18 years ago
orbiter 75b198bc02 - updated references to indexContainer
18 years ago
orbiter b7e7808ea6 wordmigration now works also for new index database
18 years ago
theli a0ddf2ec11 *) AbstractCrawlWorker.java: delete already downloaded data on crawling error
18 years ago
orbiter 4f9e42d5ed more changes towards better join-search
18 years ago
orbiter a7281a9b4d fix for last commit
18 years ago
orbiter 82a6054275 - fixed bug with new indexAbstract generation
18 years ago
theli fded1f4a5d *) better handling of maximum file size limit in crawler
18 years ago
orbiter 74d1dea30b changes towards better join-search
18 years ago
orbiter ae4e8ce03e - cut for 'probably last html-interface version': version number update
18 years ago
orbiter 64bed59ee8 enhancements to ranking
18 years ago
theli 63893003be *) Adding settings page for the crawler which allows to specify a file size limit and the timeout to use.
18 years ago
orbiter 94d7ced900 fix for last ranking commit
18 years ago
orbiter 03835c2ee8 enhanced search result computation
18 years ago
orbiter ac3419b65f better debugging for indexOutOfBoundException bug
18 years ago
orbiter a8bc768206 enhancements to ranking evaluation
18 years ago
theli 33898ae7e9 *) ResourceInfoFactory.java: Bugfix for classNotFoundException
18 years ago
theli 406e170e25 *) more verbose error message
18 years ago
theli b298474e22 *) Bugfix needed because of changed plasmaCrawlLURL.load behavior
18 years ago
orbiter 96c6e4e322 - enhancements to detailed search page
18 years ago
orbiter 9340dbb501 fixed all possible problems with nullpointer exception for LURLs
18 years ago
theli a5ed86105b *) bugfix for handling of ResourceInfo object in proxy
18 years ago
hermens ff4362b02d some more fixes for new plasmaCrawlLURL.load behavior
18 years ago
hermens 7aeadbe7cc another NullPointerException in http.ResourceInfo
18 years ago
orbiter 141f9e5bb4 fix for new plasmaCrawlLURL.load behavior
18 years ago
hermens 087f7511f8 prevent NullPointerException in http.ResourceInfo
18 years ago
orbiter a2525072f2 bugfix for kelondroRow - property generation
18 years ago
theli b44514242a *) crawler/ftp/CrawlWorker.java: better errorhandling
18 years ago
theli 7d7f30139c *) crawler/ftp/CrawlWorker.java: delete old cache file
18 years ago
theli 4ae0f122f8 *) ResourceInfo.java: License header added
18 years ago
theli 043edfa4d8 *) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
18 years ago
orbiter 4866868c0e added write cache for LURLs
18 years ago
orbiter 8a0e35618b enhancements to search result preparation
18 years ago
theli 5c1bb53d2a Missing description for last commit
18 years ago
theli dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli 4825bfaaf3 *) Bugfix for PrintWriter Problem
18 years ago
theli 7930839594 *) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
18 years ago
theli 7a35b8e237 *) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
18 years ago
theli ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
18 years ago
theli 3870d615e3 *) setting htCache.Entry fields to private
18 years ago
theli 393a7d10be *) setting htCache.Entry fields to private
18 years ago
theli ab5a9bee66 *) adding some copyright headers
18 years ago
theli 5847492537 *) next step of restructuring for new crawlers
18 years ago
theli fce9e7741b *) next step of restructuring for new crawlers
18 years ago
theli e3f0136606 *) next step of restructuring for new crawlers
18 years ago
theli 9ded4e8d5a *) Bugfix for name resolution in proxy mode
18 years ago
theli 1c8300fcec *) Bugfix for name resolution in proxy mode
18 years ago
theli 4e2a950ac9 *) next step of restructuring for new crawlers
18 years ago
theli 09b106eb04 *) next step of restructuring for new crawlers
18 years ago
theli eb9b138986 *) next step of restructuring for new crawlers
18 years ago
theli 1395aae742 *) starting restructuring which is needed to add crawlers for additional protocols
18 years ago
theli b4acbdaa97 *) better handling of server shutdown
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
theli 959b779aba *) avoid performance loss if log level is greater than 'fine'
18 years ago
orbiter 18b6876860 new cache flush configuration settings
18 years ago
hermens f0278b4092 Bugfix for / by zero when the AssortmentCluster is empty
18 years ago
orbiter 14e0bb0dcf allow more references per word for new db
18 years ago
orbiter 985dcbde7f changed some parameters that may cause better memory usage and more indexing speed
18 years ago
orbiter b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
18 years ago
orbiter c26da4893b turned back NURL usage of kelondroTree, kelondroFlexTable has still problems with deleted entries
18 years ago
orbiter db1eae0227 * simplified initialization of database objects
18 years ago
hermens 0b73f2b132 Repair DNS prefetch during cacheScan
18 years ago
orbiter 27a159b401 * documentation update
18 years ago
theli f80f776b89 *) Trying to solve NullpointerException problem in function addURLtoErrorDB
18 years ago
hydrox 1c99b5a484 *)fixed logging for urldbcleanup
19 years ago
orbiter 8f3f4ab0eb enhanced synchronisation in plasmaWordIndex
19 years ago
orbiter 23dd972608 fixed memory calculation in performanceMemory web page
19 years ago
orbiter 1ce3c22761 better memory control:
19 years ago
orbiter 39b4c26bdc more memory control:
19 years ago
orbiter 3e9d509c39 some small fixes
19 years ago
orbiter eb633c0a4f server threads must now supply a method that can be called in case
19 years ago
orbiter f5720cb2fa removed most synchronization in wordIndex (for testing)
19 years ago
orbiter 0187c60010 because of a bug in the JRE 1.4.2 there was no memory protection
19 years ago
orbiter cfb51fdef1 less synchronization in plasmaWordIndex
19 years ago
orbiter d6a928c2da quickfix for http://www.yacy-forum.de/viewtopic.php?t=2705
19 years ago
orbiter 6ad471ef96 * applied many compiler warning recommendations
19 years ago
hydrox 9da3aa74d3 silly me, fix for the fix as advised by theli
19 years ago
hydrox bb3d9a5582 *) e.getMessage().indexOf() can only be used if there is actually an ExceptionMessage.
19 years ago
hydrox 7a54010a9c *) Iterators can't be cast to IndexContainer
19 years ago
orbiter cd5f7e137c fixed problem with NURL-generation upon first startup
19 years ago
orbiter 8418af141a added several consistency checks and small changes
19 years ago
theli 9d13aeca13 *) removing class. does not work so far
19 years ago
theli 95a84ae469 *) adding missing classes
19 years ago
theli eee44be602 *) adding an interface for customized blacklist classes
19 years ago
orbiter 6d2f15971a there is a very strange error that causes that the kelondroRecords structure
19 years ago