Commit Graph

1931 Commits (29a1f132ece98c12b203bee626b19bef093cf077)

Author SHA1 Message Date
orbiter bdf4c7c51e added missing files for last commit
18 years ago
orbiter a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
18 years ago
orbiter 130cc76927 loop detection and termination in deletedHandles method
18 years ago
octoate 1c4076da8a First version of the MS Powerpoint parser based on Apache POI
18 years ago
theli 5b75d64d7d *) bugfix for last commit
18 years ago
theli 71ed104bc7 *) adding additional rpm mimetype (used by packman)
18 years ago
borg-0300 76d959122b new constants, finals, Stringbuffer, cleanup
18 years ago
rramthun 581dd2ec72 *) Proper arrow-function on Network.html, but ordering is still broken. Perhaps someone could fix that?
18 years ago
orbiter 6396f5971e bugfixes and migration attempt toward new kelondroFlex db
18 years ago
hermens 48f81acc0e reverse SVN 2744, it is not needed
18 years ago
hermens 1da9aece12 Repair DNS prefetch during cacheScan
18 years ago
orbiter 918b59dc5e - bugfix for snippet profile (no delete button)
18 years ago
orbiter 2bb529cedb added peer tags for peers in robinson mode
18 years ago
orbiter afbb547f3d extended options for abstracts generation in remote search interface
18 years ago
theli 22649408ad *) Better errorhandling for charset encoding problem during content parsing
18 years ago
theli a9c7e3f061 *) Bugfix for NoSuchElementException
18 years ago
orbiter f25f61d9d3 documentation of compile problem. See
18 years ago
orbiter c8f3a7d363 added snippet-url re-indexing
18 years ago
low012 2cfd4633ac *) even better handling of searchwords in snippets, words can consist of letters and numbers now
18 years ago
orbiter b062847797 fix for
18 years ago
orbiter e17fea7015 files in htcache are now stored in different hash/tree subdirectories
18 years ago
orbiter 661f005214 fix for seed upload build script
18 years ago
low012 2d3b7251a4 *) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
18 years ago
orbiter ddf8f220f6 fix for build fail
18 years ago
orbiter 25ae3d3161 generalized definition of hexhash
18 years ago
orbiter 86047f439d removed very bad bug that prevented production of any remote search result
18 years ago
orbiter f0d747c723 removed deprecated method
18 years ago
orbiter 5ff77612ac bugfix for old WORDS storage method
18 years ago
orbiter 0f10bdde22 more generic cache methods
18 years ago
orbiter 72482b1426 fixed scraper
18 years ago
hermens 6557112d8f small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
18 years ago
hermens 440c6ee657 Implement alternative htcache layout
18 years ago
allo 226f2c5b2c first version of the Servlet Debugger
18 years ago
orbiter adf1f74ab2 bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
18 years ago
orbiter fd61209797 lines inside tags without punctuation are extended by a single dot.
18 years ago
allo 1d0c0edda3 first version of posts/get from the del.icio.us api
18 years ago
orbiter 1969522dc1 removed lowercase of snippets (and other things):
18 years ago
orbiter 43614f1b36 bugfix in collection index. the index for collections was not created correctly
18 years ago
orbiter 1dfab1abe3 more control for seed receive
18 years ago
theli 1c0e65f55f *) Bugfix for problems with charset detection
18 years ago
orbiter db294687ea enhanced logging
18 years ago
theli a9a0f51303 *) suppressing InterruptedException errormessage
18 years ago
theli ce7ee74316 *) better errorhandling in filehandler (try catch block now starts before argument parsing)
18 years ago
theli 1d4fb680ce *) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB
18 years ago
theli 1586d57187 *) odtParser: better handling of large files
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
orbiter 630a955674 read snippets from cache in case they are not provided in RAM
18 years ago
orbiter bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication
18 years ago
orbiter c40fca08a2 fixed bad handling of string separation
18 years ago
orbiter 5a40ea7866 refactoring of wget string list generation
18 years ago
orbiter dbc2e039bb added time-out option parameter to call hierarchy
18 years ago
orbiter d4c239e4be - fixed problem in collection index with deletion of single url references
18 years ago
orbiter 00746ca232 identified and fixed search performance problem caused by
18 years ago
orbiter b033a80750 better control of failure in node seek of kelondroTree
18 years ago
orbiter 310f1c41cd added option to see ranking scores in surftipps
18 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
theli 8b2ceddb91 *) Displaying severe and warning logging messages in different colors on ViewLog_p.html
18 years ago
low012 f8ac694e51 *) fixed a bug where searchwords in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998)
18 years ago
orbiter df1629b05a - code cleanup
18 years ago
theli c665f6cddb *) handling of quotes in charset string
18 years ago
theli b73efd5565 *) missing changes needed because of last commit
18 years ago
theli 140ddba93f *) adding soap functions to pause and resume the crawler
18 years ago
orbiter 2463e5624a 'quick' release 0.47
18 years ago
theli 49fbb688df *) SOAP: old urlInfo renamed to urlInfoByHash, new urlInfo Function added.
18 years ago
theli 8f143d516b *) make snippet fetcher accessible via soap api
18 years ago
theli 97615af406 *) Restructuring of YaCy SOAP services
18 years ago
theli 241b881560 *) Redesign of YaCy SOAP handler
18 years ago
theli 009a33170b *) Content-Location header added
18 years ago
theli 1aa07a52cd *) Bugfix for UnsupportedEncodingException if the media type contains multiple parameters
18 years ago
theli 625c2ce6b1 *) bugfix for snippet fetching problem if content but not http header is available in cache
18 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
18 years ago
hermens 3f5a4153a0 Make Peers more receptive to transferred indexes
18 years ago
theli 57415b6889 *) Bugfix for surftipps UTF-8 problem
18 years ago
allo b0a4fcce8c fix from theli
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 64b2ef5aae *) Trying to bugfix shutdown problem
18 years ago
orbiter e03427871e enhanced surftipps:
18 years ago
theli 1dc12d6659 *) Bugfix for shutdown problem caused by cacheScan thread
18 years ago
borg-0300 42173462f5 rename cutUrlText to shortenURLString;
18 years ago
borg-0300 af1d89e381 check url == null added;
18 years ago
theli cc667b0aa5 *) htmlFilterContentScraper.java: adding support for link tag
18 years ago
theli 26dfbb7499 *) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correctly on the gui
18 years ago
theli cf6acff2c2 *) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
18 years ago
borg-0300 f18304ddd3 unused/not needed imports removed;
18 years ago
orbiter ec031eb993 first version of surftipps
18 years ago
borg-0300 b174fbd0ca "import ...*" removed;
18 years ago
orbiter 807756150e patch for strange bug reported by email
18 years ago
theli 5c6251bced *) some improvements for extended html document charset support
18 years ago
theli 33f0f703c0 *) reinserting type cast again
18 years ago
orbiter 8c11a543dc fixed line ending coding
18 years ago
theli b690597275 *) adding casts to avoid compatibility problems between java 1.4 and java 1.5 writer class usage
18 years ago
theli 5afb0cbce8 *) setting default charset (for unknown documents) to iso-8859-1
18 years ago
orbiter f453c14b5d removed unreachable catch blocks and unused imports
18 years ago
theli ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
theli fc594e8eda *) adding httpContentLengthInputStream.java class to allow reading of http response bodies
18 years ago
low012 cd636eb00e *) Fix for the fix...
18 years ago
low012 f9a5b55a9e *) Fixed bug described in http://www.yacy-forum.de/viewtopic.php?p=25448#25448
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
low012 8a30c5343d *) Fixed bug where exclamation marks could get lost between [=...=] and <pre>...</pre>
18 years ago
low012 d8f4b17e31 *) Hopefully fixed bug described in http://www.yacy-forum.de/viewtopic.php?t=2825.
18 years ago
theli 0e84a969d6 *) Bugfix for serverCharBuffer read from file operation
18 years ago
theli 90ef19d778 *) first version of a serverCharBuffer
18 years ago
orbiter d374ef2bbe bugfix for tryRemoveURLs
18 years ago
orbiter f644a1c3a7 better evaluation of index abstracts
18 years ago
orbiter 1b48473bc5 bugfix to utf8 recognition
18 years ago
orbiter 90f7241b59 serverByteBuffer.trim() can now recognize utf-8 characters
18 years ago
allo 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611
18 years ago
theli e34d9b3fec *) charset aware headlines (after the serverByteBuffer.trim problem is solved)
18 years ago
theli 8115ac47b5 *) charset aware metadata parsing
18 years ago
theli 3ac30bdf22 *) some todo markers added for additional charset support
18 years ago
theli 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli c5d3020941 *) better errorhandling for last commit
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
orbiter d82875c72b removed removal of 'funny symbols' that may have caused utf-8 problems
18 years ago
orbiter 26ab1fa885 fixed null pointer exception
18 years ago
theli b0e8ff6eda *) some TODO markers for UTF-8 problem
18 years ago
orbiter 41e27b85b7 fix for crawler condition
18 years ago
orbiter 0ee7e45413 bugfix for merge method (caused by bad refactoring)
18 years ago
orbiter 40965e183e bugfix for minimizeurldb and urldbcleanup
18 years ago
orbiter 5c2f30eaca adjustments to dhtInCache write
18 years ago
theli 9ecf7f0da2 *) some TODO markers for UTF-8 problem
18 years ago
theli e2f8339827 *) some bugfixes for UTF-8 related problems
18 years ago
orbiter c89d8142bb replaced old 'kCache' by a full-controlled cache
18 years ago
orbiter 6e2907135a bugfixes for remote search server part
18 years ago
orbiter cf9884e22b first attempt to implement a secondary search
18 years ago
theli 2a06ce5538 *) next bugfix for UTF-8
18 years ago
theli bdc51591ae *) UTF-8 Bug solved (hopefully)
18 years ago
theli ef751b9d33 *) removing all string operations from the template engine
18 years ago
orbiter 7ef80c1026 more debugging
18 years ago
orbiter b251076e64 avoid ConcurrentModificationException
18 years ago
orbiter 75b198bc02 - updated references to indexContainer
18 years ago
orbiter 0bed3b9ac3 removed superfluous interface
18 years ago
orbiter b7e7808ea6 wordmigration now works also for new index database
18 years ago
theli a0ddf2ec11 *) AbstractCrawlWorker.java: delete already downloaded data on crawling error
18 years ago
orbiter 4f9e42d5ed more changes towards better join-search
18 years ago
auron_x 005400a137 *) reverted last commit
18 years ago
orbiter a7281a9b4d fix for last commit
18 years ago
orbiter 82a6054275 - fixed bug with new indexAbstract generation
18 years ago
theli fded1f4a5d *) better handling of maximum file size limit in crawler
18 years ago
orbiter 416b4e5c6b ups
18 years ago
orbiter 309accb983 memory control for ymage generation:
18 years ago
orbiter 74d1dea30b changes towards better join-search
18 years ago
auron_x 045ffebbd8 *) added debugline to versionstring-processing to find a possible bug in versiongeneration
18 years ago
orbiter ae4e8ce03e - cut for 'probably last html-interface version': version number update
18 years ago
orbiter 64bed59ee8 enhancements to ranking
18 years ago
theli 63893003be *) Adding settings page for the crawler which allows specifying a file size limit and the timeout to use.
18 years ago
auron_x 06b1365066 *) fixed existing protection against divbyzero and removed the new one
18 years ago