Commit Graph

563 Commits (474e29ce4a831550d06a4fd3d6354fd41e32f121)

Author SHA1 Message Date
orbiter 0a050bc043 enhanced ranking
18 years ago
orbiter 61798f0ae6 added option to distinguish between text crawl and media crawl
18 years ago
orbiter febe6b114a design update of crawler monitor
18 years ago
orbiter 7ff86d6ba6 - image search now shows thumbnails (in bad order, but it works)
18 years ago
orbiter ee3d91cb6b print-out of links that result from constraint-filtering
18 years ago
orbiter 1377c53aa3 extraction of media links from search results
18 years ago
orbiter bf0d820659 - added correct flagging of word properties
18 years ago
orbiter a603c4d5e8 more code simplifications
18 years ago
orbiter 9a85f5abc3 cleanup
18 years ago
orbiter 109ed0a0bb - cleaned up code; removed methods to write the old data structures
18 years ago
orbiter 052f28312a removed assortments from indexing data structures
18 years ago
orbiter 2372b4fe0c release 0.49
18 years ago
orbiter ad1e4aa88e added selection of audio, video, image and application resources
18 years ago
orbiter 7cc4cec9c9 bugfix for assertion bugs documented in
18 years ago
orbiter ceb9e3aa17 - enhanced parser: collection of audio, video, image and application links
18 years ago
orbiter 30888e7a2f implementation of search constraints
18 years ago
orbiter f4b547dc13 limited index transfer to peer with version 0.486
18 years ago
orbiter e3d75f42bd final version of collection entry type definition
18 years ago
orbiter c9364246cc introduced new RWI-Object.
18 years ago
orbiter 497428c8ec refactoring
18 years ago
orbiter 76fceb9997 refactoring
18 years ago
orbiter bb7d4b5d5e refactoring to prepare new RWI entry object
18 years ago
orbiter ba967c4875 - bugfixes and debug code
19 years ago
orbiter ee4715a21c - more asserts
19 years ago
orbiter 114a76a86e - added flag to urlhash that shows that domain is a local domain
19 years ago
orbiter 8fdefd5c68 generalization of payload definition of index storage
19 years ago
hydrox 7e8669b15c *) added possibility to "recycle" a DHTChunk that failed to transfer.
19 years ago
auron_x 194d42b6a7 *) changed PPM-calculation to be more accurate
19 years ago
orbiter 2a9d868f6d - removed object cache from kelondroTree
19 years ago
orbiter 06854988da - full integration of new LURL database in INDEX
19 years ago
orbiter b79e06615d - added new LURL.Entry class for next database migration
19 years ago
theli 3d152bfe43 *) Logging message added
19 years ago
orbiter 77a59a115d refactoring of indexing methods
19 years ago
orbiter a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
19 years ago
orbiter 6396f5971e bugfixes and migration attempt toward new kelondroFlex db
19 years ago
orbiter c8f3a7d363 added snippet-url re-indexing
19 years ago
orbiter 0f10bdde22 more generic cache methods
19 years ago
hermens 440c6ee657 Implement alternative htcache layout
19 years ago
orbiter 43614f1b36 bugfix in collection index. the index for collections was not created correctly
19 years ago
theli a9a0f51303 *) suppressing InterruptedException errormessage
19 years ago
theli f17ce28b6d *) plasmaHTCache:
19 years ago
orbiter dbc2e039bb added time-out option parameter to call hierarchy
19 years ago
orbiter 00746ca232 identified and fixed search performance problem caused by
19 years ago
orbiter 310f1c41cd added option to see ranking scores in surftipps
19 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
19 years ago
theli cd5f349666 *) Better handling of large files during parsing
19 years ago
orbiter df1629b05a - code cleanup
19 years ago
hermens 3f5a4153a0 Make Peers more receptive to transferred indexes
19 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
19 years ago
borg-0300 42173462f5 rename cutUrlText to shortenURLString;
19 years ago
theli cf6acff2c2 *) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
19 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
19 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
19 years ago
theli d0a5a53789 *) changes needed for multi-language support
19 years ago
theli b0e8ff6eda *) some TODO markers for UTF-8 problem
19 years ago
orbiter c89d8142bb replaced old 'kCache' by a full-controlled cache
19 years ago
orbiter 75b198bc02 - updated references to indexContainer
19 years ago
theli a0ddf2ec11 *) AbstractCrawlWorker.java: delete already downloaded data on crawling error
19 years ago
orbiter 64bed59ee8 enhancements to ranking
19 years ago
orbiter a8bc768206 enhancements to ranking evaluation
19 years ago
orbiter 96c6e4e322 - enhancements to detailed search page
19 years ago
orbiter 9340dbb501 fixed all possible problems with nullpointer exception for LURLs
19 years ago
hermens ff4362b02d some more fixes for new plasmaCrawlLURL.load behavior
19 years ago
orbiter 4866868c0e added write cache for LURLs
19 years ago
theli dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli 7a35b8e237 *) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
19 years ago
theli ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
19 years ago
theli 3870d615e3 *) setting htCache.Entry fields to private
19 years ago
theli 393a7d10be *) setting htCache.Entry fields to private
19 years ago
theli ab5a9bee66 *) adding some copyright headers
19 years ago
theli 9ded4e8d5a *) Bugfix for name resolution in proxy mode
19 years ago
theli 09b106eb04 *) next step of restructuring for new crawlers
19 years ago
theli b4acbdaa97 *) better handling of server shutdown
19 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
19 years ago
orbiter 18b6876860 new cache flush configuration settings
19 years ago
orbiter 985dcbde7f changed some parameters that may cause better memory usage and more indexing speed
19 years ago
orbiter b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
19 years ago
orbiter c26da4893b turned back NURL usage of kelondroTree, kelondroFlexTable still has problems with deleted entries
19 years ago
theli f80f776b89 *) Trying to solve NullpointerException problem in function addURLtoErrorDB
19 years ago
orbiter 1ce3c22761 better memory control:
19 years ago
orbiter 39b4c26bdc more memory control:
19 years ago
orbiter eb633c0a4f server threads must now supply a method that can be called in case
19 years ago
orbiter 8418af141a added several consistency checks and small changes
19 years ago
theli eee44be602 *) adding an interface for customized blacklist classes
19 years ago
theli d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
19 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 314021453f * more logging
19 years ago
orbiter 80b6c90d54 enhancements to prevent blocking during dht transfer receive
19 years ago
theli 9f298083cd *) adding more urls to the error url
19 years ago
orbiter 279b1d969d Integrated new indexing data structure 'collections' into the main class
19 years ago
orbiter ebc2233092 * implemented (finished) class indexRowSetContainer
19 years ago
orbiter 9183d21f25 renamed new index class to old name
19 years ago
orbiter c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
19 years ago
orbiter e357599f92 * fixed problem with indexContainer iteration from RAM:
19 years ago
orbiter 5f72be2a95 some redesign of EURL storage
19 years ago
orbiter e4f1820b58 protection against too long authentication strings in switchboard
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
orbiter 671fd9a5c9 work towards new indexing database structure
19 years ago
orbiter 92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
19 years ago
orbiter 66964dc015 removed high/med/low from kelondroRecords cache control.
19 years ago
allo 67a8c74be3 Fix for dynamic login with static password.
19 years ago
allo ef9eb50c3c fix for adminlogin
19 years ago
allo 6fe2fed87e cookieauth works with static Admin.
19 years ago
theli 4ca0857c0c *) Index transfer now considers the pause time sent by busy peers during
19 years ago
orbiter c75cacda95 added a flex-width-array: this is a table where it is
19 years ago
orbiter 5041d330ce refactoring
19 years ago
orbiter bd057b44dd - automatic setting of peer-does-not-accept-remote-crawl
19 years ago
orbiter cda087f43b - integrated cache miss storage into object cache
19 years ago
theli 61078b3885 *) adding support for delayed shutdown
19 years ago
orbiter 90d569d70f refactoring of index management:
19 years ago
orbiter a930be4ba3 refactoring of index management:
19 years ago
hermens df7e1d9df3 Changes to plasmaURL and subclasses:
19 years ago
orbiter a474669338 start with refactoring of index management
19 years ago
theli f331def5d8 *) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
19 years ago
theli bcc950c533 *) Bugfix for Index Transfer
19 years ago
orbiter 461548698c configuration of index transfer chunk size
19 years ago
hermens 51e3bb576f Don't increase dhtTransferIndexCount when the last transferred index was smaller
19 years ago
hermens a0ca4c5fb8 Remove a possible race condition between DHT transfer and deQueue
19 years ago
orbiter 60e5aff9fc some enhancements to the remote crawl trigger
19 years ago
orbiter 14d6e476c9 tried to solve some problems with new picture viewer
19 years ago
orbiter f0833b0328 introduced simple search interface
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter e2e8d0c188 some kind of refactoring of yacysearch:
19 years ago
rramthun 250864406f ...
19 years ago
orbiter 63f39ac7b5 added 3 new crawling steering options:
19 years ago
orbiter 1fc3b34be6 some pre-work (without function yet) to implement:
19 years ago
theli c9e6b5e391 *) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
19 years ago
orbiter 1f4412a146 adapted isListed to the new behavior as discussed (url, getFile)
19 years ago
orbiter 063ef4660a bug?
19 years ago
orbiter 3286b1f498 re-organisation of lurl-creation and -stacking
19 years ago
hydrox 8da13088e9 *)removed multiple DHT_Distribution_Threads
19 years ago
orbiter bcd99fe83e introduced a second RAM cache for DHT transfer
19 years ago
orbiter bae3783d38 added a snippet marking
19 years ago
orbiter f0a38873eb * added yacysearch page with better view on search results
19 years ago
theli 759800f543 *) Bugfix for storeHTCache problem
19 years ago
orbiter 1b9b8922d9 * fixed problems with new basic 1-2-3 configuration (now authentication required)
19 years ago
auron_x 8c6f38fe70 *) added Blog to YaCy (atm not reachable through interface) -> Blog.html
19 years ago
orbiter eaffcfefe2 * added more ranking attributes (without function; this will be added later)
19 years ago
orbiter 3703f76866 - fixed re-search bug: after a search with several words, a second search could not
19 years ago
theli fbbbf5f411 *) remote trigger for proxy-crawl
19 years ago
orbiter 1d8ca6e082 serialized dhtChunk deletion with indexing
19 years ago
theli 2336f0f013 *) allow pausing/resuming of crawlJob Threads separately
19 years ago
orbiter 60dac4325e serialized indexing with dht selection
19 years ago
orbiter a840755964 moved parts of index transfer logic back to switchboard
19 years ago
borg-0300 64441b1f78 ADDED: yacy.badwords list to filter the topwords
19 years ago
orbiter 2c4e4ae6a2 further refactoring of dht selection, transfer and flushing
19 years ago
orbiter 73dad68cf1 outsourced theli's DHT flush class into its own file
19 years ago
theli 42a5f56723 *) Bugfix for broken dht thread configuration
19 years ago
hydrox e2af2a3f45 *) it's now possible to run more than one indexDistribution-Thread
19 years ago
theli 980e986b64 *) Re enabling short cycle for already removed nurl entries
19 years ago
allo a26574c894 Migration from tagName as key to wordhash(tagName) as key for bookmarkTags.db
19 years ago
orbiter 1e4578aab6 VERY EXPERIMENTAL removal of index ram cache flushing thread.
19 years ago
orbiter d98418390b - introduced rankingProfile Class
19 years ago
theli 6a99304b2b *) Redesign of db import functionality
19 years ago
orbiter fa90c3ca7a - removed some usage of indexEntity
19 years ago
orbiter 03c65742ba changes towards the new index storage scheme:
19 years ago
orbiter b946e28e61 some ranking enhancements
19 years ago
orbiter eabf4a0386 fix for null pointer exception during shut-down
19 years ago
orbiter f14d49fae9 enhancements, bugfixes and additions to word index attribute storage
19 years ago
allo 4d33020f56 Migration to WORK
19 years ago
rramthun 1e5feedf0e Fix for http://www.yacy-forum.de/viewtopic.php?p=15547#15547
19 years ago
orbiter f4ffa9aee5 - implemented more attributes to index entries
19 years ago
orbiter 0371494010 tried to add word position to index
19 years ago
borg-0300 c5b6154136 added CRDistOn = true/false
19 years ago
orbiter e2ff1767b5 fix for last DHT distribution bug-fix
19 years ago
allo 6822dce57b Using Orbiter's function for auth
19 years ago
orbiter 2028403670 - consolidated different orderings to kelondroNaturalOrder
19 years ago
orbiter 9086261476 refactoring of base64 encoding:
19 years ago
allo 351fffc129 DATA/WORK for user-created content
19 years ago
allo a81cc9d969 no DATA/DATA to avoid confusion.
19 years ago
allo 9cce3c5709 dates Table for bookmarksdb(needed for del.icio.us api)
19 years ago
allo 4ac0fd328a First Version of the Bookmarksmanager
19 years ago
orbiter 4500506735 fixed some bugs concerning url entry retrieval and intexControl interface
19 years ago
orbiter 83a34b838d * added Object allocation monitor on performanceMemory page
19 years ago
orbiter 0c762daf4b better startup failure handling
19 years ago
orbiter f27f9ecf15 * activated write buffer for databases.
19 years ago
orbiter c59d1b2f5e - Tests with write buffer (new class kelondroBufferedIOChunks, not yet active)
19 years ago
orbiter bb79fb5d91 - changed handling of error cases retrieving urls from database
19 years ago
theli 386d9e45d8 *) Bugfix for code cleanup
19 years ago
rramthun a1061495d4 Fixed some spelling mistakes and added some text which (should) make it easier to understand the options.
19 years ago
orbiter 0cdc58aaea fixed indexing of local domains.
19 years ago
theli e1c2d8ec5f *) Speedup "removed from queue"
19 years ago
orbiter 13fdebc50d added authentication for link deletion in search result
19 years ago
orbiter 37f88b4017 code cleanup
19 years ago
orbiter ec2b39c1ce code cleanup
19 years ago
orbiter 8f1f2daa5e implemented interactive link deletion of search results.
19 years ago
theli 44fa94ac52 *) Modifications for dbImport functionality
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
orbiter f57e2d67f5 shortened network overview (less columns fit easier on page)
19 years ago
orbiter 85282b1d98 enhanced YBR recognition and search result heuristics
19 years ago
orbiter 0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
19 years ago
orbiter 0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
19 years ago
orbiter 24dc0e0760 implemented cr-file processing and further transmission steps
19 years ago
theli 86a9210264 *) indexing queue slots are now configurable via config file
19 years ago
borg-0300 ebac51df52 restore defaultRemoteProfile
20 years ago
borg-0300 5778428455 move cutUrlText to nxTools,
20 years ago
borg-0300 9158845c3b bugfix for snippet text null bytes
20 years ago
orbiter f763923e0a added missing files for last commit
20 years ago
orbiter 79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
20 years ago
theli 7e0647f692 *) Bugfix for userDB usage during authentication
20 years ago
orbiter d2731418bf added creation of global ranking files and changed url normal form usage
20 years ago
theli fb766413d1 *) Changes on httpc dns caching
20 years ago
theli dd24f0252f *) Searchword highlighting for info page
20 years ago
theli b8ceb1ffde *) Adding better https support for crawler
20 years ago
borg-0300 e3179a6394 added getOwnSeedFile()
20 years ago
hydrox cb69047b91 *)cleanup access static methods and fields
20 years ago
allo 92c49b406b adminAuth with userDB and adminAuthenticated (fix for statuspage)
20 years ago
theli ec3af327f7 *) Bugfix for Proxy-Authentication against remote proxy
20 years ago
orbiter 1aa4ba8b62 added post-search filtering of redundant urls (longer than existing cited)
20 years ago
orbiter 4dcbc26ef1 introduction of search profiles; very experimental
20 years ago
theli 9a5ab62928 *) Adding yacy specific X-YACY-Index-Control header which can be used by clients
20 years ago
theli 02d9af1a70 *) Restructuring and extending of Remote Proxy Support
20 years ago
borg-0300 58b670201d now, changed HTCacheSize needs no restart
20 years ago
theli 40777556c5 *) Connection Tracking
20 years ago
orbiter 6260942590 changed search process: received indexes are now buffered and written to wordIndex after search
20 years ago
orbiter bc56a88cc8 further refactoring of search
20 years ago
orbiter d29dfb0a12 refactoring of search / preparation for better search methods
20 years ago
theli 461374e175 *) Restricting amount of files that yacy is allowed to open during index transfer/distribution
20 years ago
theli c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
20 years ago
orbiter 10d3627c90 changed word cache flush scheduling and removed possible locks
20 years ago
orbiter 839db8869c added high/low priority for index adding
20 years ago
theli 1688be8590 *) plasmaSwitchboard.java
20 years ago
orbiter 77ae30063d refactoring of websearch process
20 years ago
orbiter 4c7918f5b5 added shutdown to crawl stacker (moved from 882)
20 years ago
orbiter c83594528c integrated crawl stacker into thread control
20 years ago
allo f65c939a60 userDB Auth
20 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
20 years ago
orbiter 0c3a20d44f more + changed log for better understanding of outOfMemory bug and others
20 years ago
allo ff1d3d0680 Init of userDB
20 years ago
orbiter 9c4306e41e fixed problem with htcache path
20 years ago
orbiter 1669eaaa1a fixed svn 805
20 years ago
borg-0300 ca82d690a9 changed in SVN 805 one line too much
20 years ago
borg-0300 4bb1f849a0 Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233
20 years ago
orbiter 2c7b490e30 memory-logging
20 years ago
theli 9b7f37fc37 *) Minor changes
20 years ago
orbiter 9e2fc7e5fe load balancing of crawl target domains
20 years ago
orbiter 3fcc95a82c integrated crawl-profiles db in memory-performance monitor
20 years ago
theli fe6a6abc0b *) Adding robots.txt db to Performance Settings for Memory menu
20 years ago
orbiter c6d2f50375 changed order of robots and double-check
20 years ago
orbiter 68d5ff2ef1 added stringbuffer in condenser
20 years ago
orbiter 07f30931ec various configuration options in memory performance
20 years ago
theli b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
20 years ago
orbiter fb52a82008 added new performance page for memory settings
20 years ago
orbiter 416c126815 fix for a profile = null problem and new monitor in crawl queue
20 years ago
theli 7fe8784231 *) URLs pointing to a server having a private IP address will not be indexed anymore
20 years ago
theli f8ad65eae1 *) First trial implementation of robots.txt support
20 years ago
borg-0300 8cd6a52dd0 Convention
20 years ago
borg-0300 da9c6857fb *) corrected a misunderstanding, no BUG ;)
20 years ago
theli 578f36ae18 *) Speedup of indexer. Proxy files will not be enqueued by the cachemanager
20 years ago
theli 1219ef99f0 *) Bugfix for NullpointerException in yacyDebugMode Init
20 years ago