510 Commits (3f2b18a4e756897031e3ae1b6316744f2f292a5b)

Author SHA1 Message Date
theli 4ca0857c0c *) Index transfer now considers the pause time sent by busy peers during
19 years ago
orbiter c75cacda95 added a flex-width-array: this is a table where it is
19 years ago
orbiter 5041d330ce refactoring
19 years ago
orbiter bd057b44dd - automatic setting of peer-does-not-accept-remote-crawl
19 years ago
orbiter cda087f43b - integrated cache miss storage into object cache
19 years ago
theli 61078b3885 *) adding support for delayed shutdown
19 years ago
orbiter 90d569d70f refactoring of index management:
19 years ago
orbiter a930be4ba3 refactoring of index management:
19 years ago
hermens df7e1d9df3 Changes to plasmaURL and subclasses:
19 years ago
orbiter a474669338 start with refactoring of index management
19 years ago
theli f331def5d8 *) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
19 years ago
theli bcc950c533 *) Bugfix for Index Transfer
19 years ago
orbiter 461548698c configuration of index transfer chunk size
19 years ago
hermens 51e3bb576f Don't increase dhtTransferIndexCount when the last transferred index was smaller
19 years ago
hermens a0ca4c5fb8 Remove a possible race condition between DHT transfer and deQueue
19 years ago
orbiter 60e5aff9fc some enhancements to the remote crawl trigger
19 years ago
orbiter 14d6e476c9 tried to solve some problems with new picture viewer
19 years ago
orbiter f0833b0328 introduced simple search interface
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter e2e8d0c188 some kind of refactoring of yacysearch:
19 years ago
rramthun 250864406f ...
19 years ago
orbiter 63f39ac7b5 added 3 new crawling steering options:
19 years ago
orbiter 1fc3b34be6 some pre-work (without function yet) to implement:
19 years ago
theli c9e6b5e391 *) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
19 years ago
orbiter 1f4412a146 adapted isListed to the new behavior as discussed (url, getFile)
19 years ago
orbiter 063ef4660a bug?
19 years ago
orbiter 3286b1f498 re-organisation of lurl-creation and -stacking
19 years ago
hydrox 8da13088e9 *)removed multiple DHT_Distribution_Threads
19 years ago
orbiter bcd99fe83e introduced a second RAM cache for DHT transfer
19 years ago
orbiter bae3783d38 added a snippet marking
19 years ago
orbiter f0a38873eb * added yacysearch page with better view on search results
19 years ago
theli 759800f543 *) Bugfix for storeHTCache problem
19 years ago
orbiter 1b9b8922d9 * fixed problems with new basic 1-2-3 configuration (now authentication required)
19 years ago
auron_x 8c6f38fe70 *) added Blog to YaCy (atm not reachable through interface) -> Blog.html
19 years ago
orbiter eaffcfefe2 * added more ranking attributes (without function; this will be added later)
19 years ago
orbiter 3703f76866 - fixed re-search bug: after a search with several words, a second search could not
19 years ago
theli fbbbf5f411 *) remote trigger for proxy-crawl
19 years ago
orbiter 1d8ca6e082 serialized dhtChunk deletion with indexing
19 years ago
theli 2336f0f013 *) allow pausing/resuming of crawlJob Threads separately
19 years ago
orbiter 60dac4325e serialized indexing with dht selection
19 years ago
orbiter a840755964 moved parts of index transfer logic back to switchboard
19 years ago
borg-0300 64441b1f78 ADDED: yacy.badwords list to filter the topwords
19 years ago
orbiter 2c4e4ae6a2 further refactoring of dht selection, transfer and flushing
19 years ago
orbiter 73dad68cf1 outsourced thelis DHT flush class into own file
19 years ago
theli 42a5f56723 *) Bugfix for broken dht thread configuration
19 years ago
hydrox e2af2a3f45 *) it's now possible to run more than one indexDistribution-Thread
19 years ago
theli 980e986b64 *) Re enabling short cycle for already removed nurl entries
19 years ago
allo a26574c894 Migration from tagName as key to wordhash(tagName) as key for bookmarkTags.db
19 years ago
orbiter 1e4578aab6 VERY EXPERIMENTAL removal of index ram cache flushing thread.
19 years ago
orbiter d98418390b - introduced rankingProfile Class
19 years ago
theli 6a99304b2b *) Redesign of db import functionality
19 years ago
orbiter fa90c3ca7a - removed some usage of indexEntity
19 years ago
orbiter 03c65742ba changes towards the new index storage scheme:
19 years ago
orbiter b946e28e61 some ranking enhancements
19 years ago
orbiter eabf4a0386 fix for null pointer exception during shut-down
19 years ago
orbiter f14d49fae9 enhancements, bugfixes and additions to word index attribute storage
19 years ago
allo 4d33020f56 Migration to WORK
19 years ago
rramthun 1e5feedf0e Fix for http://www.yacy-forum.de/viewtopic.php?p=15547#15547
19 years ago
orbiter f4ffa9aee5 - implemented more attributes to index entries
19 years ago
orbiter 0371494010 tried to add word position to index
19 years ago
borg-0300 c5b6154136 added CRDistOn = true/false
19 years ago
orbiter e2ff1767b5 fix for last DHT distribution bug-fix
19 years ago
allo 6822dce57b Using Orbiters function for auth
19 years ago
orbiter 2028403670 - consolidated different orderings to kelondroNaturalOrder
19 years ago
orbiter 9086261476 refactoring of base64 encoding:
19 years ago
allo 351fffc129 DATA/WORK for user-created content
19 years ago
allo a81cc9d969 no DATA/DATA to avoid confusion.
19 years ago
allo 9cce3c5709 dates Table for bookmarksdb(needed for del.icio.us api)
19 years ago
allo 4ac0fd328a First Version of the Bookmarksmanager
19 years ago
orbiter 4500506735 fixed some bugs concerning url entry retrieval and intexControl interface
19 years ago
orbiter 83a34b838d * added Object allocation monitor on performanceMemory page
19 years ago
orbiter 0c762daf4b better startup failure handling
19 years ago
orbiter f27f9ecf15 * activated write buffer for databases.
19 years ago
orbiter c59d1b2f5e - Tests with write buffer (new class kelondroBufferedIOChunks, not yet active)
19 years ago
orbiter bb79fb5d91 - changed handling of error cases retrieving urls from database
19 years ago
theli 386d9e45d8 *) Bugfix for code cleanup
19 years ago
rramthun a1061495d4 Fixed some spelling mistakes and added some text which (should) make it easier to understand the options.
19 years ago
orbiter 0cdc58aaea fixed indexing of local domains.
19 years ago
theli e1c2d8ec5f *) Speedup "removed from queue"
19 years ago
orbiter 13fdebc50d added authentication for link deletion in search result
19 years ago
orbiter 37f88b4017 code cleanup
19 years ago
orbiter ec2b39c1ce code cleanup
19 years ago
orbiter 8f1f2daa5e implemented interactive link deletion of search results.
19 years ago
theli 44fa94ac52 *) Modifications for dbImport functionality
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
orbiter f57e2d67f5 shortened network overview (less columns fit easier on page)
19 years ago
orbiter 85282b1d98 enhanced YBR recognition and search result heuristics
19 years ago
orbiter 0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
19 years ago
orbiter 0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
19 years ago
orbiter 24dc0e0760 implemented cr-file processing and further transmission steps
19 years ago
theli 86a9210264 *) indexing queue slots are now configurable via config file
19 years ago
borg-0300 ebac51df52 restore defaultRemoteProfile
20 years ago
borg-0300 5778428455 move cutUrlText to nxTools,
20 years ago
borg-0300 9158845c3b bugfix for snippet text null bytes
20 years ago
orbiter f763923e0a added missing files for last commit
20 years ago
orbiter 79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
20 years ago
theli 7e0647f692 *) Bugfix for userDB usage during authentication
20 years ago
orbiter d2731418bf added creation of global ranking files and changed url normal form usage
20 years ago
theli fb766413d1 *) Changes on httpc dns caching
20 years ago
theli dd24f0252f *) Searchword highlighting for info page
20 years ago
theli b8ceb1ffde *) Adding better https support for crawler
20 years ago
borg-0300 e3179a6394 added getOwnSeedFile()
20 years ago
hydrox cb69047b91 *)cleanup access static methods and fields
20 years ago
allo 92c49b406b adminAuth with userDB and adminAuthenticated (fix for statuspage)
20 years ago
theli ec3af327f7 *) Bugfix for Proxy-Authentication against remote proxy
20 years ago
orbiter 1aa4ba8b62 added post-search filtering of redundant urls (longer than existing cited)
20 years ago
orbiter 4dcbc26ef1 introduction of search profiles; very experimental
20 years ago
theli 9a5ab62928 *) Adding yacy specific X-YACY-Index-Control header which can be used by clients
20 years ago
theli 02d9af1a70 *) Restructuring and extending of Remote Proxy Support
20 years ago
borg-0300 58b670201d now, changed HTCacheSize needs no restart
20 years ago
theli 40777556c5 *) Connection Tracking
20 years ago
orbiter 6260942590 changed search process: received indexes are now buffered and written to wordIndex after search
20 years ago
orbiter bc56a88cc8 further refactoring of search
20 years ago
orbiter d29dfb0a12 refactoring of search / preparation for better search methods
20 years ago
theli 461374e175 *) Restricting amount of files that yacy is allowed to open during index transfer/distribution
20 years ago
theli c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
20 years ago
orbiter 10d3627c90 changed word cache flush scheduling and removed possible locks
20 years ago
orbiter 839db8869c added high/low priority for index adding
20 years ago
theli 1688be8590 *) plasmaSwitchboard.java
20 years ago
orbiter 77ae30063d refactoring of websearch process
20 years ago
orbiter 4c7918f5b5 added shutdown to crawl stacker (moved from 882)
20 years ago
orbiter c83594528c integrated crawl stacker into thread control
20 years ago
allo f65c939a60 userDB Auth
20 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
20 years ago
orbiter 0c3a20d44f more + changed log for better understanding of outOfMemory bug and others
20 years ago
allo ff1d3d0680 Init of userDB
20 years ago
orbiter 9c4306e41e fixed problem with htcache path
20 years ago
orbiter 1669eaaa1a fixed svn 805
20 years ago
borg-0300 ca82d690a9 changed in SVN 805 one line too much
20 years ago
borg-0300 4bb1f849a0 Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233
20 years ago
orbiter 2c7b490e30 memory-logging
20 years ago
theli 9b7f37fc37 *) Minor changes
20 years ago
orbiter 9e2fc7e5fe load balancing of crawl target domains
20 years ago
orbiter 3fcc95a82c integrated crawl-profiles db in memory-performance monitor
20 years ago
theli fe6a6abc0b *) Adding robots.txt db to Performance Settings for Memory menu
20 years ago
orbiter c6d2f50375 changed order of robots and double-check
20 years ago
orbiter 68d5ff2ef1 added stringbuffer in condenser
20 years ago
orbiter 07f30931ec various configuration options in memory performance
20 years ago
theli b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
20 years ago
orbiter fb52a82008 added new performance page for memory settings
20 years ago
orbiter 416c126815 fix for a profile = null problem and new monitor in crawl queue
20 years ago
theli 7fe8784231 *) URLs pointing to a server having a private IP address will not be indexed anymore
20 years ago
theli f8ad65eae1 *) First trial implementation of robots.txt support
20 years ago
borg-0300 8cd6a52dd0 Convention
20 years ago
borg-0300 da9c6857fb *) changed a misunderstanding, no BUG ;)
20 years ago
theli 578f36ae18 *) Speedup of indexer. Proxy files will not be enqueued by the cachemanager
20 years ago
theli 1219ef99f0 *) Bugfix for NullpointerException in yacyDebugMode Init
20 years ago
theli 6c722706b7 *) Moving yacyDebugMode initialization to switchboard
20 years ago
theli 4e07828807 *) httpdProxyHandler.java
20 years ago
theli a47f9238fe *) Blacklist is now also used by the crawler
20 years ago
theli 732a107160 *) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
20 years ago
theli 0471019606 *) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
20 years ago
theli 48aaf703cc *) Adding additional logging output to detect crawling problems
20 years ago
theli 4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
20 years ago
theli 6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
20 years ago
borg-0300 bf14e6def5 *) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
20 years ago
theli 2a081c9ee5 *) Adding additional logging message for "NURL.entry() == null" Bug
20 years ago
theli b70de495a0 *) Remembering Crawler-isPaused setting
20 years ago
theli e569a84dc0 *) Using the same configuration settings for all indexing threads on server Startup
20 years ago
theli 17be77a468 *) Bugfix for "Crawler data will not be removed from htcache if content parsing failed"
20 years ago
allo eb6365c069 local Bootstrapping bug.
20 years ago
theli 330eae7cf3 *) Normalizing CrawlerStartURL now before crawling is started
20 years ago
theli d4a045d7b1 *) Trying to solve "de.anomic.plasma.plasmaSwitchboard.deQueue': null" Bug
20 years ago
orbiter 25f632dbd9 more DHT bugfixes and better logging of DHT effects
20 years ago
orbiter 5cb00889d9 enhancements to dht selection, search and search presentation
20 years ago
orbiter ba0a486328 moved printStackTrace() to logging
20 years ago
orbiter cd10370992 several bugfixes and dht selection / logging improvement
20 years ago
orbiter c8a7a85ce2 fix for http://www.yacy-forum.de/viewtopic.php?p=7384#7384
20 years ago
orbiter 7db543a9fa fixes for several dht misbehaviours
20 years ago
orbiter 5716f8521d bug fixes for word ordering and dht index selection
20 years ago
orbiter f5259f29e8 word cache behaviour fix and other fixes
20 years ago
orbiter 2c234e1b82 better log output for search result
20 years ago
orbiter 248c24b60a intermission-feature usage in case of local and remote search
20 years ago
theli 865b9490a2 *) Making DHT Transfer while Crawling configurable
20 years ago
orbiter 2d8557cb10 minor changes
20 years ago
orbiter 91163db52e fix for more time-related problems in proxy
20 years ago
orbiter 40da910f41 bugfixes and automatic news-cleanup
20 years ago
theli 228b04b499 *) Bugfix for "wrong seed-upload timestamp" problem
20 years ago
theli 470839a16a *) Crawler/Session pool settings will now be stored properly into configfile
20 years ago
orbiter 1022fbeb65 many YaCyNews fixes
20 years ago
orbiter 13abd8b6e7 added news-creation at crawl start
20 years ago
orbiter cdbbfd50fb fixed bad remote crawl behavior
20 years ago
orbiter 81e564edb8 faster crawl profile list cleanup
20 years ago
orbiter ad90f0ad13 activated RWI distribution to DHT for senior peers (default redundancy 3), necessary now for network growth
20 years ago
orbiter b9d18d40cb configuration of proxy idle time in performance menu
20 years ago
orbiter c64970fa47 re-implemented proxy-busy-check and fixed some other things
20 years ago
orbiter b73557ed2d better assortment monitoring and enhanced profile menu
20 years ago
orbiter 9f505af7aa preparations for bulk remote crawls
20 years ago
orbiter 51962d55bf added 'PPM', page-per-minute statistics
20 years ago
orbiter 159f795f65 bugfix (null pointer exception in assortments)
20 years ago
orbiter 1d2155675b changed assortment memory cache flush
20 years ago
orbiter 19dbed7cc8 code clean-up
20 years ago
orbiter 40036ba69c fixed dht transmission; added url-blacklist blocking also for remote search
20 years ago
orbiter 311e627363 blocking of blacklisted urls in indexReceive and small changes
20 years ago
orbiter 277048501e bugfix
20 years ago
orbiter 8b89c46afe fixed problem with cache write
20 years ago
orbiter 419f8fb398 fixed bugs/missing code regarding new crawl stack
20 years ago
orbiter 858cd94299 replaced indexing ram-queue by file-based stack-queue
20 years ago
theli 0e2c33ee55 *) Network.html/Network.java:
20 years ago
orbiter eb74fa0c82 fixed a bug with snippet-length
20 years ago
orbiter 86f2aa8478 fixed seed-load date bug (evaluating server date for age computation)
20 years ago
orbiter 75ebdbc852 enhanced snippet-generation (case where snippet is too long)
20 years ago
orbiter 8a4f297324 fixed/enhanced snippet error-handling; suppression of results where no snippet exists
20 years ago
orbiter 712fe9ef18 bugfixed utf-8 decoding and parser
20 years ago
orbiter 3addf58046 enhanced snippet-loading with threads
20 years ago
orbiter 56d28a16f0 bugfixes
20 years ago
orbiter d6c85228a6 enhanced snippet computation
20 years ago
theli aae9a433a6 *) correcting usage of supportedFileExt-List
20 years ago
orbiter 1e7f062350 many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
20 years ago
orbiter 68dc2b0c6b added kelondroArray, the basis for upcoming kelondroHash and some bug fixes
20 years ago
orbiter a19541e563 code-enhancements after analysis with AppPerfect
20 years ago
orbiter 85075269a6 extended fail-safe memory-management. prevents too much allocation, too often GC and should help for the 100%CPU-bug
20 years ago
orbiter e3c92818db avoiding OutOfMemoryError routines
20 years ago
orbiter 3e8ee5a46d enhanced caching in kelondroRecords and added better synchronization/finalizer
20 years ago
orbiter 5d06ded005 enhanced html parser speed
20 years ago
orbiter 5a490aa065 fixed html parser
20 years ago
orbiter a25b5b4986 fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
20 years ago
theli 9e47ba5ad6 *) adding missing calls for function close() to avoid "too many open file" bug
20 years ago
orbiter a1ffc27041 preparations for image/movie/music indexing
20 years ago
orbiter a5b40923b6 added word migration to assortments (start with 'java -classpath classes yacy -migratewords')
20 years ago
theli ee9e110366 *) removing old logging configuration properties from yacy.init
20 years ago
theli c1a4e0dc28 *) changing reference to logger
20 years ago
orbiter 4574fa4ce7 bugfixes
20 years ago
orbiter 33f9315e58 implemented multithreading of indexing
20 years ago
orbiter ca3b4ccaf4 added snippet-routines (not yet finished)
20 years ago
orbiter 594c591223 changes towards 0.38
20 years ago
orbiter d8fdc2526e added experimental snippet-generation (to be disabled for 0.38)
20 years ago
orbiter 3771b10b89 implemented automated migration indexCache 0.37 -> indexAssortmentCluster
20 years ago
orbiter e89ded9e41 bugfixes
20 years ago
orbiter 3d8a2ff937 enhanced parallelization of local/global/remote crawling
20 years ago
orbiter 21110dcd5e fixed bugs with open files and caching
20 years ago
theli 74eb21f62e *) adding image tag into rss template
20 years ago
orbiter 5c6147a54c introduced assortment structure (generalization of singletons)
20 years ago
theli 73e297f30f *) adding proper default values for RealtimeParsableMimeTypes if something goes wrong with the configuration file
20 years ago
theli 361f05978d Multiple updates regarding the yacy seedUpload facility,
20 years ago
theli ddc5675781 *) Correcting typo
20 years ago
theli d2c4e9a55e *) Implementing yacy forum wishlist item: "Pause Crawling"
20 years ago
orbiter b4030e5023 implemented serverSwitchActions - action-hooks
20 years ago
orbiter 1d7fed87dc redesign of index caching - removed indexCache.db
20 years ago
rramthun 3f85978519 Fixed one spelling mistake, limited input for ICQ numbers to 9 digits and made ICQ number in peer profiles clickable.
20 years ago
theli 2aa5fe8f50 *) Import statements reorganized
20 years ago
orbiter 48650c082c fixed 100%-CPU-Bug in plasmaCondenser
20 years ago
orbiter 995673d795 several bugfixes
20 years ago
orbiter 2de90020ed fixed caching+synchronization+brute-force-denial
20 years ago
orbiter 9156fd53bc fixed bugs in last commit
20 years ago
orbiter e25f2354c2 removed synchronization and thread blockings
20 years ago
theli 58a65b60bd *) synchronized keyword removed from function processLocalCrawling to avoid deadlocks.
20 years ago
theli 65fc650109 *) plasmaCrawlLoader shutdown problem fixed (hopefully)
20 years ago
orbiter ba16da72b4 fixed not-working kelondroRecords-Cache
20 years ago
orbiter 7fb645b0ab enhanced crawling performance, changed memory settings, new performance options
20 years ago
theli 58b1a0ba40 *) adding an new package for extra content parsers
20 years ago
orbiter 8b31f9e202 enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
20 years ago
orbiter 00f223cfc1 fixed post-parsing (a case when the bluelist is empty)
20 years ago
orbiter 97ec8d65e4 fixed makerelease & clean-up of dead code
20 years ago
orbiter b9203bdb50 bug fixes and code cleaning
20 years ago
orbiter c0807abd33 new crawl/proxy/cache design + fixes
20 years ago
orbiter e7d055b98e very experimental integration of the new generic parser and optional disabling of bluelist filtering in proxy. Does not yet work properly. To disable the disable-feature, the presence of a non-empty bluelist is necessary
20 years ago
orbiter a87a17a3c8 prepared generic text parser environment
20 years ago
orbiter 89eb9a2292 fixed bug with crawl profiles
20 years ago
orbiter 248077d3f0 initial load with yacy 0.36
20 years ago