Commit Graph

536 Commits (cf0993f516a4751db1e9a13cd79067751866428e)

Author SHA1 Message Date
orbiter adf75bc9fa better logging for invalid file path detection
19 years ago
orbiter 40621a5663 enhancements in ranking preparation and fixed problem with parser/mime recognition
19 years ago
theli c650b112ea *) Bugfix for relative URL Bug in Crawler
19 years ago
theli 4e73035aef *) Bugfix for "too many open files" during index distribution
19 years ago
orbiter f57e2d67f5 shortened network overview (less columns fit easier on page)
19 years ago
orbiter 85282b1d98 enhanced YBR recognition and search result heuristics
19 years ago
orbiter b9cc9029e3 added ybr selection for remote search
19 years ago
orbiter 0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
19 years ago
theli 90d6c6223b *) Adding color codes to network graphic legend
19 years ago
orbiter bfe51c7228 added generation of domain-list
19 years ago
orbiter 0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
19 years ago
theli c2fe3a1670 *) Updating jMimeMagic Ruleset
19 years ago
orbiter 88e3234393 fine-tuning of rci-generation
19 years ago
orbiter a12759c1bf first try to implement a rci-computation from cr-files
19 years ago
orbiter 4a8e8f269e refactoring of cr-processing; new kelondro class to handle the attribute file format
19 years ago
orbiter 24dc0e0760 implemented cr-file processing and further transmission steps
19 years ago
orbiter 9d9a87f445 limited htcache storage length
19 years ago
theli d0dfccdb77 *) Making CrawlStacker pool configurable via GUI and config file
19 years ago
theli 3631cb1f6d *) deleting empty entities during index selection
19 years ago
theli ca26aab9b1 *) More debugging output for migrateWords
19 years ago
theli 9b35ae9027 *) Correcting wrong % values on IndexTransfer_p page
19 years ago
theli e6bf9d90a5 *) Fixing Problems with MalformedURLs during Word Selection
19 years ago
theli 86a9210264 *) indexing queue slots are now configurable via config file
19 years ago
theli 3c11d7b81c *) Bugfix for minimizeUrlDB
19 years ago
orbiter 9913049009 fixed outOfMemory bug caused by loops in kelondroTree during enumeration
19 years ago
theli bbb936b9ea *) Bugfix for not human readable content of PDFs while viewing the URL Content via GUI
19 years ago
theli 445e3a620f *) Avoid rejecting of html content by the crawler when the file extension is not set properly
19 years ago
theli 444a5a9368 *) Bugfix for Entries with null url in GlobalQueue
19 years ago
borg-0300 ebac51df52 restore defaultRemoteProfile
20 years ago
borg-0300 5778428455 move cutUrlText to nxTools,
20 years ago
borg-0300 9158845c3b bugfix for snippet text null bytes
20 years ago
orbiter f763923e0a added missing files for last commit
20 years ago
orbiter 79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
20 years ago
theli 7e0647f692 *) Bugfix for userDB usage during authentication
20 years ago
orbiter 02f8013013 auto-delete of corrupted word files during word-migration
20 years ago
orbiter d2731418bf added creation of global ranking files and changed url normal form usage
20 years ago
theli 6f9f8ed8f8 *) Automatic Reset of Stack Crawler DB on startup errors
20 years ago
theli fb766413d1 *) Changes on httpc dns caching
20 years ago
orbiter bc420c62f6 fixed htcache path generation (never change a running system)
20 years ago
theli dd24f0252f *) Searchword highlighting for info page
20 years ago
borg-0300 72cde1d894 getCachePath: no logging
20 years ago
borg-0300 1fbd72f9e0 rename "index.html" to "ndx"
20 years ago
borg-0300 cd1107d85e added support for URLs with '?&'
20 years ago
borg-0300 5fb2b017cb small change
20 years ago
borg-0300 544e4ea90e small change
20 years ago
borg-0300 00ab4d8723 cleaned, small change, Properties
20 years ago
theli b8ceb1ffde *) Adding better https support for crawler
20 years ago
borg-0300 e3179a6394 added getOwnSeedFile()
20 years ago
borg-0300 a803a509ae bugfix: port handling in HTCache
20 years ago
hydrox cb69047b91 *)cleanup access static methods and fields
20 years ago
hydrox 56b9f34411 *)removed unused imports
20 years ago
orbiter 5f68b6886b introduced new url-hashes for better ranking computation
20 years ago
orbiter aadace1285 fixed network image in search performance monitor
20 years ago
orbiter bb369c98de fixed search result ordering by date
20 years ago
orbiter b058ecf0bc refactoring of image-generation; added experimental PNG encoder (not active now)
20 years ago
orbiter d42531e1b2 added auto-reset for NURL-DBs
20 years ago
allo 92c49b406b adminAuth with userDB and adminAuthenticated (fix for statuspage)
20 years ago
rramthun 27f180f24b Update of YaWoStat to 0.2.
20 years ago
orbiter d656e2b433 added a memory-profile chart generation to database performance testing
20 years ago
theli ec3af327f7 *) Bugfix for Proxy-Authentication against remote proxy
20 years ago
orbiter 5b0911d7ea added new performance menu for search sequence configuration and monitoring
20 years ago
allo ada06b0674 bugfix for Networkimage from Hydrox
20 years ago
orbiter 1aa4ba8b62 added post-search filtering of redundant urls (longer than existing cited)
20 years ago
orbiter 8d827cdb30 tried to fix problems with order of network list by last-seen (which could also improve the network picture)
20 years ago
orbiter 097009d910 experimental visualization of DHT access during global search (temporary)
20 years ago
orbiter 4dcbc26ef1 introduction of search profiles; very experimental
20 years ago
theli 6c48c3ce39 *) Bugfix for ArithmeticException during IndexTransfer
20 years ago
theli 525c8dcbd4 *) Adding Traffic Statistic for Crawler
20 years ago
theli 9a5ab62928 *) Adding yacy specific X-YACY-Index-Control header which can be used by clients
20 years ago
theli 02d9af1a70 *) Restructuring and extending of Remote Proxy Support
20 years ago
borg-0300 58b670201d now, changed HTCacheSize needs no restart
20 years ago
theli 40777556c5 *) Connection Tracking
20 years ago
rramthun a98bafb939 Changes to german language file
20 years ago
theli 95abdeb685 *) Bugfix for nextElement function of URL Enumerator
20 years ago
orbiter 6260942590 changed search process: received indexes are now buffered and written to wordIndex after search
20 years ago
borg-0300 7ee03acce0 new function cutUrlText added to shorten the URLs on IndexMonitor.html
20 years ago
orbiter bc56a88cc8 further refactoring of search
20 years ago
orbiter d29dfb0a12 refactoring of search / preparation for better search methods
20 years ago
theli 0ae166c522 *) Small changes to Index Transfer.
20 years ago
theli 461374e175 *) Restricting amount of files that yacy is allowed to open during index transfer/distribution
20 years ago
theli c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
20 years ago
orbiter b80b2fbdcc crawling peers now produce waves in network graphic
20 years ago
orbiter 10d3627c90 changed word cache flush scheduling and removed possible locks
20 years ago
orbiter 839db8869c added high/low priority for index adding
20 years ago
theli 1688be8590 *) plasmaSwitchboard.java
20 years ago
orbiter e9eb5e4b56 refactoring of index-entity join methods
20 years ago
orbiter 258fd9eb8e adding missing file for websearch refactoring
20 years ago
orbiter 77ae30063d refactoring of websearch process
20 years ago
orbiter 579b22d8ff small update to network drawing
20 years ago
orbiter 2b5829c3da small fix
20 years ago
orbiter 4c7918f5b5 added shutdown to crawl stacker (moved from 882)
20 years ago
orbiter 2851658c2a re-integrated Martin's last change to crawl stacker from svn 882 that I had deleted accidentally
20 years ago
orbiter c83594528c integrated crawl stacker into thread control
20 years ago
theli 959eefbc4f *) Robots.txt parser/ppt
20 years ago
allo f65c939a60 userDB Auth
20 years ago
orbiter 1a5d98cd6d better imagePainter example and fix for typo http://www.yacy-forum.de/viewtopic.php?p=10920#10920
20 years ago
orbiter f6cf3967de fix for compile-bug in svn 583 (Martin, please check whether this is right: FIFO or FILO stack?)
20 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
20 years ago
orbiter 6d5d0ac801 bugfix for startup problems
20 years ago
orbiter 0c3a20d44f more + changed log for better understanding of outOfMemory bug and others
20 years ago
theli 0fd9aa6c6e *) Bugfix: supportedFileExt Function didn't detect the file extension correctly because of missing conversion to lower case
20 years ago
theli 8a33c9b309 *) Bugfix: supportedFileExt Function didn't detect the file extension correctly if there was a dot
20 years ago
theli 28c5687ff9 *) Bugfix for "download of non supported file content" via crawler
20 years ago
theli 2b3f964037 *) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension
20 years ago
allo ff1d3d0680 Init of userDB
20 years ago
orbiter 9c4306e41e fixed problem with htcache path
20 years ago
orbiter 1669eaaa1a fixed svn 805
20 years ago
borg-0300 ca82d690a9 changed in SVN 805 one line too much
20 years ago
borg-0300 4bb1f849a0 Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233
20 years ago
orbiter 2c7b490e30 memory-logging
20 years ago
orbiter 7fc822a59b changed handling of time-zones
20 years ago
theli 9b7f37fc37 *) Minor changes
20 years ago
theli b5a8992d29 *) Setting some object fields to final
20 years ago
theli 023be89586 *) Bugfix for "Robots.txt is loaded again and again"
20 years ago
theli 35c6c5ead7 *) Bugfix for "Blacklist and Crawling" Bug.
20 years ago
orbiter 9e2fc7e5fe load balancing of crawl target domains
20 years ago
orbiter 3fcc95a82c integrated crawl-profiles db in memory-performance monitor
20 years ago
theli fe6a6abc0b *) Adding robots.txt db to Performance Settings for Memory menu
20 years ago
orbiter 3274ae725e increased cache size of robots database; however, this should be integrated into new memory control
20 years ago
orbiter c6d2f50375 changed order of robots and double-check
20 years ago
orbiter 68d5ff2ef1 added stringbuffer in condenser
20 years ago
orbiter 495bc8bec6 removed cache-control from low and medium priority caches which reduces memory use and computation overhead
20 years ago
orbiter 18d9e1a256 fix for http://www.yacy-forum.de/viewtopic.php?p=10026#10026
20 years ago
orbiter 07f30931ec various configuration options in memory performance
20 years ago
theli b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
20 years ago
borg-0300 6d1de8abfd finals; cleaned;
20 years ago
orbiter 14bc880fa4 fixed bug with crashed profile database
20 years ago
orbiter 71a31f0902 integrated and extended new memory performance menu; found and fixed bug in DHT caching
20 years ago
orbiter fb52a82008 added new performance page for memory settings
20 years ago
orbiter cddd9aaa33 fixed SERIOUS bug with kelondroStack; affected all stack processing since 729
20 years ago
orbiter 416c126815 fix for a profile = null problem and new monitor in crawl queue
20 years ago
orbiter 2148c0cf49 replaced kelondro storage core; much less objects in kelondro cache now; less IO from DB
20 years ago
theli beefddf0e8 *) Adding option which allows an Index-Transfer without deletion of the index
20 years ago
rramthun 4036ee812a Updated german language file
20 years ago
theli 40925f4fb7 *) Improving complete index transfer performance by automatically increasing size of transferred word chunk
20 years ago
theli 91ab4d044b *) Adding automatic retry functionality to complete index transfer function
20 years ago
theli a62677f761 *) Adding additional logging output for complete index transfer
20 years ago
theli b991d2e7dd *) Additional logging message for complete index transfer
20 years ago
theli 3c00c5f6c7 *) Complete Index Transfer
20 years ago
theli 2cb084d426 *) Complete Index Transfer
20 years ago
theli d1de71e9f6 *) Suppress stacktrace on proxy error for "No route to host Exception"
20 years ago
theli 56160cbd01 *) Bugfix for "YaCy miscounts ..." Bug.
20 years ago
orbiter 43b42854a0 fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
20 years ago
theli 3587407039 *) Fixing problems of list operation if index and queue size are both 0.
20 years ago
theli 51b48a10e8 *) Suppress stacktrace on proxy error for "ValidatorException: No trusted certificate found"
20 years ago
theli 7fe8784231 *) URLs pointing to a server having a private IP address will not be indexed anymore
20 years ago
theli 0aafb83edc *) Bugfix for robots.txt isDisallowed Check.
20 years ago
borg-0300 8260128ee9 changed getFreeSize();
20 years ago
theli f8ad65eae1 *) First trial implementation of robots.txt support
20 years ago
borg-0300 0a57fbcde5 Added new HashSet filesInUse;
20 years ago
borg-0300 8cd6a52dd0 Convention
20 years ago
borg-0300 c0e3d18bbf *) remove import java.lang
20 years ago
borg-0300 b1cd1fa917 cleaned
20 years ago
borg-0300 da9c6857fb *) corrected a misunderstanding, no BUG ;)
20 years ago
borg-0300 fbac053c03 small change
20 years ago
theli 578f36ae18 *) Speedup of indexer. Proxy files will not be enqueued by the cachemanager
20 years ago
theli 1219ef99f0 *) Bugfix for NullpointerException in yacyDebugMode Init
20 years ago
theli 6c722706b7 *) Moving yacyDebugMode initialization to switchboard
20 years ago
theli 4e07828807 *) httpdProxyHandler.java
20 years ago
borg-0300 81cb8feb15 back to 649 :/
20 years ago
borg-0300 5194511e8e *) attempt to find bug
20 years ago
theli 6991b9e2b9 *) Suppress stacktrace on crawler error for "Connection reset"
20 years ago
theli a47f9238fe *) Blacklist is now also used by the crawler
20 years ago
theli dc0a2d4c11 *) Bugfix for Loader Queue:
20 years ago
theli 732a107160 *) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
20 years ago
theli 33aaffbfc6 *) Displaying content size of each entry in indexing queue
20 years ago
borg-0300 7626823519 BUGFIX for last 'commit'
20 years ago
borg-0300 971756e8dd the delete size is smaller
20 years ago
theli 0471019606 *) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
20 years ago
borg-0300 cc493ef8c1 Added change from Hermes
20 years ago
theli bead8a32aa *) IndexCreate_p.java:
20 years ago
theli 48aaf703cc *) Adding additional logging output to detect crawling problems
20 years ago
theli 59b8a98c7e *) Bugfix for suppressing of stacktrace in log on crawler error "MalformedURLException"
20 years ago
borg-0300 c1d7527929 better cache cleanup
20 years ago
theli 2e6df95786 *) adding toString method
20 years ago
theli 4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
20 years ago
theli 6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
20 years ago
theli f19c09b227 *) Suppress stacktrace on crawler error for "MalformedURLException"
20 years ago
theli cc1df08069 *) Adding missing synchronized blocks
20 years ago
borg-0300 bf14e6def5 *) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
20 years ago
theli 9b818b1ce3 *) Pausing Crawlers if there is not enough space on disk
20 years ago
theli b33094e925 *) Trying to solve "Too many open files bug"
20 years ago
theli 34790acf02 *) Bugfix for suppressing of stacktrace in log on crawler error "unknown host"
20 years ago
theli af7b8f75bd *) Making proxyAccessLogging configurable via yacy.logging file
20 years ago
theli 2a081c9ee5 *) Adding additional logging message for "NURL.entry() == null" Bug
20 years ago
theli cb1f11c96b *) Suppress stacktrace on crawler error for "Unknown Host"
20 years ago
theli e338a13de3 *) Suppress stacktrace on crawler error for "Read timed out"
20 years ago
theli 2e43e744de *) Suppress stacktrace on crawler error for "connect timed out"
20 years ago
theli 36cbe04e3e *) Bugfix for Crawler Redirection Bug
20 years ago
theli b70de495a0 *) Remembering Crawler-isPaused setting
20 years ago
theli e569a84dc0 *) Using the same configuration settings for all indexing threads on server Startup
20 years ago
theli 17be77a468 *) Bugfix for "Crawler data will not be removed from htcache if content parsing failed"
20 years ago
theli 5f55dff297 *) Bugfix for "Binary zeros on the page: Index Creation: Indexing Queue"
20 years ago
allo eb6365c069 local Bootstrapping bug.
20 years ago
theli 330eae7cf3 *) Normalizing CrawlerStartURL now before crawling is started
20 years ago
theli ab894d26bc *) Bugfix for "plasmaSwitchboard.deQueue: null" Bug (hopefully)
20 years ago
theli eaf9f26cc3 *) Bugfix for NULL PROFILE HANDLE 'null' Bug:
20 years ago
rramthun 4cb382decb Adding changes by borg-0300 from http://www.yacy-forum.de/viewtopic.php?t=997
20 years ago
theli ec4c70d722 *) If there are at most 10 entries left while doing an index transfer, these entries will also be appended
20 years ago
theli d4a045d7b1 *) Trying to solve "de.anomic.plasma.plasmaSwitchboard.deQueue': null" Bug
20 years ago