4782 Commits (8cf47a83350bb000cf6b576c697ba2255c0d5df6)

Author SHA1 Message Date
orbiter 35a9e8f307 - fixed network graphic
13 years ago
Al Sutton 8993cac4d8 Initial performance improvements
13 years ago
orbiter 8895d8c1cd removed unnecessary log entries
13 years ago
apfelmaennchen 77a080ced9 smaller fixes for YMarks
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
apfelmaennchen dd1482aaf5 further update to YMarks
13 years ago
orbiter c584db991f creating a bookmark from the search results now works again .. with new YMarks
13 years ago
apfelmaennchen 564374d1fe - included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
13 years ago
orbiter c93f10417a add a bookmark automatically each time a new crawl is started
13 years ago
orbiter e4a82ddd8b produce a bookmark entry from every crawl start. these bookmarks are always private.
13 years ago
apfelmaennchen 6287c2b4a9 YMarks:
13 years ago
cominch 2236e01137 Minor correction to prevent useless comma at beginning of string, created from list
13 years ago
apfelmaennchen 5581be12fb YMarks:
13 years ago
apfelmaennchen a3eebfdcba YMarks:
13 years ago
orbiter c50f8f9a06 code cleanup
13 years ago
apfelmaennchen 4f95f72124 YMarks:
13 years ago
orbiter aa322bc6d0 fix
13 years ago
orbiter 97d1347adb added also a default accept field to robots.txt downloads
13 years ago
orbiter f183d3822c added a default accept header in http requests since some http fraud detection functions check that this header field exists
13 years ago
orbiter 06352b8d6b more logging
13 years ago
orbiter a99934226e more logging for debugging of robots.txt
13 years ago
orbiter 7a5841e061 fix for robot parser
13 years ago
orbiter 458c20ff72 fix for robot parser
13 years ago
orbiter 017a01714d - enhanced logging in robots.txt parser for remote debugging
13 years ago
apfelmaennchen a8dfe787ed - updated to jquery flexigrid 1.1
13 years ago
orbiter eb1c7c041d write info about robots.txt evaluation into getpageinfo_p.xml
13 years ago
apfelmaennchen abba31f02e - bugfix for correctly sorting ymarks
13 years ago
orbiter 775b44017e refactoring
13 years ago
apfelmaennchen 5f7dbe1c42 - some refactoring (ymarks)
13 years ago
orbiter 78ce3b13be typo
13 years ago
orbiter 85d6bf4ac4 fixed urls to media content during indexing
13 years ago
orbiter 0d858d48ec replaced String with StringBuilder in suggestion process
13 years ago
orbiter 3a807e10cf - added a cache for active crawl profiles to the crawl switchboard
13 years ago
orbiter 37e35f2741 normalization of url using urlencoding/decoding
13 years ago
orbiter 1b86d06d1e fix for http://bugs.yacy.net/view.php?id=62
13 years ago
orbiter 9e4875230f performance hacks
13 years ago
orbiter a9838f8b99 fix for http://bugs.yacy.net/view.php?id=59
13 years ago
orbiter a7df70221e refactoring
13 years ago
orbiter cf4fd525ee added directDocByURL attribute in crawl profile
13 years ago
orbiter c61e4cfd78 - fix for incomplete clear() in balancer
13 years ago
orbiter 813f297a95 another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
13 years ago
orbiter 035ebfbf3b - performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
13 years ago
orbiter b250e6466d implemented crawl restrictions for IP pattern and country lists
13 years ago
f1ori e207c41c8e * fix urlproxy for urls containing dollar signs
13 years ago
orbiter 5ad7f9612b added crawl settings for three new filters for each crawl:
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
low012 42b5f09f68 *) this should fix a bug in snippet creation (also cleaned up a little bit)
13 years ago
orbiter 6b22865dbc - removed some warnings
13 years ago
orbiter 0c6d95e57b - more tolerance against failure of table opening
13 years ago
orbiter 4f31869c5a enhanced search result timing
13 years ago
orbiter 6b02b696b0 - add number of search results to end of rss and json output to reflect latest status of retrieval
13 years ago
f1ori 87e6abd168 * fix urls containing a port number in urlproxy
13 years ago
f1ori 97045022fa * pass cookies to Server Side Includes
13 years ago
orbiter ce2a76d603 performance hack for search process
13 years ago
orbiter 2c4a672fe2 bugfixes and performance hacks for table index
13 years ago
orbiter dad5b586a4 added a concurrent warm-up of Table data structures. That should speed up the start-up process but may also cause higher CPU load at that time.
13 years ago
orbiter 734059d33e performance hacks
13 years ago
orbiter 23e81b28b2 synchronization enhancements
13 years ago
orbiter dd4635e323 patches
13 years ago
orbiter bb0c045036 fix for problem with relocation of network
13 years ago
orbiter 85a5487d6d YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
13 years ago
orbiter 52a2b3f110 try to fix bug http://bugs.yacy.net/view.php?id=26
13 years ago
orbiter 2cba860693 - fix for wrong entries in NOLOAD indexing queue (which caused urls to be indexed based only on their url without being loaded)
13 years ago
orbiter cec3836e73 added reference limitation to IndexControlRWIs_p.html servlet
13 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
13 years ago
f1ori 41e146116a fixes size of document in case the server doesn't give the size in the header
13 years ago
orbiter e1a3d609aa moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
13 years ago
sixcooler 2cf61a40ce fixed a bug from 7856, where Snippet returned an error by mistake when Metadata was found
13 years ago
orbiter 610b01e1c3 - added a 'add every media object linked in a html document as a new document' to the html parser. This causes every image, app, video or audio file that is linked in a html file to be added as a document. In fact that means that parsing a single html document may cause a number of documents to be inserted into the search index.
13 years ago
orbiter 3da21c4266 protection against starting of a (second) yacy peer while another one is already running on the same port
13 years ago
orbiter 3e6767d66c limitation of reference evaluation (protection against crawler pits)
13 years ago
orbiter 2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
13 years ago
orbiter 9f9f634de2 fix in search
13 years ago
sixcooler 5f8a5ca32d - not doing merge-jobs while short on Memory
13 years ago
orbiter 22d69a6368 refactoring in cora: added sorting package
13 years ago
orbiter 51cf697acd refactoring: moved all score-related classes to new ranking package
13 years ago
sixcooler 169236c6d9 almost revert changes in this class of 7880 and 7882
13 years ago
sixcooler 4fec99115b Implementation of strategies for controlling memory resources.
13 years ago
orbiter c64faf41e2 addon to svn 7880
13 years ago
sixcooler 06408a9428 since many POST-requests come as gzip they report a contentlength of -1
13 years ago
orbiter 594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
13 years ago
sixcooler eb14111200 encapsulate potential expensive objects in TextSnippet to allow GC them asap
13 years ago
orbiter e3fc1efbef performance hack and ensuring termination in serverAccessTracker. cause:
13 years ago
orbiter 44d74f8f89 performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
13 years ago
sixcooler 5cd07d7f84 early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
13 years ago
sixcooler a311596881 finishing up my commits (7855-7858) which could be helpful for
13 years ago
sixcooler c0caca57e3 stopping thread for fetching searchresults if running short on memory
13 years ago
sixcooler ce248cc8dd less byte-arrays of response-content, less byte-array <-> stream conversion
13 years ago
sixcooler 59b767eebd stop loading via http at defined maximum of bytes - even if size is unknown before loading
13 years ago
f1ori 3a5fa73008 * revert parts of previous commit, because it breaks the trickle-feature
14 years ago
f1ori 6e79675ff3 * use gzip-encoding in more cases
14 years ago
sixcooler aff875baef smaller ping-entry @ ProfilingGraph
14 years ago
orbiter 1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
14 years ago
orbiter be15874be1 added request line in http which can support better debugging
14 years ago
orbiter 11dc653de3 added a visualization of peer pings to the performance graphic
14 years ago
orbiter 3a191cdf14 because newbies are scared by the memory consumption in the performance graph and argue about high memory consumption due to poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page where the memory graph is just like it was.
14 years ago
cominch 09bb7a390c do not replace malformed or invalid URLs in urlproxy
14 years ago
orbiter 768c59740c - replaced solrj 3.1 with solrj 3.3
14 years ago
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file can not be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
orbiter 719777b2a7 replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6
14 years ago
orbiter 2d4bb139d3 - added counting of links with noindex tag for solr index
14 years ago
orbiter 892caccdca added default configuration in ConfigurationSet in case of new values
14 years ago
orbiter bda3eec0ff added parsing of canonical link element to html parser
14 years ago
orbiter b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
14 years ago
f1ori a17351dcfe * navigation bar for filetype constraints
14 years ago
f1ori 96957375cc * fix url proxy for relative links and chromium
14 years ago
orbiter 9ebc75db4b fix for channel authorization
14 years ago
orbiter 6d9e5865ee faster appearance of search result page (but complete search time is the same)
14 years ago
orbiter f7ca84cfc0 enhanced template engine
14 years ago
orbiter 84c9658644 added a file type navigator
14 years ago
orbiter 31283ecd07 - added a search option to filter for specific network protocols, e.g. to get only results from ftp servers. Just add '/ftp' to your search.
14 years ago
orbiter 4b425ffdd2 fix for http://bugs.yacy.net/view.php?id=41
14 years ago
orbiter 7db208c992 performance hacks: more pre-allocated StringBuilder
14 years ago
orbiter 87bd559c42 fixed warning
14 years ago
orbiter f30d36b101 enhanced template engine
14 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
sixcooler 7bfa6bb4b6 prevent getting a yacySeed from zero-length-hash-string by chance
14 years ago
orbiter bce280a308 update on options for interface graphics
14 years ago
orbiter 2683162ec5 - added more options to access grid picture, web structure picture and network graphics
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
f1ori 900dacbf97 * improve link rewriting in proxy-url
14 years ago
f1ori dc855d881b * further improve proxyurl
14 years ago
orbiter a7a6b392f5 code cleanup
14 years ago
orbiter fe0c08455b more concurrency (enhancement) hacks
14 years ago
orbiter 0e9a99cb05 another resource hack
14 years ago
orbiter 535b6b953c more hacks to omit superfluous string object allocation
14 years ago
orbiter 87082f407e less String object creation during search
14 years ago
orbiter ab5a16b957 less memory occupation during ranking and faster host navigator
14 years ago
orbiter 1489ebeedf one more hack to free ram for search events
14 years ago
f1ori ddcc333acc * fix negative result counts
14 years ago
orbiter fa734bdf9f better memory protection in search logger
14 years ago
orbiter dbea40d536 - changed snippet fetch strategy logic: do not check if entry is in cache. This should reduce IO load on the HTCACHE which is a showstopper during large number of search requests
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 746e3c3b06 Replaced a widely-used Property Object in the httpd with HashMap<String, Object> which is not synchronized like Properties
14 years ago
f1ori 14e1666b21 * fix replacing regexes in url proxy
14 years ago
orbiter e28bd0d038 fix for some possible causes of memory leaks
14 years ago
orbiter 09ba6814c0 - non-blocking word hash computation with dynamic digest object generation (this was important!)
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter bd55dcee50 - commented out experimental distributed ranking loading
14 years ago
orbiter d1dbbd956a always use a template method cache even if the template cache flag is set to false. This flag is only used to make dynamic updates to the template files, not to make dynamic updates to the rewrite methods (which is not possible without recompiling). low memory usage is guaranteed by the usage of soft references which are dropped before an OOM is thrown
14 years ago
orbiter 0d040ff6bb fix for bug 0000036: no crawling of https pages
14 years ago
orbiter 3ed4a09368 small features, some bug fixes and performance hacks
14 years ago
orbiter e55c254f7b enhanced logging
14 years ago
orbiter b45701d20f this is a re-implementation of the YaCy Block Rank feature
14 years ago
orbiter 021840e5ba removed (almost) deadlocks and unnecessary CPU load
14 years ago
orbiter 123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
14 years ago
orbiter 1d8b0f74f4 one more fix for SVN 7713
14 years ago
orbiter 0960261769 fix for svn 7713
14 years ago
orbiter 5b579e21a3 code cleanup
14 years ago
orbiter 039126cfaf better handling of on/off switched solr indexing
14 years ago