Commit Graph

539 Commits (40c085648926a58cff9ab6c61d30c7dc44fa0d5e)

Author SHA1 Message Date
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
orbiter 1214615185 fix for 'invisible entry', see http://forum.yacy-websuche.de/viewtopic.php?p=22133#p22133
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 1110d16af9 performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
14 years ago
low012 ce012e11aa *) deleted LogStatistics since the page did not work anymore and it seemed to be obsolete, tell me if you miss it and I will add it again
14 years ago
low012 c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
lotus b1484299b2 same units for memory observer configuration (MiB)
14 years ago
orbiter 0769f4caa6 added search suggestions for interactive search: is only shown if there are no search results
14 years ago
f1ori e4aabaa1c3 * fix negative filelength for files >2G
14 years ago
f1ori ee3cef91e8 * fix filesize in ftp crawls
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter e88c428008 fix to ftp loader
14 years ago
orbiter 9b25a33fd9 - fixed numerous bugs
14 years ago
orbiter 7bdb13bf7f more fixes to smb crawling: better file names
14 years ago
orbiter 94c48500cc several fixes
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter fffb91447a fixed crawl queue delete function
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contributed with their document url for the search index only
14 years ago
f1ori 741a87a3e9 * make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers)
14 years ago
f1ori dca9e16f51 * don't index pages, which redirect, twice
14 years ago
orbiter 09badc697b - low-memory patch for crawler
14 years ago
orbiter 93c535d111 fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
14 years ago
orbiter 4c72885cba added a sitemap entry parser and loader for sitemaps
14 years ago
f1ori def4253555 * add option to network definition to provide a domainlist (syntax like in blacklists)
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter e3e3b49d52 - enhanced main release recognition
14 years ago
orbiter ca738ac924 - added a tag cloud to search results (using the topics)
15 years ago
orbiter e4d561971e added more score cluster options and made score cluster usage more transparent
15 years ago
orbiter e8f90201a5 fix for scheduling of rss feeds
15 years ago
orbiter 6a166c2040 patches for bad proxy behaviour
15 years ago
orbiter 45b1ab3d07 custom + generic skins:
15 years ago
orbiter 0d363a94d7 more performance hacks
15 years ago
orbiter 091dd3f6ec - enhanced intranet search speed
15 years ago
orbiter aacf572a26 - enhancements for search speed
15 years ago
orbiter 2c549ae341 fixed a number of small bugs:
15 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
15 years ago
orbiter d2fd93135c - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed
15 years ago
orbiter 48c0d508ac fixes for crawling of smb links (file length not always available)
15 years ago
orbiter 461a2a6ec7 enhanced remote crawling:
15 years ago
orbiter 5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
15 years ago
sixcooler 17eebd4ef8 counting crawler traffic again:
15 years ago
orbiter 348dece62f redesign of the SortStack and SortStore classes:
15 years ago
orbiter 114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
15 years ago
orbiter 5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
15 years ago
orbiter 22047ffad5 enhanced computation speed of many replaceAll string operations
15 years ago
orbiter 9d080f387e change in handling of the all-visible home path for storage in YaCy:
15 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
15 years ago
orbiter 104318d58a - added nice colors to feed indexing state messages
15 years ago
orbiter 0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
15 years ago
orbiter c60d0282fd more abstraction for tables stored in heaps:
15 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
15 years ago
orbiter 844f158686 - removed dependencies in header framework:
15 years ago
orbiter 5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
15 years ago
orbiter caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
15 years ago
orbiter 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
15 years ago
sixcooler 661867923a ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 7aa860c505 - more logging
15 years ago
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
15 years ago
orbiter 7fdb17bb96 redirect uncaught exceptions to logging + small other changes
15 years ago
orbiter 87b1684211 additional double-check in balancer
15 years ago
orbiter 0d81731e88 fixed crawler bug caused by NPE in logging
15 years ago
orbiter a82a93f2fc - better url double check in crawler
15 years ago
sixcooler a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 171f2bd84e - removed unused network oanet
15 years ago
sixcooler d88b9606d1 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2923
15 years ago
orbiter 6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
15 years ago
orbiter 5924a0d851 - enhanced concurrency in database index access for multicore
15 years ago
sixcooler 15e8c13526 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
15 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 150cf42a1b migrated all my LGPL 3 -licensed files to the LGPL 2.1 because LGPL 3 is not compatible to the GPL 2
15 years ago
orbiter 5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
15 years ago
orbiter dcd01698b4 added a 'transition feature' that shall lower the barrier to move from g**gle to yacy (yes!):
15 years ago
orbiter 777195e8d1 more abstraction for access of LoaderDispatcher and cache
15 years ago
orbiter 7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
15 years ago
orbiter 73f03e05ee fixed a bug in snippet fetch strategy: cache only does not help if resource can only be found in web
15 years ago
orbiter 87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
15 years ago
orbiter b03caaa57a better handling of OOM situations
15 years ago
orbiter 60e71876ad - more abstraction (HashMap -> Map)
15 years ago
orbiter a83772c71b fixes and enhancements for balancer:
15 years ago
orbiter 9cde05418f fixed url crawl list display
15 years ago
orbiter 30b337fa9f fixes to balancer when crawling filesystem (problem was: host == null)
15 years ago
orbiter 844853243a fixed balancer time guessing
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 6950d8a33d fixes to SMB crawler
15 years ago
orbiter 2a8f70f0ca - fix for caching of OSM tiles. if you want that this fix applies to your peer, please delete the crawl profiles
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 7ab207d93a better presentation of search result metadata and fixes to htcache loading
15 years ago
orbiter 40a8d132d9 tried to fix 100% CPU when calling Balancer.top()
15 years ago
orbiter 90c3e5d6f6 - cleanup, removed unused imports
15 years ago
orbiter 4cd5418963 removed finalize methods because of a hint in
15 years ago
orbiter bfa35d6d20 possible fix for ZURL.list counter
15 years ago
orbiter 8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
15 years ago
orbiter 7b69d79727 enhanced remove() operation: in many cases it is not necessary to return the removed object to the called.
15 years ago
orbiter 64f29f990e a collection of performance hacks and code cleanup:
15 years ago
orbiter 8b8107b2a3 reduced IO-load and synchronization/blocking
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 48b9371735 changed balancer re-load counter. causes less blocking here doing intranet indexing.
15 years ago
orbiter 55d8e686ea performance hacks
15 years ago
hermens 4ec0092677 more null == proxy fixes
15 years ago
hermens 2f90f0ad56 Remove asserts blocking proxy use cases
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
low012 b97ad0f380 *) some minor changes for better code readability
15 years ago
orbiter ba51d140e1 added more info in assert in balancer
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 90dd197ae7 - no latency for local crawls
15 years ago
orbiter c855fc48c6 only load robots.txt for http and http protocol
15 years ago
orbiter 3300930fc5 - (almost) fixed FTP crawler
15 years ago
orbiter 9623d9e6d2 added a smb loader component for the YaCy crawler
15 years ago
orbiter b88f5fbb4b slightly changed crawling policy
15 years ago
orbiter 7684a575c4 fix for deletion of error database each time when YaCy starts up
15 years ago
orbiter c09a995930 better logging of double occurrences of urls in the crawler
15 years ago
orbiter 727dd9b193 - fixed a bug in robots.txt parser
15 years ago
orbiter 46c4f8b68a better look-ahead into the crawl queue: show more on crawl monitor
15 years ago
orbiter 564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
15 years ago
lotus 945e0ba5a5 allow global search if res. observer disabled index transmission
15 years ago
lotus 11188cd7eb resource observer now uses the Java 6 method to check for free space. thus, disk observing now needs Java 6 installed.
15 years ago
orbiter 8ce936bcdd added an api recording function: it shall be possible to record
15 years ago
orbiter e80e060ca6 - increased thread priority for server threads
15 years ago
orbiter 473b11033d fixed network switch process - crawling did not work after a switch before this fix
15 years ago
orbiter bd05e57d3b fix for http://forum.yacy-websuche.de/viewtopic.php?p=18563#p18563
15 years ago
orbiter 5df628a2a4 - added BEncoder class
15 years ago
orbiter 82f57f79e5 more PMD enhancements
15 years ago
orbiter a06f7ddb33 more PMD recommendations
15 years ago
orbiter 66c0a8e849 more PMD recommendations
15 years ago
orbiter bc96d74813 - clean-up of robots.txt parser
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
lotus eac2daf2e8 * reenable DHT if yet enough memory is available
15 years ago
orbiter d77a8f3b3e added some modifications recommended by PMD for better performance
15 years ago
orbiter d1973bae2a code cleanup: removed unused code and unused methods
15 years ago
orbiter a3b8b7b5c5 some redesign of the main menu structure:
15 years ago
orbiter eeca2ded92 fix for http://forum.yacy-websuche.de/viewtopic.php?p=18500#p18500
15 years ago
orbiter dff4f95c78 some patches to get the torrent parser working
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where an size() operation becomes a large iteration.
15 years ago
orbiter 2d8f3ee301 some performance hacks
15 years ago
orbiter 4c99d4683d possible fix for lost crawl profile handles: clean-up job did wrong measurement to see if crawl is still running.
15 years ago
lotus 6edc168cfe option to disable dht by memory limit:
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter e3025ee691 - new icon for OAI-PMH loading action
15 years ago
orbiter b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
15 years ago
lotus 79251e6f60 configurable disk space hardlimit for dht
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 30f108f97d added stub of oai-pmh importer (not working yet)
15 years ago
orbiter 52470d0de4 - fix for xls parser
16 years ago
orbiter 5e8038ac4d - refactoring of blacklists
16 years ago
orbiter 26fafd85a5 - more refactoring
16 years ago
orbiter 3528b970d6 - refactoring
16 years ago
orbiter a8ce192f63 - shifted main classes to new package net.yacy
16 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
16 years ago
orbiter e7f18ba24b refactoring
16 years ago
orbiter ce8dc575ca refactoring
16 years ago
orbiter bea3b99aff moved table and util classes
16 years ago
orbiter c0e0e1f422 moved blob classes
16 years ago
orbiter 1e4f8b56ed accumulated classes from different packages into the new rwi package
16 years ago
orbiter 194da25a2f moved kelondro index
16 years ago
orbiter 4446acc8cd moved kelondro order
16 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
16 years ago
orbiter 735e2737e3 * added index segments
16 years ago
orbiter 6e0dc39a7d - some fixes to prevent blocking situations
16 years ago
orbiter 04a548a1e3 - temporary integrated the transferURL servlet as static class instead as a class that is called using reflection to investigate the OOM problems in that class
16 years ago
orbiter 6aa474f529 - better logging for web cache access and fail reasons
16 years ago
orbiter 3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
16 years ago
orbiter 2e6bdce086 - added more logging to balancer
16 years ago
hermens 62a7341c4d Fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2204
16 years ago
low012 f65bfaa9af *) Removed base tag from errror page. This has been added by myself a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more bad than good in general: If HTTPS is used for communication with YaCy, entering a wrong passwort led to an errror page with a form which would send username and password unencrypted with the user possibly being unaware of this.
16 years ago
orbiter 3e9dcfc204 fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
16 years ago
orbiter 573d03c7d7 added configuration to enable ram table copy
16 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
16 years ago
orbiter 44579fa06d - fixed a problem loading images through yacy's document loader,
16 years ago
orbiter 72e5407115 refactoring of snippet cache
16 years ago
orbiter 8e56c2ace6 fix for fixes from this afternoon
16 years ago
orbiter cf739edc2e fix for possible deadlock, see
16 years ago
orbiter 92edd24e70 fixed problem with switching of networks
16 years ago
orbiter 0575f12838 fix for deadlock
16 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 634a01a9a4 replaced wget-requests with caching requests
16 years ago
orbiter c6c97f23ad - added cache usage properties to crawl start
16 years ago
orbiter c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 4da9042e8a code simplification
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter b332dfad67 - inserted request object into response object which carries this now instead generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago