Commit Graph

616 Commits (5e2d38ef190a69c49862a3669e0693f422105ebe)

Author SHA1 Message Date
orbiter 903c824c2c - allow only scanned resources with granted status
14 years ago
low012 936e976c23 *) added FreeMind (http://freemind.sourceforge.net/) mindmap parser
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
low012 2a6499364d *) minor changes
14 years ago
low012 c0274bd123 *) minor changes
14 years ago
orbiter fe46536f6e enhanced network scanner (less name resolving during scanning and no name resolving during search)
14 years ago
orbiter e753027c43 fix for http://forum.yacy-websuche.de/viewtopic.php?p=21439#p21439
14 years ago
orbiter bf4ef1513e - fix for map view
14 years ago
orbiter 6b70393d1d - new java version 1.6
14 years ago
orbiter e88c428008 fix to ftp loader
14 years ago
orbiter 59b70a5a92 another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files
14 years ago
orbiter 9b25a33fd9 - fixed numerous bugs
14 years ago
orbiter 7bdb13bf7f more fixes to smb crawling: better file names
14 years ago
orbiter 94c48500cc several fixes
14 years ago
orbiter 0ac7311a62 fix for token parser
14 years ago
orbiter 58b59f9bc8 - a collection of bug fixes and some redesign of the Scanner class
14 years ago
orbiter c288fcf634 redesigned CrawlStartScanner user interface and added more features:
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter 99a7fe87f9 - removed old intranet scanner (the generic scanner now completely subsumes the old one)
14 years ago
orbiter acab6801d9 added new network scanner
14 years ago
orbiter 14e4fae8e9 fixes to ftp client
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
orbiter db99db4be9 some redesign of the search-fail-response mechanism:
14 years ago
f1ori 4915d1781a * use local backup-file, if remote network-definition is not available
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter d78e322e84 added a directory-structure reader to ftp client
14 years ago
orbiter f0651e5f2f added image search to yacyinteractive.html
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index
14 years ago
orbiter 21e84539e8 one more fix to Domains
14 years ago
orbiter e192d61972 fix for latest commit
14 years ago
orbiter 22453b13ad implemented local host address discovery as posted in
14 years ago
orbiter cc6499bf8d - added http://blekko.com as search heuristic (like scroogle). This was easy since they deliver their search results also as rss feed
14 years ago
orbiter a9f754c45f removed unused CR accumulation and distribution process
14 years ago
orbiter 3d945bb442 fix for ftp client: suppress bad directory listing time-out
14 years ago
orbiter d4a1a1850b removed warnings
14 years ago
low012 9b3fae9496 *) cleaning up the code a little bit
14 years ago
orbiter 321eb012fe removed two warnings and reverted one change
14 years ago
f1ori fd74bc388c * fix small bug in sessionid-removal
14 years ago
low012 eb79b952ef *) cleaner code
14 years ago
low012 38fdf43587 *) renamed classes according to standard Java coding conventions
14 years ago
low012 025e3f4790 *) renamed classes according to standard Java coding conventions
14 years ago
f1ori a025b1da89 * fix bug when browsing local filesystem (e. g. repository) with yacy
14 years ago
sixcooler b87bf88ac8 using less memory on merging and rewriting blobs
14 years ago
f1ori d62e449a11 * fix FilterEngine, forgot comparison operator
14 years ago
orbiter 441fbc26e2 security patch for WeakPriorityBlockingQueue (produced a deadlock)
14 years ago
orbiter 5dcb838293 - removed thread overhead when calling dns services
14 years ago
orbiter 4c50d3428e smaller file size for array stacks to support smaller deletion sizes
14 years ago
orbiter becc463d8a enhanced did-you-mean
14 years ago
orbiter 93c535d111 fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
14 years ago
orbiter 04932dc268 added rdf data structure for rss feeds
14 years ago
orbiter 84f2953cd8 fix for rss loader / rss type recognition
14 years ago
orbiter 4c72885cba added a sitemap entry parser and loader for sitemaps
14 years ago
orbiter 445619f3ec added a submenu ConfigHTCache_p.html to set the size of the HTCache separately from the proxy configuration.
14 years ago
sixcooler 85c65475fa small but important correction of last commit @ HTTPClient
14 years ago
f1ori acd93b1b31 * add failsafe mechanism to domainlist retrieval
14 years ago
orbiter 70c95608d4 Added CORS Access header for yacysearch.rss output
14 years ago
f1ori def4253555 * add option to network definition to provide a domainlist (syntax like in blacklists)
14 years ago
orbiter fb92f9ae8e added mime type image/jpeg (image/jpg is wrong but it is left here because it does no harm and this error also exists in the configuration of web servers)
14 years ago
orbiter 155d556568 - better memory protection
14 years ago
f1ori 7d8de34778 * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter e3e3b49d52 - enhanced main release recognition
14 years ago
orbiter 58e74282af added a word counter statistic in condenser which is used by the did-you-mean to calculate best matches for given search words.
14 years ago
orbiter 863065abc4 added user agent logging to access tracker
14 years ago
orbiter ed4371dcf3 enhanced navigation implementation and enhanced tag cloud computation
15 years ago
orbiter ca738ac924 - added a tag cloud to search results (using the topics)
15 years ago
orbiter e4d561971e added more score cluster options and made score cluster usage more transparent
15 years ago
orbiter 7cd9d9d22a - enhanced DidYouMean computation using a faster count on index entries; this allows results to be ranked better
15 years ago
orbiter de722090b5 enhancements in did-you-mean guessing
15 years ago
orbiter 24f1cba7b2 performance hacks:
15 years ago
orbiter d607b30b6a performance enhancements for search and code review for database functions
15 years ago
orbiter fcd40cd30f - disabled domZones (buggy, must think about better solution)
15 years ago
orbiter ec38eca278 fix for new URI equal method
15 years ago
orbiter 0d363a94d7 more performance hacks
15 years ago
orbiter b8aee6d402 performance hacks for better search performance
15 years ago
orbiter 091dd3f6ec - enhanced intranet search speed
15 years ago
orbiter aacf572a26 - enhancements for search speed
15 years ago
sixcooler 61c82f3105 gzip compression @ transferRWI & transferURL back again
15 years ago
orbiter 2c549ae341 fixed a number of small bugs:
15 years ago
orbiter 3057a0b939 - intranet scanner now produces urls with host names, not ips if possible
15 years ago
orbiter e63896f2a8 added an intranet scanner and a servlet which shows all intranet addresses and an option to start a site-crawl for all these addresses at once.
15 years ago
orbiter e54cb7fb0c more bugfixes (also for latest commit)
15 years ago
orbiter be6b48311c misc bugfixes
15 years ago
orbiter d2fd93135c - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed
15 years ago
orbiter 48c0d508ac fixes for crawling of smb links (file length not always available)
15 years ago
f1ori e670e1ef8e add charset auto-detection for htmlParser
15 years ago
f1ori ddcd5ae78c fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2989
15 years ago
f1ori 8fe1102452 fix http://forum.yacy-websuche.de/viewtopic.php?p=20889#p18426
15 years ago
orbiter 10a9cb1971 simplified snippet computation process and separated the algorithm into two classes
15 years ago
orbiter 84a023cbc8 fixed several search bugs
15 years ago
orbiter 09c208a3ab patch for corrupted database files (just work on and forget key)
15 years ago
orbiter 97ee278931 enhanced search speed:
15 years ago
f1ori b392ca5024 * add option to show YaCy version, usage:
15 years ago
orbiter ac73072924 added a demonstration class: integrate the YaCy search results in own applications
15 years ago
orbiter 8da4eb5de6 addition to patch in SVN 7111
15 years ago
orbiter 37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
15 years ago
orbiter 461a2a6ec7 enhanced remote crawling:
15 years ago
orbiter 0cf006865e refactoring and enhanced concurrency
15 years ago
orbiter 83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before)
15 years ago
orbiter 5702419194 fixed a bug in HTTPClient: keep-alive must be set to false, otherwise servers hold connections open for 2 seconds until response.
15 years ago
orbiter 5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
15 years ago
orbiter ac1c08924e more performance hacks
15 years ago
orbiter 14c843d364 more performance hacks
15 years ago
orbiter 39f409a7bb performance hacks
15 years ago
orbiter 3c0e07ba72 removed all delays in shutdown process
15 years ago
orbiter 906c572621 - enhanced index create menu structure
15 years ago
orbiter 64860dc1bb enhanced search event logging (to be used for further improvements)
15 years ago
orbiter 7dbc357593 patch to identify corrupted database files
15 years ago
sixcooler 17eebd4ef8 counting crawler traffic again:
15 years ago
lotus d2a3d08c44 avoid div. by zero
15 years ago
orbiter 2c7edea35e - better shutdown behavior for the GUI (waits until data is written if GUI is killed)
15 years ago
orbiter 34a25856a5 - added navigation to next/prev search page using arrow keys (left/right)
15 years ago
orbiter 32f73d1aaa added copy for Info.plist for Mac application release updates (this file contains class paths and start parameters)
15 years ago
orbiter 4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more)
15 years ago
orbiter 570ca577c6 performance hacks
15 years ago
orbiter 348dece62f redesign of the SortStack and SortStore classes:
15 years ago
hermens 03eb021568 Fix for byte[] Objects as keys
15 years ago
orbiter 114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
15 years ago
orbiter c0b08ac59b slightly changed way of pdf parser integration
15 years ago
orbiter 6d83c7cb62 removed unnecessary Override statements (produces errors in strict validation)
15 years ago
orbiter 5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
15 years ago
orbiter 24502fe3de performance hacks
15 years ago
orbiter ffaa9a1c51 avoiding double-loading of the same resource from the web in case a second attempt to load the resource is started while the first attempt is still loading the content from the web. This will delay the second attempt until the first attempt has finished, with the possible result that the second attempt reads only from the web cache, not from the web.
15 years ago
orbiter d865ef77a8 removed re-read of index in case of a bad index. This may not solve the problem but it applies a 100% CPU problem on the peer. I'm afraid bad index files must be abandoned, and cannot be fixed this way.
15 years ago
orbiter b2c9db48ea Performance enhancement
15 years ago
orbiter ae07e11bc5 enhanced image search result display: concurrent loading of images before they are displayed
15 years ago
orbiter 22047ffad5 enhanced computation speed of many replaceAll string operations
15 years ago
orbiter e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation
15 years ago
orbiter 9c0c94683c because of a bug in search result caching, search result counts had not been generated as fast as possible.
15 years ago
orbiter b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by:
15 years ago
sixcooler ca0a03e9ea ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 3988a95fb5 added ability in rss reader to parse atom feeds
15 years ago
orbiter 9d080f387e change in handling of the all-visible home path for storage in YaCy:
15 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
15 years ago
f1ori 938676265f fix shutdown command, close HttpClient connection pool
15 years ago
orbiter 4f22e2df41 bugfixes for
15 years ago
orbiter 42414a6ae3 added two more tables in rss reader interface:
15 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds!
15 years ago
orbiter 0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
15 years ago
orbiter cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as
15 years ago
orbiter c60d0282fd more abstraction for tables stored in heaps:
15 years ago
orbiter d1be64d491 removed wrong assert
15 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
15 years ago
orbiter 844f158686 - removed dependencies in header framework:
15 years ago
orbiter 80ba543d4c svn fix for uppercase problem
15 years ago
orbiter 5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
15 years ago
orbiter caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
15 years ago
orbiter 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
15 years ago
sixcooler 661867923a ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 7aa860c505 - more logging
15 years ago
orbiter 4d5446d641 code cleanup
15 years ago
orbiter 66ac3a7d9d corrected database row iteration
15 years ago
orbiter dfd416e3fb removed a mysterious image buffer
15 years ago
orbiter e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for such kind of data.
15 years ago
orbiter 933dc1a600 removed old rss parser (will be replaced with parser from cora package)
15 years ago
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
15 years ago
orbiter 5a994c9796 added a scheduler based on API actions
15 years ago
orbiter 189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
15 years ago
orbiter 054c22e2c6 added TLDs from http://www.opennicproject.org
15 years ago
orbiter 86d7f8a989 - the web visualization can now be generated in custom color
15 years ago
orbiter 7fdb17bb96 redirect uncaught exceptions to logging + small other changes
15 years ago
orbiter a82a93f2fc - better url double check in crawler
15 years ago
sixcooler a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 171f2bd84e - removed unused network oanet
15 years ago
sixcooler 1802c54317 LGPL-Header
15 years ago
orbiter a835a22b32 fixed isLocal() property (better recognition of intranet hosts)
15 years ago
orbiter 670c746dc5 dual-licensed HttpConnectionInfo for LGPL
15 years ago
orbiter 301a59e07f moved browser access method from kelondro/util/OS to gui/framework/Browser
15 years ago
orbiter ec72387165 added a very early test version of a YaCy gui component.
15 years ago
sixcooler d88b9606d1 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2923
15 years ago
orbiter 6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
15 years ago
orbiter 5924a0d851 - enhanced concurrency in database index access for multicore
15 years ago
orbiter 55a2536bcf enhancement in drawing speed and reduction of object allocation during drawing
15 years ago
orbiter 9ab06bc333 enhancement in sorting efficiency (database root operation): less object allocation
15 years ago
sixcooler 39d96abbb5 fix yacyRelease download
15 years ago
sixcooler 349e4dee9d ... migrating to HttpComponents-Client-4.x ...
15 years ago
sixcooler c29f24a519 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
15 years ago
orbiter 989948e1a9 fixed generic image parser
15 years ago
orbiter e1015ead2c static access to constants
15 years ago
orbiter 27d8a8b53e removed wrong com.sun.codec class access in generic image parser
15 years ago
orbiter bbf887d879 added generics to UPnP classes
15 years ago
sixcooler 15e8c13526 ... migrating to HttpComponents-Client-4.x ...
15 years ago
mikeworks b12db14b9f Added Generics to new net.yacy.upnp.* classes to eliminate compiler warnings
15 years ago
sixcooler b7102eff92 ... migrating to HttpComponents-Client-4.x ...
15 years ago
mikeworks 572e429eff - fixes UPnP not working discussion on forum: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2881
15 years ago
mikeworks 2a20282505 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6987 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus 965aa97993 including sbbi upnplib as source again
15 years ago
orbiter 60caade056 removed debug output
15 years ago
sixcooler 52718e6dcb ... migrating to HttpComponents-Client-4.x ...
15 years ago
sixcooler 5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter dec1419bc3 ;-)
15 years ago
orbiter 22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
15 years ago
orbiter 8674a65488 removed override directive which caused a compile error in eclipse helios
15 years ago
low012 dc5f0e357c *) fixed SVN properties
15 years ago
low012 01d6b952f0 *) minor changes for easier to read code, no functional changes
15 years ago
sixcooler 0e56d29335 ... migrating to HttpComponents-Client-4.x ...
15 years ago
sixcooler 2ad5829b26 correct Timeout parameter at HttpComponents-Client-4.x
15 years ago
sixcooler e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
15 years ago