50 Commits (acc19e190dd0869dd9773102241fd0181749cb3a)

Author SHA1 Message Date
Michael Peter Christen 7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
13 years ago
Michael Peter Christen e7e381d110 added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
orbiter aa322bc6d0 fix
13 years ago
orbiter f183d3822c added a default accept header in http requests since some http fraud detection functions check that this header field exists
13 years ago
orbiter d2ea250d99 refactoring:
14 years ago
sixcooler 59b767eebd stop loading via http at a defined maximum of bytes - even if the size is unknown before loading
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter d8e934c085 better abstraction of http client identification
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
f1ori 741a87a3e9 * make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers)
14 years ago
f1ori dca9e16f51 * don't index pages, which redirect, twice
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
15 years ago
orbiter d2fd93135c - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed
15 years ago
orbiter 5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
15 years ago
sixcooler 17eebd4ef8 counting crawler traffic again:
15 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
15 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
15 years ago
orbiter 844f158686 - removed dependencies in header framework:
15 years ago
orbiter 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
15 years ago
sixcooler a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
15 years ago
sixcooler 15e8c13526 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter 87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 3300930fc5 - (almost) fixed FTP crawler
15 years ago
orbiter 2d8f3ee301 some performance hacks
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
16 years ago
orbiter 3528b970d6 - refactoring
16 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
16 years ago
orbiter e7f18ba24b refactoring
16 years ago
orbiter ce8dc575ca refactoring
16 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
16 years ago
orbiter 735e2737e3 * added index segments
16 years ago
orbiter 3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
16 years ago
low012 f65bfaa9af *) Removed base tag from error page. I added this a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more harm than good in general: if HTTPS is used for communication with YaCy, entering a wrong password led to an error page with a form which would send username and password unencrypted, with the user possibly being unaware of this.
16 years ago
orbiter 3e9dcfc204 fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
16 years ago
orbiter 44579fa06d - fixed a problem loading images through yacy's document loader,
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 4da9042e8a code simplification
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter b332dfad67 - inserted request object into response object, which now carries it instead of generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago