- don't send Body on HEAD requests
- don't send a Last-modified: date, that is later then Date:
- Use Cache-control instead of Pragma with HTTP/1.1
- don't send header with HTTP/0.9
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1198 6c8d7289-2bf4-0310-a012-ef5d649a1542
- dbImporter threads are now shutdown by the switchboard on server shutdown
- adding possibility to pause a importer thread via GUI
- Bugfix for abort function
See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363
*) Modification of content parser configuration
- now it's possible to configure which parsers should be enabled for the proxy,
crawler, icap, etc. separately
-
*) htmlFilterContentScraper.java
- adding regular expression to normalize URLs containing /../ and /./ parts
*) httpc.java
- adding functionality to unzip gzipped content
- requested by roland: should be used later to allow gzipped seed lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542
- restructuring of mimeTypes based on the parsers
- displaying parser usage count
- displaying human readably parser names
- displaying parser version information
*) httpdFileHandler.java
- adding possibility to support "streaming" servlets
which are special servlets that can communicate with
the client via the connection streams autonomous
- the name of these new servlet types must end with the
file extension .stream
- this feature will be needed by the yacy ScreenSaver
class to fetch statistic data from the peer without the
need to reconnect to the server all the time
*) Adding human readable names and version information for
all supported parsers
*) plasmaParser.java
- adding new structure to store parser statistic data
*) Adding openDocument parser
- can be used to parse odt files
*) jmimemagic
- adding rules to detect openDocument formats properly
*) serverLog.java
- adding functions that can be used to query if a given
logging level is enabled or not.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Bugfix: old dns cache did not handle case insensitive hostnames correctly.
- adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
e.g. borg-300.dyndns.org
This can be done by setting the new httpc.nameCacheNoCachingPatterns property
- using httpc.dnsResolve wherever possible within the sourcecode
[httpd.java,plasmaCrawlStacker.java]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
- solving problems with unkown certificates by implementing a dummy trust Manager
- adding https support to robots-parser
- Seed File can now be downloaded from https resources
- adapting plasmaHTCache.java to support https URLs properly
*) URL Normalization
- sub URLs are now normalized properly during indexing
- pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
- normalizing URLs which were received by a crawlOrder request
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804
*) Adding first version of db test for mysql
NOTES:
- db user + db + db table must be created before starting the test
- db table must be empty. Entries can not be updated at the moment
- db connection properties must be changed in the sourcecode at the moment
TODOs:
- accepting connection properties via command line
- implementing update + remove + read operations
- 'maybe' adding code to create db + table if it doesn't exists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542
to disallow yacy to index the response that belongs to the request where
X-YACY-Index-Contro is set to "no-index"
*) Bugfix for Seed-List download via Remote Proxy.
Now the pragma and cache-control http headers of the request are properly set to "no-cache"
See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639
*) Bugfix for http-Proxy
yacy has ignored "no-cache"- pragma and cache-control http headers that were send in requests.
Now, these request headers are evaluated properly
TODO: Missing evaluation of "no-store" request headers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542
- remote proxy configuration can now be "really" changed on the fly and takes effect immediately
- adding possibility to disable remote proxy usage for yacy->yacy communication
- adding possibility to disable remote proxy usage for ssl
- restructuring proxy configuration so that it is stored in a single place now
*) Adding possibility to import a foreign word DB (or even more of them in parallel)
at runtime into the peers DB
- this can be done by calling IndexImport_p.html
- ATTENTION: please not that at the moment this thread must be aborted via gui
before a normal server shutdown is done.
- TODO: integrating IndexImport Thread into normal server shutdown
- TODO: Adding posibility to import crawl-queues, etc. from foreign peers
- TODO: removing old import function from yacy.java and calling the new routines instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Requestheader was not passed to the underlying post function properly
- Bug seems not to have caused any side-effect until yet
*) Bugfix for manual peer ping functionality
*) Bugfix for UnresolvedPattern Problem if an Exception occurred in a servlet.
See: http://www.yacy-forum.de/viewtopic.php?t=1353
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@963 6c8d7289-2bf4-0310-a012-ef5d649a1542
Adminrights are OR(old auth or new).
Proxyrights are AND(you need Proxyrights and a not reached Timelimit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@960 6c8d7289-2bf4-0310-a012-ef5d649a1542
- adding automatic refresh
- accepts new parameter nameLookup which can be used to deactivate
yacy-peer name lookup (because we have problems with this on large seed-dbs)
*) ViewFile
New page that can be used to view
- original content
- plain text content
- parsed content
- parsed sentences
of a webpage specified by there url hash
Mainly for debugging purpose at the moment
*) Robots.txt
Bugfix for if-modified-since usage
TODO: synchronization of downloads to avoid loading the same robots-file
multiple times in parallel by different threads
*) Shutdown
Better abortion of transferRWI and transferURL sessions on server shutdown
*) Status Page
Adding icon to start/stop crawling via status page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542