which should help to detect transfer errors on yacy to yacy
communication
- not finished yet
*) removing unneeded functions (e.g. respondHeader) because of newly
introduced functions in class httpd.java
*) httpdFileHandler.java now always sends back a proxy error message
as body of a response with an error code
*) adding support of gzip content encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@244 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) adding hashset for file-extensions that should not be transferred
using gzip content encoding
*) doing bugfixes on old keep-alive implementation
*) doing some additional http header validation according to rfc
*) doing all persistent connection detection in separate function now
*) doing server authentication in separate function now
*) doing proxy authentication in separate function now
*) simplifying GET, POST, HEAD functions because of the newly introduced
functions listed above
*) adding new function to handle empty request lines (which could
occur after post requests sent via a persistent connection;
this depends on the browser used)
*) adding new function to handle unknown request methods by sending
a correct error message back to the client
*) setting correct content-length when sending back error messages
to the client
*) adding new functions that must be used by all http-Handler classes
to send
- a proxy error message
- an http header
back to the client
*) adding new function: shallTransportZipped
moved here from httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@243 6c8d7289-2bf4-0310-a012-ef5d649a1542
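The gzip decision described in the commit above (a hashset of file extensions that are excluded from gzip content encoding, consulted by shallTransportZipped) could look roughly like the following. This is a minimal sketch, not the actual httpd.java code: the class name GzipDecision, the method signature, and the extension list are assumptions made for illustration, and the Accept-Encoding check is simplified (it ignores q-values).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names and the extension list are assumptions.
final class GzipDecision {
    // Extensions whose content is usually compressed already (illustrative).
    private static final Set<String> NO_GZIP_EXTENSIONS = new HashSet<>(
            Arrays.asList("gz", "tgz", "zip", "jpg", "jpeg", "gif", "png", "mp3"));

    private GzipDecision() {}

    /**
     * @param path           request path, possibly followed by a query string
     * @param acceptEncoding value of the Accept-Encoding request header, may be null
     * @return true if the response body may be sent gzip-encoded
     */
    static boolean shallTransportZipped(String path, String acceptEncoding) {
        // Simplified check: the client must mention gzip at all.
        if (acceptEncoding == null
                || !acceptEncoding.toLowerCase(Locale.ROOT).contains("gzip")) {
            return false;
        }
        // Ignore the query string when looking for the extension.
        int query = path.indexOf('?');
        String file = (query >= 0) ? path.substring(0, query) : path;
        int dot = file.lastIndexOf('.');
        int slash = file.lastIndexOf('/');
        if (dot < 0 || dot < slash) {
            return true; // no extension in the last path segment
        }
        String ext = file.substring(dot + 1).toLowerCase(Locale.ROOT);
        return !NO_GZIP_EXTENSIONS.contains(ext);
    }
}
```

Keeping the excluded extensions in a set makes the lookup a single hash probe per request, which fits a function that is called from several handlers.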
- httpc
- response
*) simplifying gzip encoding
*) remembering http version of contacted server
(needed for later support of keep alive by httpc)
*) moving function shallTransportZipped to httpd.java
because this function is used multiple times
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@242 6c8d7289-2bf4-0310-a012-ef5d649a1542
- ConsoleOutErrHandler.java used to log warnings/errors to stderr
and all other messages to stdout
- GuiHandler.java
used to keep logging messages in memory that can then be viewed
via the http gui
- serverSimpleLogFormatter.java
needed to format logging messages for FileHandler, ConsoleOutErrHandler
and GuiHandler
- serverMiniLogFormatter.java
needed for proxy access logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@233 6c8d7289-2bf4-0310-a012-ef5d649a1542
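The behaviour described for ConsoleOutErrHandler.java (warnings and errors to stderr, everything else to stdout) can be sketched on top of java.util.logging as follows. The class name SplitConsoleHandler and the use of SimpleFormatter are assumptions for illustration; the real handler uses the project's own formatter classes.

```java
import java.io.PrintStream;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

// Hypothetical sketch of a handler that splits output by log level.
final class SplitConsoleHandler extends Handler {
    private final PrintStream out;
    private final PrintStream err;

    // The streams are injected so that the handler can be tested;
    // a real setup would pass System.out and System.err.
    SplitConsoleHandler(PrintStream out, PrintStream err) {
        this.out = out;
        this.err = err;
        setFormatter(new SimpleFormatter());
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // WARNING and SEVERE go to the error stream, the rest to the output stream.
        boolean isProblem = record.getLevel().intValue() >= Level.WARNING.intValue();
        PrintStream target = isProblem ? err : out;
        target.print(getFormatter().format(record));
    }

    @Override
    public void flush() {
        out.flush();
        err.flush();
    }

    @Override
    public void close() {
        flush();
    }
}
```

A logger would use it via `logger.addHandler(new SplitConsoleHandler(System.out, System.err))`.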
*) adding an XSLT stylesheet so that the rss document can be viewed in a normal web browser
*) adding pubDate tag to each search item
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@173 6c8d7289-2bf4-0310-a012-ef5d649a1542
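For the pubDate tag mentioned above, RSS 2.0 expects an RFC 822 style date. A minimal sketch of producing such a value is shown below; the class and method names are assumptions and not taken from the YaCy sources.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for formatting an RSS pubDate value.
final class RssDate {
    private RssDate() {}

    static String pubDate(Date date) {
        // A new formatter per call, because SimpleDateFormat is not thread-safe.
        SimpleDateFormat fmt =
                new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }
}
```

Fixing the locale to Locale.US keeps the day and month names in English regardless of the platform default.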
This is configurable at build time by changing the extensionTarget property in the build.properties file
*) Trying to solve "yacy.java template replacement / ant build failed" bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@169 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) automatically adding SVN Revision number to tar file name
*) introducing build.properties file that can be used to set the build version number and date
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@164 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) Making Seed-Upload configuration more verbose.
*) Some Changes in SOAP Search API (not finished yet).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@158 6c8d7289-2bf4-0310-a012-ef5d649a1542
optional content parsers, thread pool configuration ...
Please help me test whether everything works correctly.
*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be easily introduced because of a new modular uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be
included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
available, the uploader is deactivated automatically.
*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define their libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
this is needed by the bzip/gzip Parser.java
- adding parser for rtf files
- removing extra configuration file yacy.parser
the list of enabled parsers is now stored in the main config file
*) Adding configuration option in the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool
*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631
*) Replacing some hardcoded strings with the proper constants of the httpHeader class
*) Adding new libs to libx directory. These libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
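The automatic deactivation of uploaders and parsers whose libx dependencies are missing could be implemented along these lines. This is only a sketch under the assumption that each module declares its dependencies as fully qualified class names; how the real modules declare them is not stated in the commit message, and the class name DependencyCheck is hypothetical.

```java
// Hypothetical sketch: a module is enabled only if all of its declared
// classes can be found on the classpath.
final class DependencyCheck {
    private DependencyCheck() {}

    static boolean allClassesAvailable(String... classNames) {
        ClassLoader loader = DependencyCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                return false; // missing or unusable library: deactivate the module
            }
        }
        return true;
    }
}
```

A caller would skip registration of a module when `allClassesAvailable(...)` returns false, so that a missing jar does not surface later as a NoClassDefFoundError.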
so that a thread dump is more verbose
*) Moving code for transparent proxy support to a separate function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@98 6c8d7289-2bf4-0310-a012-ef5d649a1542
- now the ant build file has the same functionality as the makerelease build file
- from now on the ant build files can be used instead of the makerelease build script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@84 6c8d7289-2bf4-0310-a012-ef5d649a1542
- each additional parser must be in a subpackage
of plasma.parser
- each parser must have its own ant build file (which will
be called automatically from the main build file)
- Calling the main build file results in building a separate
zip file for each optional parser. This zip file includes:
+ sources of the Parser.java
+ compiled classes of the Parser.java
+ needed additional libs (libx)
- To install an additional parser the user simply needs to
extract the zip file listed above into his/her yacy directory.
- The configuration (enabling/disabling) of a parser can be done
via the webinterface (currently the settings dialog) and is
done "on-the-fly". The installation cannot be done "on-the-fly"
at the moment because of classpath issues.
- The classpath of the linux startup/stop scripts is generated
automatically now (including all libraries from lib and libx).
*) Bugfix: File Extension was not calculated correctly by the crawler
e.g.: file extension was accidentally: .php?param=value
Corrected.
*) Adding additional parser for parsing of rss/atom feeds
- added needed libs to do this.
TODO:
- automatic building classpath for windows startup scripts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@78 6c8d7289-2bf4-0310-a012-ef5d649a1542
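The file extension bug fixed in this commit (an extension such as .php?param=value) comes from not cutting off the query string before looking for the last dot. A minimal sketch of a corrected calculation is shown below; the class and method names are assumptions, and the crawler's actual code may differ.

```java
import java.util.Locale;

// Hypothetical sketch of an extension lookup that ignores query and fragment.
final class FileExtension {
    private FileExtension() {}

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String of(String urlPath) {
        // Cut at the first '?' or '#', whichever comes first.
        int cut = urlPath.length();
        int query = urlPath.indexOf('?');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        int fragment = urlPath.indexOf('#');
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        String path = urlPath.substring(0, cut);

        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) {
            return ""; // no dot in the last path segment
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```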
This synchronized keyword is not needed anymore because of the crawler jobqueue which
is responsible for the synchronization now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@60 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) adding content parser for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding an interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
*) Semaphore class was moved and renamed to serverSemaphore
*) Changing yacy shutdown behaviour
Busy waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542
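The replacement of the busy waiting loop by a blocking call can be illustrated with a semaphore that starts with zero permits. The sketch below uses java.util.concurrent.Semaphore as a stand-in for the project's own serverSemaphore class; the class name ShutdownLatch and the method name terminate are assumptions, only waitForShutdown is named in the commit message.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: block until shutdown is requested instead of polling.
final class ShutdownLatch {
    // Zero permits: acquire() blocks until terminate() releases one.
    private final Semaphore shutdownSignal = new Semaphore(0);

    /** Blocks the calling thread until terminate() has been called. */
    void waitForShutdown() throws InterruptedException {
        shutdownSignal.acquire();
        // Hand the permit back so that further waiters are released as well.
        shutdownSignal.release();
    }

    /** Signals shutdown and wakes up all current and future waiters. */
    void terminate() {
        shutdownSignal.release();
    }
}
```

The main thread would call `waitForShutdown()` after startup and consume no CPU time until some other thread calls `terminate()`.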
- introduction of a threadpool for crawling
- introduction of a job queue to avoid busy waiting for a free crawler slot
*) New classes added
- queue for receiving of crawler jobs
- semaphore class to do reader/writer synchronization (mutual exclusion)
- message object to hold all needed data about a crawler job
*) Trying to solve session-thread shutdown problem
- session thread stopped variable is now set from outside before interrupting the
session thread.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@39 6c8d7289-2bf4-0310-a012-ef5d649a1542
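The job queue and message object described in this commit can be sketched with a blocking queue: workers block in take() until a job arrives, so no thread has to poll for a free crawler slot. The commit predates widespread use of java.util.concurrent and introduces its own queue and semaphore classes; the sketch below uses LinkedBlockingQueue as a stand-in, and all class and field names are assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message object holding the data of one crawler job.
final class CrawlJob {
    final String url;
    final int depth;

    CrawlJob(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

// Hypothetical queue between the job producer and the crawler threads.
final class CrawlJobQueue {
    private final BlockingQueue<CrawlJob> jobs = new LinkedBlockingQueue<>();

    /** Adds a job; never blocks because the queue is unbounded. */
    void enqueue(CrawlJob job) {
        jobs.add(job);
    }

    /** Blocks until a job is available, then returns it (FIFO order). */
    CrawlJob take() throws InterruptedException {
        return jobs.take();
    }

    int size() {
        return jobs.size();
    }
}
```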
can be used instead of a ByteArrayOutputStream
*) Using a serverByteBuffer for lineBuffering in class httpc
instead of a ByteArrayOutputStream
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@35 6c8d7289-2bf4-0310-a012-ef5d649a1542
- httpc: wrong error-message on 404
- httpc: error message was accidentally shown when object
was released from pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@31 6c8d7289-2bf4-0310-a012-ef5d649a1542
- many classes set to final
- implementation of a session-thread pool
- reuse of the server handler class (normally the httpd object)
within the session thread
- implementation of a httpc object pool
- introduction of a linebuffer in httpd which can be reused
- reusing the properties table in the httpc
- added two apache libs (commons-collections, commons-pool) which
are needed for the object/thread pool implementation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@26 6c8d7289-2bf4-0310-a012-ef5d649a1542
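The idea behind the httpc object pool and the session-thread pool is to hand out idle instances instead of creating new ones for every request. The commit uses commons-pool for this; the following sketch deliberately does not use that library and only shows the principle with a plain deque. The class name SimplePool and its methods are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal object pool; not the commons-pool API.
final class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Returns an idle instance if one exists, otherwise creates a new one. */
    synchronized T borrow() {
        T obj = idle.pollFirst();
        return (obj != null) ? obj : factory.get();
    }

    /** Keeps the instance for reuse unless the idle limit is reached. */
    synchronized void giveBack(T obj) {
        if (idle.size() < maxIdle) {
            idle.addFirst(obj);
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

A real pool would also reset the state of a returned object before reuse, which is what reusing the properties table and the line buffer amounts to in the list above.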