*) Making Seed-Upload configuration more verbose.
*) Some Changes in SOAP Search API (not finished yet).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@158 6c8d7289-2bf4-0310-a012-ef5d649a1542
optional content parsers, thread pool configuration ...
Please help me test whether everything works correctly.
*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be introduced easily thanks to a new modular uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be
included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
available, the uploader is deactivated automatically.
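A minimal Java sketch of the dependency check described above, assuming each uploader reports the fully qualified names of the libx classes it needs. `SeedUploader`, `SimpleUploader` and `UploaderRegistry` are hypothetical names for illustration, not YaCy's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical uploader contract (not YaCy's real interface).
interface SeedUploader {
    String name();

    // Fully qualified names of classes this uploader needs from libx.
    List<String> libxDependencies();
}

// Hypothetical simple implementation used for the example.
final class SimpleUploader implements SeedUploader {
    private final String name;
    private final List<String> deps;

    SimpleUploader(String name, String... deps) {
        this.name = name;
        this.deps = Arrays.asList(deps);
    }

    public String name() { return name; }

    public List<String> libxDependencies() { return deps; }
}

final class UploaderRegistry {
    // True only if every dependency class can be found on the classpath.
    static boolean dependenciesAvailable(SeedUploader uploader) {
        ClassLoader loader = UploaderRegistry.class.getClassLoader();
        for (String className : uploader.libxDependencies()) {
            try {
                Class.forName(className, false, loader); // look up without initializing
            } catch (ClassNotFoundException e) {
                return false;
            }
        }
        return true;
    }

    // Names of the uploaders that stay active; all others are deactivated.
    static List<String> activeUploaders(List<? extends SeedUploader> candidates) {
        List<String> active = new ArrayList<>();
        for (SeedUploader u : candidates) {
            if (dependenciesAvailable(u)) {
                active.add(u.name());
            }
        }
        return active;
    }
}
```

An uploader whose libs are missing from libx simply drops out of the list, so the settings page would only offer the remaining ones.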
*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define their libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
this is needed by the bzip/gzip parsers
- adding parser for rtf files
- removing extra configuration file yacy.parser
the list of enabled parsers is now stored in the main config file
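A rough sketch of the first step a gzip content parser might perform, using the standard `java.util.zip` classes. The class name `GzipContentSketch` is hypothetical; the hand-off of the decompressed bytes to the mime-type detection is only described in a comment, not implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipContentSketch {
    // Decompresses gzip content. In a parser like the one listed above, the
    // result would then go to mime-type detection to pick the inner parser.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Helper for the example: compresses bytes with gzip.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } // closing the stream writes the gzip trailer
        return out.toByteArray();
    }
}
```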
*) Adding configuration options to the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool
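A small sketch of how the three pool limits could be read from the main config, assuming property keys of the form `<prefix>.maxActive` (the key names and the `PoolSettings` class are hypothetical). The ordering check `0 <= minIdle <= maxIdle <= maxActive` is an assumed sanity rule, not something the pool library is known to enforce.

```java
import java.util.Properties;

// Hypothetical holder for the maxActive / maxIdle / minIdle values.
final class PoolSettings {
    final int maxActive;
    final int maxIdle;
    final int minIdle;

    PoolSettings(int maxActive, int maxIdle, int minIdle) {
        // Assumed sanity rule: 0 <= minIdle <= maxIdle <= maxActive.
        if (minIdle < 0 || minIdle > maxIdle || maxIdle > maxActive) {
            throw new IllegalArgumentException(
                    "expected 0 <= minIdle <= maxIdle <= maxActive");
        }
        this.maxActive = maxActive;
        this.maxIdle = maxIdle;
        this.minIdle = minIdle;
    }

    // Reads "<prefix>.maxActive" etc.; a missing key fails with NumberFormatException.
    static PoolSettings fromConfig(Properties config, String prefix) {
        return new PoolSettings(
                Integer.parseInt(config.getProperty(prefix + ".maxActive")),
                Integer.parseInt(config.getProperty(prefix + ".maxIdle")),
                Integer.parseInt(config.getProperty(prefix + ".minIdle")));
    }
}
```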
*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631
*) Replacing some hardcoded strings with the proper constants of the httpHeader class
*) Adding new libs to the libx directory. These libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
so that a thread dump is more verbose
*) Moving code for transparent proxy support to a separate function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@98 6c8d7289-2bf4-0310-a012-ef5d649a1542
- now the ant build file has the same functionality as the makerelease build file
- from now on the ant build files can be used instead of the makerelease build script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@84 6c8d7289-2bf4-0310-a012-ef5d649a1542
- each additional parser must be in a subpackage
of plasma.parser
- each parser must have its own ant build file (which will
be called automatically from the main build file)
- Calling the main build file results in building a separate
zip file for each optional parser. Each zip file includes:
+ sources of the parser
+ compiled classes of the parser
+ needed additional libs (libx)
- To install an additional parser the user simply needs to
extract the corresponding zip file into his/her yacy directory.
- The configuration (enabling/disabling) of a parser can be done
via the web interface (currently the settings dialog) and takes
effect "on-the-fly". The installation cannot be done "on-the-fly"
at the moment because of classpath issues.
- The classpath of the linux startup/stop scripts is generated
automatically now (including all libraries from lib and libx).
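The startup scripts build their classpath in shell; to keep all examples in one language, here is the same idea as a Java sketch: collect every `.jar` from `lib` and `libx` in a stable order and join them with the platform path separator. `ClasspathBuilder` is a hypothetical name, and the real scripts may order or filter entries differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClasspathBuilder {
    // Joins all .jar files found in the given directories (relative to baseDir)
    // with File.pathSeparator (':' on Linux). Missing directories are skipped.
    static String build(File baseDir, String... libDirs) {
        List<String> entries = new ArrayList<>();
        for (String dir : libDirs) {
            File[] jars = new File(baseDir, dir).listFiles((d, name) -> name.endsWith(".jar"));
            if (jars == null) {
                continue; // directory does not exist
            }
            Arrays.sort(jars); // deterministic order
            for (File jar : jars) {
                entries.add(dir + "/" + jar.getName());
            }
        }
        return String.join(File.pathSeparator, entries);
    }
}
```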
*) Bugfix: the crawler did not calculate file extensions correctly,
e.g. the extension was accidentally taken as .php?param=value instead of .php
Corrected.
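A sketch of a query-safe extension lookup, assuming the fix amounts to looking only at the URL path. It uses `java.net.URI.getPath()`, which excludes the query string and fragment; `UrlExtension` is a hypothetical name, not the crawler's actual code.

```java
import java.net.URI;
import java.util.Locale;

final class UrlExtension {
    // Returns the lowercase extension of the last path segment, or "" if none.
    // Query string (?param=value) and fragment (#...) are ignored by getPath().
    static String of(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return ""; // opaque URI such as mailto:
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```

Comparing the last dot with the last slash keeps a dotted directory name such as `/dir.v2/file` from producing a bogus extension.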
*) Adding an additional parser for RSS/Atom feeds
- added needed libs to do this.
TODO:
- automatic building classpath for windows startup scripts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@78 6c8d7289-2bf4-0310-a012-ef5d649a1542
This synchronized keyword is not needed anymore because of the crawler jobqueue which
is responsible for the synchronization now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@60 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) adding content parser for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding an interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
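A minimal sketch of how a parser interface and a mimeType-to-parser configuration could fit together. `ContentParser`, `ParserRegistry` and the "mimeType = parser name" property layout are assumptions for illustration; the actual interface and config format in this commit may differ.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical content parser contract: raw bytes in, plain text out.
interface ContentParser {
    String parse(byte[] content);
}

final class ParserRegistry {
    private final Map<String, ContentParser> byMimeType = new HashMap<>();

    // config maps mimeType -> parser name, e.g. "application/pdf" -> "pdf".
    // Entries whose parser is not available are skipped.
    ParserRegistry(Properties config, Map<String, ContentParser> available) {
        for (String mime : config.stringPropertyNames()) {
            ContentParser parser = available.get(config.getProperty(mime));
            if (parser != null) {
                byMimeType.put(mime.toLowerCase(Locale.ROOT), parser);
            }
        }
    }

    // Returns the configured parser, or null; parameters like "; charset=..." are ignored.
    ContentParser forMimeType(String mimeType) {
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return byMimeType.get(base);
    }
}
```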
*) Semaphore class was moved and renamed to serverSemaphore
*) Changing yacy shutdown behaviour
The busy-waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.
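A sketch of the blocking shutdown wait, using `java.util.concurrent.Semaphore` in place of YaCy's own serverSemaphore. `ShutdownSync` is a hypothetical stand-in for the switchboard; only the idea of a semaphore starting with zero permits is taken from the description above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the switchboard's shutdown handling.
final class ShutdownSync {
    // Zero initial permits: waiting threads block until terminate() releases one.
    private final Semaphore shutdownSignal = new Semaphore(0);

    // Called by whoever requests the shutdown.
    void terminate() {
        shutdownSignal.release();
    }

    // Blocks the calling thread (no busy waiting) until terminate() is called.
    void waitForShutdown() throws InterruptedException {
        shutdownSignal.acquire();
    }

    // Timed variant; returns false if no shutdown was signalled in time.
    boolean waitForShutdown(long timeoutMillis) throws InterruptedException {
        return shutdownSignal.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

The main thread would call `waitForShutdown()` after startup and sleep inside `acquire()` until a shutdown request calls `terminate()`.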
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542
- introduction of a threadpool for crawling
- introduction of a job queue to avoid busy waiting for a free crawler slot
*) New classes added
- queue for receiving of crawler jobs
- semaphore class to do reader/writer synchronization (mutual exclusion)
- message object to hold all needed data about a crawler job
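A sketch of the job-queue idea with `java.util.concurrent.LinkedBlockingQueue`: producers enqueue crawl jobs and return immediately, workers block in `take()` instead of polling for a free slot. `CrawlJob` (with assumed `url` and `depth` fields) and `CrawlJobQueue` are hypothetical names, not the classes added in this commit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message object holding the data for one crawl job.
final class CrawlJob {
    final String url;
    final int depth;

    CrawlJob(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

final class CrawlJobQueue {
    private final BlockingQueue<CrawlJob> queue = new LinkedBlockingQueue<>();

    // Producer side: enqueue and return immediately.
    void submit(CrawlJob job) {
        queue.add(job);
    }

    // Worker side: blocks until a job is available, with no polling loop.
    CrawlJob take() throws InterruptedException {
        return queue.take();
    }

    int pending() {
        return queue.size();
    }
}
```

Because the queue itself is thread-safe, callers no longer need their own `synchronized` blocks around job hand-off.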
*) Trying to solve session-thread shutdown problem
- session thread stopped variable is now set from outside before interrupting the
session thread.
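A sketch of the shutdown order described above: set a `volatile` stopped flag first, then interrupt, so that a thread woken from a blocking call sees the flag and leaves its loop. `SessionThread` here is a hypothetical illustration; the `sleep` stands in for whatever the real session thread blocks on.

```java
// Hypothetical session thread illustrating "set flag, then interrupt".
final class SessionThread extends Thread {
    private volatile boolean stopped = false;

    void shutdown() {
        stopped = true; // set from outside first ...
        interrupt();    // ... then wake the thread if it is blocked
    }

    @Override
    public void run() {
        while (!stopped) {
            try {
                Thread.sleep(1000); // stands in for waiting on a session or job
            } catch (InterruptedException e) {
                // The flag is already set, so the loop condition ends the thread.
            }
        }
    }
}
```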
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@39 6c8d7289-2bf4-0310-a012-ef5d649a1542
can be used instead of a ByteArrayOutputStream
*) Using a serverByteBuffer for lineBuffering in class httpc
instead of a ByteArrayOutputStream
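A rough sketch of line buffering with one growable byte array that is reused across calls, rather than a new ByteArrayOutputStream per line. `LineBuffer` is hypothetical and does not reproduce the serverByteBuffer API; it handles LF and CRLF line endings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical reusable line buffer (not the real serverByteBuffer API).
final class LineBuffer {
    private byte[] buf = new byte[80]; // grows as needed, reused for every line
    private int len = 0;

    // Reads one line terminated by LF (a preceding CR is dropped).
    // Returns null at end of stream if nothing was read; only the
    // returned copy is allocated per line.
    byte[] readLine(InputStream in) throws IOException {
        len = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2);
            }
            buf[len++] = (byte) b;
        }
        if (b == -1 && len == 0) {
            return null;
        }
        int end = (len > 0 && buf[len - 1] == '\r') ? len - 1 : len;
        return Arrays.copyOf(buf, end);
    }
}
```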
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@35 6c8d7289-2bf4-0310-a012-ef5d649a1542
- httpc: wrong error message on 404
- httpc: error message was accidentally shown when an object
was released from pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@31 6c8d7289-2bf4-0310-a012-ef5d649a1542
- many classes set to final
- implementation of a session-thread pool
- reuse of the server handler class (normally the httpd object)
within the session thread
- implementation of a httpc object pool
- introduction of a linebuffer in httpd which can be reused
- reusing the properties table in the httpc
- added two Apache libs (commons-collections, commons-pool) which
are needed for the object/thread pool implementation
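To show what the httpc object pool buys, here is a minimal hand-rolled pool: borrowed objects are returned and handed out again instead of being recreated. This is a simplified stand-in written with `java.util` only; it is not the commons-pool API the commit actually uses, and `SimplePool` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal object pool; a simplified stand-in, not commons-pool.
final class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Reuses an idle object if available, otherwise creates a new one.
    synchronized T borrow() {
        T obj = idle.pollFirst();
        return obj != null ? obj : factory.get();
    }

    // Keeps at most maxIdle objects; surplus objects are simply dropped.
    synchronized void release(T obj) {
        if (idle.size() < maxIdle) {
            idle.addFirst(obj);
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

A real pool would also reset an object's state (e.g. clear a reused properties table) before handing it out again.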
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@26 6c8d7289-2bf4-0310-a012-ef5d649a1542