- more debugging output: storageTime for indexed document is logged now
- saving memory in plasmaParserDocument.java, plasmaWordIndexEntryContainer.java (not a big deal)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@798 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
make many methods for httpHeader/Requestline parsing
reusable for new icap implementation
*) adding chunked input stream support
- needed by new icap implementation
- needed by future httpc HTTP/1.1 support
*) httpd.java
- moving all connection property contants to class httpHeader
- moving readHeader function to class httpHeader
- moving parseQuery function to class httpHeader
- moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
- adding new fuction to parse the http response line
- adding new function to converte http headers to a string that
can be send to the client
- adding a function that generates a proper url using all parsed
connection properties
*) ICAP Support
- yacy now supports handling of icap response modification requests
- this feature can be used by other icap enabled proxies to contact
yacy as icap server, and to handover the downloaded content to yacy.logging
for indexing
- functionality was successfully tested with squid 2.5Stable 10 + icap patch
- further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
- htcache entries that are still needed for indexing are now properly registered
as in use after system restart
- extended logging: log message now shows parsing and indexing time for each sb. entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
into the sb-queue anymore if the mimeType or fileExtension is not supported
by the installed parsers.
- Advantage: Avoiding unnecessary enqueueing and dequeueing from queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@664 6c8d7289-2bf4-0310-a012-ef5d649a1542
- harmonizing proxy exception handling
- adding malformed URL + blacklist check for http head method
- adding malformed URL check to http post method
- chunked encoding is now not used anymore for http post if clients
are http/0.9 or http/1.0 clients (same behaviour as already implemented for get)
- now an exception will be thrown on internal httpc errors to force an error output
to the client or a connection close. This should help to fix the "binary data in browser window" bug
*) plasmaSwitchboard.java
- fixing the following Bug
E 2005/09/03 18:02:42 PLASMA Could not index URL http://mis04.de/FAIL/snot.php: null
java.lang.NullPointerException
at de.anomic.plasma.plasmaSwitchboard.processResourceStack(plasmaSwitchboard.java:1000)
at de.anomic.plasma.plasmaSwitchboard.deQueue(plasmaSwitchboard.java:625)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantThread.job(serverInstantThread.java:95)
at de.anomic.server.serverAbstractThread.run(serverAbstractThread.java:243)
This bug could occure if the cached responseHeader is null
- getting the mimeType now from the parsed document instead of the responseHeader because the
mimeType could have been changed during content parsing (e.g. because of the mimetypeParser)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@656 6c8d7289-2bf4-0310-a012-ef5d649a1542
- path now are absolute
*) move path check from plasmaHTCache to plasmaSwitchboard
- only one path check when starting
*) small other
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@606 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) CrawlWorker also does a URL normalization now before following the redirection URL
*) CrawlWorker removes redirection URL correctly from noticeURL stack now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@571 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Adding function to manually force peer ping to remote yacy peer
See:Network.html?page=4
- for debugging purpose only!
*) serverAbstractThread.java:
- Adding posibility to notify a server thread via a synchronization object
- this is needed e.g. by the port forwarding feature to send a notification
to the peerPing thread to redo peer-ping with the new ip/port Settings_p.html
*) Port Forwarding Feature (it should work now)
- adding a serverThread which is responsible to detect broken port forwarding
connections and to do reconnect if needed
- serverCore.java: moving port forwarding initialization into a separate function
- adding positility to configure the ssh port
- moving configuration section on the gui into a separate fieldset
- hello.java: only trying to do a second connect to the clientIp address during
peer handshake if either remote port forwarding is not enabled locally or
the clientIP is not equal to any local ip
*) httpdFileHandler.java:
- printout a more verbose errormessage
*) httpc.java
- allowing to deactivate content encoding from outside
*) plasmaCrawlWorker.java
- the crawler worker now tries to refetch the content of a website without
gzip content encoding if a gzip error occured
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@368 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) serverLog.java logging functions now also accept exceptions als
additional parameters.
The Stacktrace of this ecceptions will then be appended to the
logging message and can e.g. be viewed on the gui logging page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@265 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) adding a xslt stylesheet so that the rss document can be viewed in a normal webbrowser
*) adding pubDate tag to each search item
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@173 6c8d7289-2bf4-0310-a012-ef5d649a1542
optional content parsers, thread pool configuration ...
Please help me testing if everything works correct.
*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be easily introduced because of a new modulare uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be
included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
available, the uploader is deactivated automatically.
*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define there libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
this is needed by the bzip/gzip Parser.java
- adding parser for rtf files
- removing extra configuration file yacy.parser
the list of enabled parsers is now stored in the main config file
*) Adding configuration option in the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool
*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631
*) Replacing some hardcoded strings with the proper constants of the httpHeader class
*) Adding new libs to libx directory. This libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
This synchronized keyword is not needed anymore because of the crawler jobqueue which
is responsible for the synchronization now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@60 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) adding content parser for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding a Interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
*) Sempahore class was moved and renamed to serverSemaphore
*) Changing yacy shutdown behaviour
Buzy waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542