News
This is essentially the release change-log. We have a release roadmap and releases published here will (hopefully) match the milestones from the roadmap's vision.
Release list in reverse order:
v0.44_20060307_1844
- New Features
- New simplified search page
- ajax-driven search result enrichment with snippet post-fetch. Snippets that are not available at page load time are fetched using ajax requests.
- New 1-2-3(-4) configuration menu makes it easier for first-time users to configure YaCy
- New yacy.badwords filters bad topwords
- Show public bookmarks in Bookmarks.html, and private ones as well if the user is logged in.
- Added a yacybar.xpi to the release
- Bugfixes
- fixed conjunctive search; was broken because of wrong data structures
- special chars (like German umlauts) are now allowed in bookmark tagNames
- /xml/bookmarks/* now uses one file for private/public entries. private only with password
- disabled write cache to avoid database corruption in case of crash
- bugfixes to HTTP/0.9 header handling
- fixed re-search bug
- Enhancements
- index write access (dht transmission, indexing, dht deletion) is now completely synchronized, which increases speed and reduces IO
- there is now real streaming support for large files
- support of chunked transfer-encoding for http/1.1 clients
- support of gzip content-encoding for suitable clients
- automatic TOC generation for pages in wiki
- changed user-agent string for yacy crawl access to 'yacybot'
- added default-skins
v0.43_20060210_1593
- New Features
- Better result ranking due to many new ranking attributes
- nearby-search in general and nearby-1 for queries enclosed in doublequotes
- new DetailedSearch page for ranking testing
- new Bookmark manager; search results can easily be added to bookmark collection
- UTF-8 encoded characters can now be used in built-in wiki and messages
- new import for external crawling queues
- additional shutdown method to run YaCy as Windows Service
- Feature Enhancements
- more templates in yacy wiki
- the yacy httpd is now able to set cookies using custom http headers
- beautification of many pages
- added majority voting for peer type decision, reduced the number of peer pings sent out
- new database handling of index entries, less io overhead
- re-organization of HTCACHE files, better file structure
- many architecture changes to enhance database speed, stability and capacity to hold new ranking parameters
- backup-option for lost assortment files and import interface for these files
- enhancements for index distribution (better selection, less blocking, bugfixes)
- security bugfixes: UserDB Passwordcheck, YBR transmission protocol path selection
- enhanced German translations
- Important Bugfixes
- database bugfixes (iteration, peer-listing)
- several thread-lockings solved
- enhanced html cleaner for better security in wiki and messages
- Memory settings now also work on Windows
- some Filemodes were set wrong, fixed
- minor bug-fix in Cache for some rare URLs
- Translations work now with readonly htroot
v0.42_20051216_1219
- New Features
- Introduction of Block Rank; Generation, transmission and collection of block rank statistics; computation of block rank tables for new search ranking YBR (YaCy Block Rank)
- New network picture on network page
- New Connection tracking page
- New ICAP support and SQUID-like redirectors
- Many small changes
- Improvements
- New search behavior with better search results and less search time.
- Index transfer with less computation overhead.
- Better robots.txt parser.
- New write cache for less IO resulting in better database speed.
- Asynchronous queueing of crawl jobs and better crawl+indexing performance
- More document parsers. The following document formats are supported:
- Acrobat Portable Document
- Word Document
- MimeType
- Rich Site Summary/Atom Feed
- OASIS OpenDocument V2 Text Document
- Bzip 2 UNIX Compressed File
- GNU Zip Compressed Archive
- Rich Text Format
- Tape Archive File
- rpm Parser
- Compressed Archive File
- vCard
- very large number of bugfixes
v0.41_20051004_848
- Prevention of unwanted DDoS effects caused by YaCy crawls by doing a target-server load balancing; further prevention is done by robots.txt
- New setting of proxy cache size and storage path in submenu 'Proxy Indexing'
- Modified cleanup of HTCache
- Sorted config list in Config_p.html and sorted file list in Cache Admin menu
- Code clean-up: added finals and constants
- New Network menu design; showing peer status, Index Receive and Crawl Receive properties as images
- Added ICAP support; now an experimental ICAP Server is embedded and Yacy allows other proxies to use the indexing service via icap response modification requests
- Single-Peer permanent index transfer (full flush to other peer)
- Added a templateCache to httpFileHandler
- Added blacklist support for https requests and crawler
- Adding functionality to delete entries from Indexing and Crawler Queue
- Adding functionality to clear the whole indexing queue
- Proxy now supports the X-Forwarded-For Header
- Indexing queue now displays total size of enqueued content in kb
- Remembering Crawler-isPaused setting by storing status into config file
- Splitting of status page into a private and a public accessible part
- Adding Queue overview to status page
- New symbols for Peer Status, connection status, Index-Receive-Granted and Crawl-Receive-Granted in Network menu, replacing old separate columns
- Support for robots.txt
- Implementation of a robots.txt parser
- Control of remote crawls with robots.txt
- Performance enhancements
- Better Database Caching
- Better usage of memory in kelondro Record-Nodes and less IO access
- New cache-control menu within the performance menu
- Better cache-size default values
- Accelerated Blacklists import; makes big lists possible
- Content-Encoding GZIP support for http post requests on index transfer/distribution
- Bugfixes
- Normalization problems: prevent URLs with ':80'
- Many bug fixes for NULL pointer occurrences
- Display of a proxy error page instead of a white page if the server has closed the connection before yacy was able to receive the http response line
- Crawler Redirection bug fixed
- Indexer now gets the mimeType from the parsed document instead of the responseHeader (this is especially necessary if mimeType has to be detected by the MimeType parser)
- URLs pointing to a server having a private IP address will not be indexed anymore
- Unsupported MimeTypes and fileExtensions will not be queued by the cachemanager in the indexer queue anymore (to reduce unnecessary IO)
- ... many more small changes and bugfixes. For details please see the SVN history
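The ':80' normalization fix listed under Bugfixes above can be illustrated with a small Java sketch. It is not YaCy's actual code: the class name UrlNormalizer and the method stripDefaultPort are hypothetical, and it assumes that only the explicit default port of plain http URLs is dropped.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of YaCy: removes an explicit ":80" from http URLs
// so that "http://host:80/x" and "http://host/x" end up as the same URL string.
class UrlNormalizer {
    static String stripDefaultPort(String url) {
        URI u = URI.create(url); // throws IllegalArgumentException on malformed input
        if ("http".equalsIgnoreCase(u.getScheme()) && u.getPort() == 80) {
            try {
                // Rebuild the URI without a port; -1 means "no port given".
                // Note: percent-escapes in the path are decoded and re-encoded here,
                // which is good enough for a sketch but not for every edge case.
                u = new URI(u.getScheme(), u.getUserInfo(), u.getHost(), -1,
                            u.getPath(), u.getQuery(), u.getFragment());
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return u.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripDefaultPort("http://example.org:80/index.html"));
        System.out.println(stripDefaultPort("http://example.org:8080/index.html"));
    }
}
```

Normalizing before a URL is stored means both spellings map to one entry instead of two.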
v0.40_20050816_548
- Index distribution to DHT now fully active
- All peers, principal, senior and junior peers distribute their index to other principal or senior peers. Juniors send their index to only one principal/senior peer, while principal/senior peers distribute to 3 redundant other peers.
- The index distribution and the index receive flag on Index-Control must be set to enable global search on the own peer
- For a global search, only relevant peers according to DHT rules are selected
- New YaCyNews feature
- A new menu 'News' shows the news processing queues for incoming and outgoing news. However, news shall not be monitored here but they influence the presentation of other information throughout the system:
- The Index-Create menu now shows a list of previously started crawls of other peers. They are distinguished between Crawls in progress and finished crawls.
- When a Crawl in the Index-Create menu is started with the 'RemoteIndexing' flag set, a YaCyNews is automatically generated to inform other peers about that crawl start. A message can be attached to explain why this crawl was started.
- When a personal profile is changed, a News Message is generated.
- When a Wiki entry is changed, a Message is generated.
- Within the Network menu, Alerts for Profile Updates, Wiki Updates and Crawls in Progress may appear.
- The YaCy wiki has been enhanced and was moved out from the 'Lab' to the main menu. The Wiki-System now supports embedded images and can show a preview.
- New logging policy: logs are now exclusively written to the rotating logs in /log/
- Search was enhanced using the intermission feature (all other processes are paused while search is in progress)
- Enhancements to Translation/Localization
- Proxy-Caution-Delay (forced idle time of the crawler after a proxy access) is now configurable in the Performance menu
- Many bugfixes for time-out problems, database crashes, DHT management, version numbers
v0.39_20050722_424
- New Features:
- Added snippets to search results. Snippets are fetched by searching peer from original web sites and are also transported during result transmission from remote search results.
- Proxy now shows an error page in case of errors.
- Preparation for localization: started (not finished) German translation
- Status page now shows memory amount, transfer volume and indexing speed as PPM (pages per minute). A global PPM (sum over all peers) is also computed.
- Re-Structuring of Index-Creation menu: added more submenus and queue monitors
- Added feature to start crawling on bookmark files
- Added blocking of blacklisted URLs in indexReceive (remote DHT index transmissions)
- Added port forwarding for remote peer connections (the peer may now be connected to a configurable address)
- Added bbCode for Profiles
- Memory Management in Performance Menu: a memory-limit can be set as condition for queue execution.
- Added option to do performance-limited remote crawls (use this instead of switching off remote indexing if you are worried about too much performance loss on your machine)
- Enhanced logging, configuration with yacy.logging
- Performance: enhanced indexing speed
- Implemented indexing/loading multithreading
- Enhanced caching in database (less memory occupation)
- Replaced RAM-queue after indexing by a file-based queue (makes long queues possible)
- Changed assortment cache-flush procedure: words may now appear in any assortment, not only one assortment. This prevents assortment-flushes, increases the capacity and prevents creation of files in DATA/PLASMADB/WORDS, which further speeds up indexing.
- Speed-up of start-up and shut-down by replacement of stack by array. The dumped index also takes less space on disk now. Because dumping is faster, the cache may be bigger, which also increases indexing speed.
- Bugfixes:
- Better shut-down behavior, time-out on sockets, fewer exceptions
- Fixed gzip decoding and content-length in http-client
- Better httpd header validation
- Fixed possible memory leaks
- Fixed 100% CPU bug (caused by repeated GC when memory was low)
- Fixed UTF8-decoding for parser
v0.38_20050603_208
- Enhanced Crawling:
- There are now 3 different crawl threads: local crawling, global crawl trigger and remote-triggered crawl jobs.
- The thread pools can now be configured through the Performance-Menu and a customized number of crawling threads is possible.
- Crawling can be paused and resumed.
- Changed method of index caching; this speeds up crawling and provides a more economic data structure.
- Enhanced Proxy: added transparent proxy support. It is now possible to route http traffic through yacy without setting a proxy configuration in browsers. Example: set your iptables configuration with
iptables -t nat -A PREROUTING -p tcp -s 192.168.0.0/16 --dport 80 -j DNAT --to 192.168.0.1:8080
- Extended seed-upload methods for principal peers: more configuration options, better extensibility. Added support for scp.
- More external parsers. YaCy now supports tar, zip, gzip, bzip, rss, rtf, pdf, doc. To use these parsers, an additional libx-library must be installed, which is distributed separately from the YaCy core distribution.
- Enhanced Shutdown procedure: many unnecessary threads have been removed and a shutdown hook has been added. Missing file closings have been added. The new index caching method flushes the cache faster.
- Added support for localization: it is now possible to extend YaCy with localization data; added languages can be accessed with the new Language-Menu
v0.37_build20050502
- YaCy's source code is now hosted in a Subversion/svn version control system on developer.berlios.de: yacy@berlios.de
- overall speed enhancements:
- new Thread-Pools and performance enhancements from Martin Thelian: much faster http-server and more responsive web interface
- fixed a bug in database caching that prevented caching altogether; the database is now much faster. This also sped up proxy mode (which must read the http-header from the database)
- modified thread control for non-blocking dequeueing
- increased cache memory settings
- added a concept for external parsers; pdf and doc parsers are integrated but not active yet.
- fixed several bugs that caused thread-locks and 100% CPU load
- fixed bug with cookie storage; changed handling of multiple cookies
- fixed brute-force password attack denial
- check on new peer names: must not occur already and may only contain letters, numbers and '_' or '-'.
- many minor bug fixes and spelling corrections in the web interface
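The peer-name check introduced in this release (a new name must not already be in use and may only contain letters, numbers and '_' or '-') could look roughly like the following Java sketch. The class PeerNameCheck, the method isAcceptable and the set of known names are hypothetical, and restricting "letters" to ASCII is an assumption that the release note does not confirm.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the v0.37 peer-name rule, not YaCy's implementation.
class PeerNameCheck {
    // One or more of: ASCII letters (assumption), digits, '_' or '-'.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_-]+");

    // knownNames stands in for whatever list of existing peer names is available.
    static boolean isAcceptable(String name, Set<String> knownNames) {
        if (name == null || !ALLOWED.matcher(name).matches()) {
            return false; // empty name or forbidden character
        }
        return !knownNames.contains(name); // name must not occur already
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("alpha", "beta_2");
        System.out.println(isAcceptable("gamma-3", known));
        System.out.println(isAcceptable("alpha", known));
        System.out.println(isAcceptable("bad name!", known));
    }
}
```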
v0.36_build20050326
- Enhanced thread control and added performance menu: this can be used to steer scheduling tasks and for profiling.
- Enhanced search result ranking.
v0.35_build20050306
- new Features
- new user-profile management and remote access of profiles through the network-page
- new cookie-monitor. Will be used to manage cookie-filter
- new template engine and re-design of many administration pages as preparation for upcoming localization
- now permanent storage of passive peers
- enabled switch-off of proxy-cache
- new proxy-indexing monitor and moved proxy-indexing configuration to that new page
- more functions to DHT-management:
- remote indexing targets now selected by DHT rule
- remote search now selects hierarchically with DHT-rule
- enhanced access control to YaCy administration
- passwords are now encoded to MD5-Hashes before stored to httpProxy.conf
- brute-force password-hack prevention by additional delays
- added new 'steering' servlets for automated processes that need authorization
- re-design
- re-designed main menu: new sub-menu for proxy functions
- re-design of Network Monitor page
- re-design of seed database management and implementation of seed-action interface
- fixed bugs:
- fixed a bug with cache-control
- fixed a bug with peer-list uploading
- fixed a bug that provoked indexing of YaCy's own web pages
- fixed a bug that prevented loading of some web pages: (JavaScript bug) doublequote/singlequote mixture removed
- better binary-check on files before indexing
- fixed misbehavior of Network-Page: re-design of enumeration method and auto-heal function in kelondroTree
v0.34_build20050208
- Remote transmission of index (RWI) information to other peers with correct DHT position
- implemented two new yacy-protocol - commands: yacy/transferRWI and yacy/transferURL for RWI partition transfer
- selection of DHT positions and selection of correct RWI partitions for transmission
- performing full flush of index if peer is running in junior mode: now these juniors can contribute to the global index.
- default full receive of index transmission in senior peers; these peers will currently not transfer indexes. This is a test configuration and senior2senior RWI transmission will be enabled in future releases.
- Configuration flags (grant/do not grant) in 'Index Control' menu.
- Enhanced remote search
- selection of fewer result values: less traffic, faster response.
- pre-sorting of results in remote peers before transmission: better results
- more properties in seeds
- Flags for "accept remote crawls" and "accept remote indexes"
- Flags for "grant index distribution" and "grant index receive"
- Control values for received/sent RWI/URL
- All flag values are shown on Network page
- Bug-fixes:
- no re-set of remote crawl delay after re-connect
- proxy fail (shows white pages) fixed: better timeout value
- local indexing = off did not work, fixed.
- auto-heal of seed.db - fail
- many minor bugs fixed
- new German forum at http://www.yacy-forum.de, provided by Roland Ramthun
v0.33_build20050107
- Support for Stop-Words; default stopwords are included; stopwords are excluded for indexing and in search query results
- Skin support
- New start/stop-script for unix/linux daemon init process
- File-Share entries can now have description entries
- Enhanced File-Sharing Menu
- Every entry can have a comment attached
- Comments or picture preview visible in file list
- File name and comment field can be indexed and globally searched
- Files found with search interface are dynamically linked to the actual IP of the peer hosting the file
v0.32_build20041221
- New Crawling-Profiles for Crawl-Threads
- every crawl start now defines its own crawl job; new crawls do not interfere with previously started and still running jobs; all started jobs may run concurrently
- new crawl properties: accept urls containing '?'; flag for storage of pages in proxy cache; flags for local and remote indexing
- New Design, new documentation, new mascot 'Kaskelix' (appears on search page), new home page location http://www.yacy.net/yacy
- Promotion-String on search page
- New shutdown-trigger (no more file polling, new stop scripts)
- Principal-peer gaining after file generation
- New 'Log'-menu: view the application log on the web interface
- Bug-fixes
- Termination process should succeed now.
- Cross-Site-Scripting bug removed
- Removed a deadlock that occurred during concurrent crawl job starts
v0.31_build20041209
- Integrated url filter for crawl jobs (Index Creation - page) and search requests (Search Page).
- Removed a bug that caused sudden termination when a not-valid url was crawled.
- Massively enhanced indexing speed by implementation of an additional word index cache.
- Added button to delete/empty the crawl url stack.
- Many minor changes.
v0.30_build20041125
- Implemented Remote Crawling
- Every Senior and Principal Peer may now start Remote Crawls: The initiating peer starts with the crawl and may assign URLs to qualified other peers. Those peers load the assigned resources, index them and return the index statistics to the initiator. The executing peer may only be a Senior or Principal peer.
- Extended URL management: URLs are now organized in three different sets: Noticed-URLs (not loaded but possibly queued for crawling), Error-URLs (not loaded but may be re-loaded to avoid index loss in case of temporary target server downtime or network problems) and Loaded-URLs. The Loaded-URLs are again divided into six categories:
- remote index: retrieved by other peers
- partly remote/local index: result of search queries
- partly remote/local index: result of index transfer (to be implemented soon)
- local index: result of proxy fetch/prefetch
- local index: result of local crawling
- local index: result of remote crawling requests
- New monitoring pages: Local Index Monitor for results of LURL's (see above), cases 1-5 and the Global Index Monitor for case 6. Because the results of global crawls are not personal to the peer owner, the monitor page is not protected.
- Options to allow or disallow remote crawling; either as initiating or executing peer.
- Idle/Due-Time - management for each peer: to organize remote-crawl load-balancing, a delay time is used to schedule remote crawls. The seed management was extended to store and maintain these delay times.
- Proxy Performance Enhancements
- changed+enhanced caching algorithm; re-implemented routines
- process enhancements in httpc and httpd classes
- gzip-load mode in httpc fixed
- removed DNS bottleneck (the java DNS blocks while accessed simultaneously)
- integrated DNS-prefetch
- Implemented Shut-Down Procedure
- Integrated notifier procedure in all threads.
- The application now creates a file 'yacyProxy.control' after start-up.
- To stop the yacyProxy, remove the control file.
- Integrated a 'Shutdown' - button on the 'Status'-page which also triggers shut-down
- After shut-down is initiated the application first processes all scheduled crawling- and indexing tasks which may last some minutes in the worst case.
- Removed bugs
- URL normalization
- many minor bugs
v0.29_build20041022
- New option to start explicit crawling jobs: a start url and a crawling depth
(distinct from the prefetch depth) can be set.
- Integrated monitoring interface for prefetch/crawling activities.
The user can now observe the crawling and indexing activity in detail.
There is also a report page that lists all newly indexed pages with the option
to delete these indexes again. The interface also reports the initiator
of the crawling/indexing tasks which can be currently either the prefetch mechanism
or explicit crawling requests. In future releases the initiator may also refer to
remote crawling requests.
- New caching procedure for database requests on file-system level.
- Extended blacklist url matching: parts of a domain may now be matched with wildcards '*'. (the URL's path may be matched with regular expressions)
- The application will be re-named. Many parts now refer to the new application name 'yacy', but not all.
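The extended blacklist matching described in this release (wildcards '*' for parts of a domain, regular expressions for the URL's path) can be sketched in Java as follows. This is only an illustration under assumptions: the class BlacklistMatch, the split of an entry into a separate domain pattern and path expression, and the exact wildcard semantics ('*' matches any run of characters) are not taken from the YaCy source.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of wildcard-domain plus regex-path blacklist matching.
class BlacklistMatch {
    // Turns a domain pattern such as "*.example.com" into a regular expression:
    // literal parts are quoted, each '*' becomes ".*".
    static String domainToRegex(String domainPattern) {
        String[] parts = domainPattern.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) re.append(".*");
            re.append(Pattern.quote(parts[i]));
        }
        return re.toString();
    }

    // host is matched against the wildcard pattern, path against the regex as given.
    static boolean isBlocked(String domainPattern, String pathRegex, String host, String path) {
        return host.matches(domainToRegex(domainPattern)) && path.matches(pathRegex);
    }

    public static void main(String[] args) {
        System.out.println(isBlocked("*.example.com", ".*", "ads.example.com", "/banner.gif"));
        System.out.println(isBlocked("*.example.com", "/ads/.*", "www.example.com", "/news/1.html"));
    }
}
```

Quoting the literal parts keeps the dots in a domain from acting as regex wildcards.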
v0.28_build20041001
- Search results are now searched again for characteristic word patterns.
The patterns are statistically evaluated and are used to generate
"search associations",
shown as hints for further combined search.
- Parallelized peer propagation process. This results in very rapid bootstrapping.
- Integrated new 'score' library for rapid element sorting - used for search
patterns and rapid bootstrapping. May help in future releases to speed up indexing.
- Minor bug-fixes.
v0.27_build20040924
- Bug fix in remote search result preparation.
- Speed enhancements on search client when doing remote search.
- Small changes in file sharing interface.
v0.26_build20040916
- Introduced new 'virtual' TLD (top-level-domain) '.yacy' that the proxy resolves into the peer's IP and port numbers:
- Every yacy-peer can now be contacted using the peer's name as domain name:
Proxy users can obtain any other proxy-hosted pages using the url 'http://<peer-name>.yacy'.
- Implemented sub-level domains for yacy TLD's: they are matched to subdirectories of the peer's individual web root HTDOCS. (see below)
- Support for individual web pages:
- Every proxy host can serve its individual web page. We implemented two paths for each server: one default path pointing to <application-root>/htroot for administrative pages and an alternative path for individual use at <application-root>/DATA/HTDOCS.
- The individual web pages may be accessed either using the new '.yacy' TLD's through another proxy, or optionally by using the peer's IP:port address. The recommended default address of a proxy is 'http://www.<peer-name>.yacy', which is mapped to <application-root>/DATA/HTDOCS/www/.
- Integrated an upload/download interface for individual web pages: additional accounts for uploaders and downloaders ensure appropriate authorization. The file-sharing web space can be browsed with a directory servlet. A default sub-domain is assigned to 'http://share.<peer-name>.yacy', which is mapped into <application-root>/DATA/HTDOCS/share/.
- Web clients not using the proxy may contact the new individual default subdomains using the URLs http://<peer-IP>:<peer-port>/www/ and http://<peer-IP>:<peer-port>/share/.
- Several Bug-fixes:
- Date bug appearing when accessing the proxy httpd with the proxy.
- Additional Time-out catch-up at httpc when a file is submitted without length tag. Also extended general retrieve time-out.
- Terminal line restriction of 1000 bytes was too tight (cookies may have 4kb length).
- Introduced global general time measurement for peer synchronization and balanced hello - round-robin.
- Enhanced proxy-proxy - mode: 'no-proxy' settings as list of patterns to exceptionally disallow usage of remote proxies.
- Implemented multiple default-paths for URLs pointing to directories.
- Re-design of front-end menu structure.
- Integrated Interface for principal configuration in Settings page
- Re-named the release: integrated YACY name to emphasize that the proxy is mutating into the YACY Search Engine Node
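The mapping of '.yacy' sub-level domains to subdirectories of HTDOCS described above ('www.<peer-name>.yacy' to DATA/HTDOCS/www/, 'share.<peer-name>.yacy' to DATA/HTDOCS/share/) can be sketched in Java. The class YacyDomainMapper and its method are hypothetical; the sketch assumes exactly one sub-level label in front of a peer name that contains no dots, and it returns paths relative to <application-root>.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: maps "<sub>.<peer-name>.yacy" to "DATA/HTDOCS/<sub>/".
class YacyDomainMapper {
    private static final String TLD = ".yacy";

    static Optional<String> htdocsPath(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (!h.endsWith(TLD)) {
            return Optional.empty(); // not a .yacy address
        }
        String[] labels = h.substring(0, h.length() - TLD.length()).split("\\.");
        // Expect exactly <sub>.<peer-name>; a bare <peer-name>.yacy has no sub-level domain.
        if (labels.length != 2 || labels[0].isEmpty() || labels[1].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of("DATA/HTDOCS/" + labels[0] + "/");
    }

    public static void main(String[] args) {
        System.out.println(htdocsPath("www.mypeer.yacy"));
        System.out.println(htdocsPath("share.mypeer.yacy"));
        System.out.println(htdocsPath("mypeer.yacy"));
    }
}
```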
v0.25_build20040822
- New Index Administration Menu Item: RWI's (Reverse Word Indexes) may now be inspected.
Each reference in a word index can be displayed in detail, and optionally be deleted.
- Minor bug fixes in Bootstrapping. Major Bug fixes in Index Storage (better Normal Form of URLs).
- Better display of cache content in the Cache Administration.
v0.24_build20040816
- New 'Cache' Menu item: The proxy cache can now be inspected. It shows a directory list with http response headers and content to each file in the proxy cache.
- Faster Bootstrapping: The connection policy was changed: as long as the proxy status is 'virgin', the most recent known connection is used for bootstrapping; then later the least recent for peer distribution.
- Better Formatting in Network Menu.
v0.23_build20040808
- Blacklists now provide management of several lists and more import options.
- code cleanup + many minor bugs
- Messages now work (corrected POST implementation, this also cleared the way to index distribution); improved message sending, displaying etc.
- double links / unchecked '#', headlines wrong
- httpd-speedup (no more temporary files, template prefetch without double-load)
- much better Bootstrapping and more intelligent yacy-peer updating
- auto-migration of new settings from httpProxy.init
- much better logging; extensive log configuration options for all parts of the application now in httpProxy.init
- better search requesting (more results)
- yacy protocol may now also use other proxies in proxy-proxy-mode
- more documentation
- permanent demo-page at yacy.net/home.html with wiki
- new FAQ at http://www.yacy.net/yacy/FAQ.html
- first step to move YACY to new home http://sourceforge.net/projects/yacy/
v0.22_build20040711
- More security bug fixes (dementia accountia, '..' usage in server path, server blacklist too tight for local clients)
- Another advance in better peer distribution and recognition (distinguishes between 'real' disconnected peers and 'hearsay' disconnected peers. Keeps track of online time. No preferences of principal peers in link distribution)
- An option to switch the peer to online mode without using the proxy. This makes life much easier for newbies.
- A new message function. Within the Network page, one can hit the 'm' and may then send a message to the other peer. The owner of that
peer can read the message in his/her private message inbox. This function is only in alpha stage; it works only in rare cases and
we don't know why. Only for testing.
- Cleaned up the mess of different database and configuration files. All run-time data is now accumulated in the new folder 'DATA'. If you previously generated an index and want to migrate, you simply need to put your old PLASMADB folder into the new DATA folder.
- Clean up of the source mess and partition of them into separate packages
- Some design enhancements of the online interface
v0.21_build20040627
After an announcement on freshmeat.net we got many hits in the newly built p2p-network. We learned from the p2p-propagation behavior and
implemented a lot of new routines to stabilize the YACY network.
- Better peer analysis, statistics, propagation/distribution (more properties and bug fixes).
- No more JavaScript in online Interface. New template logic for httpd and new online interface look-and-feel, using the new features.
- New FAQ in documentation.
- Protection against hacker and virus attacks: new self-configuring client-IP blocking in serverCore.java
- More information and warnings about security settings to the operator to protect the own peer
- Network statistics and monitor shows status of remote peers and the distributed index
v0.20_build20040614
The first step into the p2p-world: introduction of the YACY (yet another cyberspace) p2p network propagation and information wares distribution system. In this release, YACY enables a rudimentary index exchange so that you can use YACY to bootstrap a world-wide distributed search engine.
- Added status page on web interface and automatic opening of web browser on status page. Can be switched off on the status page.
- Implemented the still missing element removal and AVL balancing for element inserts in the kelondro database. This ensures logarithmic effort on database access, which influences the proxy and the search service. Now only AVL balancing after removal must be implemented, but its absence is not critical.
- Added blacklist enhancements and web interface for blacklist editing from Alexander Schier.
- More and better documentation.
- Many minor bug fixes, e.g. non-cacheability of web interface, exception catch-up on startup when proxy is used before coloured lists are loaded.
- First p2p elements implemented: every peer on startup looks for other peers and announces its own startup. The function does not yet actively implement an index exchange, but can respond to remote index queries.
v0.16_build20040503
This release is a major step to make the proxy enterprise-ready: we introduced several security mechanisms and access
restrictions for the proxy and the server. Every security setting can be configured through a web page. Thanks to the new
HTTPS proxy, the proxy can now be considered as 'complete'.
- implemented an HTTPS proxy, sharing the same proxy port with http;
this does not help for more/better indexing since the SSL data is simply passed through.
But we can now claim to be a 'full' http and https proxy, usable in enterprise environments and internet cafes.
- two security layers for web server and proxy access: implemented Client-IP - filtering, which adds a virtual Firewall to
the application. Every client that does not match the client-IP-filter is blocked. The second layer is a PROXY password
protection. All attributes can be configured through a new web page at http://localhost:8080 (standard configuration).
- to protect the configuration pages of the web server, we introduced a password protection for special pages on the web server.
Every page that ends with '_p.html' has a protection; the corresponding account can also be set through the local web server.
Users shall be encouraged to set this administration account first.
v0.15_build20040318
- Extensive code re-engineering
- Inserted and further generalized the proxy's genericServer into the AnomicFTPD project. After further enhancements within that project,
it was re-inserted in the HTTPProxy. The Switchboard interface now belongs to the genericServer, which is now called the serverCore.
- Removed the old html parser and replaced it by the new htmlFilter library, which now parses the html files during reading from
the remote server. Real-time parsing during streaming html pages is done extremely fast and does not slow down file passing through
the proxy. The new htmlFile provides a filter interface, which is now used to filter out content that is defined by keywords.
Currently the bluelist 'httpProxy.blue' is used to define these words.
- Re-engineered the crawler interface and implemented a crawler. Since the crawler does not work in all cases, it is still
disabled in this release. You can switch it on by setting the prefetchDepth in the configuration file httpProxy.init
- Implemented a 304 response. This speeds up all responses in the case of a cache hit combined with a conditional request.
Since this combination is fairly common, it noticeably speeds up the proxy.
- New documentation design
- New Search Page design
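The 304 response mentioned in this release (a cache hit combined with a conditional request) follows the usual HTTP pattern: if the cached copy has not changed since the date the client sent in If-Modified-Since, the proxy can answer 304 Not Modified without a body. The Java sketch below illustrates that decision only. The class ConditionalGet and its method are hypothetical and do not reflect how the proxy actually implements it.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical sketch of the cache-hit + conditional-request decision.
class ConditionalGet {
    // ifModifiedSince: raw header value from the client, or null if absent.
    // cachedLastModified: modification date of the cached entry, or null on a cache miss.
    static int statusFor(String ifModifiedSince, ZonedDateTime cachedLastModified) {
        if (ifModifiedSince == null || cachedLastModified == null) {
            return 200; // no condition or no cache hit: send the full response
        }
        try {
            ZonedDateTime since =
                ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
            // Changed after the client's copy: full response; otherwise 304.
            return cachedLastModified.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // unparsable date: ignore the condition
        }
    }

    public static void main(String[] args) {
        ZonedDateTime cached = ZonedDateTime.parse(
            "Wed, 17 Mar 2004 10:00:00 GMT", DateTimeFormatter.RFC_1123_DATE_TIME);
        System.out.println(statusFor("Wed, 17 Mar 2004 10:00:00 GMT", cached));
        System.out.println(statusFor("Wed, 17 Mar 2004 09:00:00 GMT", cached));
    }
}
```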
v0.14-build20040213
- More Structure to the whole system to lay the basis for the Crawler
- The new structure will distinguish between the httpd with its servlets, the file-servlet and proxy-servlet;
the crawler which also holds responsibility for the http cache that is used by the http proxy and the indexing
engine 'PLASMA', which is again accessed by the http file server. But even with the crawler concept on board here, we still don't have prefetch now.
- Moved plasmaTextProvider to httpCrawlerScraper, httpdProxyCache to httpCrawlerCache and httpdSwitchboard to httpCrawlerSwitchboard
- New configuration value proxyCacheSize: limits the amount of memory used by the cache; if the cache exceeds this value, the oldest entries are deleted
- Bug fixes:
- Found and eliminated a nasty bug that prevented using Yahoo Mail (they send several cookies at once)
- No more indexing of URLs with 'cgi' in the name or ending with '.js', '.ico', or '.css' (checking the content-type for 'text' is not enough; some servers do not send the correct value)
- Fixed search for words containing numbers and German umlauts
- adapted acrypt.java to no longer use javax.crypt, which is not supported by Debian Blackdown Java 1.3.1. Furthermore, removed the -server flag from httpProxy.sh, which also made Blackdown crash. (You probably want to insert that flag again in your installation.)
- The proxy can now be configured to access another proxy
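The URL-based indexing exclusion listed under the v0.14 bug fixes can be sketched like this. It is only an illustration under stated assumptions: the class IndexUrlFilter and the method isIndexable are hypothetical names, and matching case-insensitively against the whole URL string is a simplification that the release notes do not specify.

```java
import java.util.Locale;

// Hypothetical sketch, not the original implementation: skip URLs that
// contain 'cgi' or end with '.js', '.ico' or '.css', regardless of the
// content-type the server reports.
final class IndexUrlFilter {

    private static final String[] SKIPPED_SUFFIXES = {".js", ".ico", ".css"};

    static boolean isIndexable(String url) {
        // Assumption: the comparison ignores case.
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.contains("cgi")) {
            return false;
        }
        for (String suffix : SKIPPED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }
}
```

A check of this kind runs before the content-type test, so a script served with a misreported 'text' type is still kept out of the index.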
v0.13-build20040210
- Bug fixes:
- removed forced unzipping for special cases: either if the file to be transported is 'naturally' in gzip format (.gz, .tgz and .zip) or if zipping would not make sense because it would not yield any compression, as for images. The 'Accept-Encoding' header, created by the browser and sent to the server, now omits the gzip attribute in these cases. This should lead to less overhead (no gzip en/de-coding) and thus to more speed.
- httpc failure response bodies are now transported (especially for 404; this seemed to be unnecessary, but is not)
- search result bug (mixed-up appearance) removed
- Performance and structure enhancements:
- Extended database capabilities to hold content of dynamic size; new files kelondroRA.java, kelondroAbstractRA.java, kelondroFileRA.java, kelondroDyn.java
- Used new database features to store the response header information for all files in the cache in one database file. This halves the number of files in the cache (no more need for the .header files)
- Implemented scheduling that moves cache creation into proxy idle time. This reduces file operations on a single-user system by 50% during web page retrieval.
v0.12-build20040204
- now a release roadmap exists
- enhanced proxy and caching:
- integrated blacklist 'httpProxy.black' idea and data from Alexander Schier: forced 404 response for blacklisted hosts. This can be used to 'switch off' specific domains, especially AGIS servers. Can also be used for child protection/parental control. Does not filter content!
- removed cache write bug that occurred when the same name is used for a file and a directory (possible in a URL, but not in the cache file system).
- detailed 404 debugging response in case of failure
- new config value maxSessions to limit the number of concurrent connections to the proxy
- Host property bug in httpc for HTTP/1.1 servers removed: now better access to more servers
- enhanced indexing and searching:
- implemented rudimentary ranking and ordering of search results either by quality or by date
- implemented bluelist 'httpProxy.blue': filtering of all blue-listed words in search expression, result-url and result-description
- bugfix for combined search, fixed date attached to search results
- first contact with Gnugle project and knowledge exchange
v0.11-build20040124
- non-empty field servlet bug in index.java
- greatly enhanced indexing
- better structure: new classes plasmaIndexEntry, plasmaSearch, plasmaIndex, plasmaIndexCache, plasmaURL
- index entry caching and transparent flushing implemented
- catch-up of sleeping connections, enhanced idle check in genericServer.java
v0.1-build20040119
- first time published on www.anomic.de!
- client user agent forwarding according to 'yellow'-list
- plasma database
- new database sub-path DATABASE
- new file kelondroRecords.java + kelondroTree.java
- plasmaStore now saves and retrieves transparently urls in the kelondro database
- no more XSUMP path, was not necessary for condenser; url attributes will be stored in new DB
- indexing implemented; still slow, since it needs caching (to come later)
- rudimentary index access through new web page index.html and servlet index.java
- better client timeout -> better idle check -> no job queue blocking
- new interface genericServerHandler.java
build20040110
- blackboard as global configuration set for all threads with global function scheduler
- new files httpdSwitchboard.java and plasmaSwitchboard with job control and global config
- new file plasmaStore.java
- the plasma blackboard saves its data into plasmaBlackboard.conf
- cgi control over blackboard (try http://proxy/test.html)
- new test file HTROOT/switchboard.[html,java]; try http://proxy/switchboard.html
- condensement on cache (indexing pre-process)
- renamed file htmlParser to plasmaTextProvider
- new file plasmaCondenser.java
- test output of word list per page
build20040107
- better/more configuration
- moved httpd.conf to httpProxy.conf
- new loglevel attribute for server and proxy in httpProxy.conf
- new clientTimeout attribute for client-proxy connections in httpProxy.conf
- advanced cache-control
- transparent gunzip upon loading of gzip-encoded streams in httpc, all cache files are now unzipped
- much better cache control, according to RFC standards and recommendations, really usable now
- implemented scheduler
- idle check in genericServer as scheduler trigger
- new experimental scheduler in httpProxy for new cache arrivals
- added acrypt.java for different encoding tools
- added htmlParser.java and implemented scheduled parsing of selected html resources
- a subdirectory XSUMP is now filled with to-be-indexed text files
build20040105
- advanced header transport transparency
- added CaseInsensitiveMap.java, a TreeMap with case-insensitive comparator
- management of reverse mapping of header symbols
- better handling of cookies (the yahoo-bug was attacked, but still not eliminated)
- advanced error case behavior
- implemented 404 response when server or files unreachable
- fixed behavior when file download is interrupted and broken file is in cache
- fixed session termination bug
- advanced cache behavior
- fixed loading of stale files from cache in some cases
- 203 response instead of 200 for files that come from cache
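The CaseInsensitiveMap.java entry in build20040105 describes a TreeMap with a case-insensitive comparator. A minimal sketch of that idea, using only the standard library, could look as follows; the class HeaderMapSketch and the method newHeaderMap are hypothetical names and do not reproduce the original file.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a case-insensitive header map; not the
// original CaseInsensitiveMap.java.
final class HeaderMapSketch {

    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-Type" and "content-type" as the same key.
    static Map<String, String> newHeaderMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }
}
```

Because HTTP header names are case-insensitive, a map like this lets the proxy look up a header no matter how the remote side spelled it. When a value is replaced under a differently cased name, the TreeMap keeps the spelling of the key that was inserted first.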
build20031229
- minor bugs ("#!/bin/sh" in shell scripts; +%Y instead of +%y; 755, 644 access rights)
- major changes in caching load/transport
- added wishlist.txt
- added changelog.txt
- implemented automatic webinterface access if no host given
- extended httpd.java to access FileServlets from httpdFileServlet.java
- file servlet with examples, parameter hand-over via get and post, text and multipart
- added GPL lib 'Template.java' from JavaBY Template Engine from Alexey Popov
- made changes in Template.java
- added httpdFileServlet.java and classProvider.java to implement template-based CGI file serving
- added subdir HTROOT with example files test.{java,http}
build20031218
- first public release of YACY as AnomicHTTPProxy_20031218.tar.gz
- basic httpd proxy functions only
- first alpha-tester Alexander
build20031215