- for each crawl start, there is now a flag for text and media
- the localCrawl flag is superfluous
- added new crawl profiles
- if an image search is done, only media links are crawled for the snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3100 6c8d7289-2bf4-0310-a012-ef5d649a1542
- by default, only the admin is allowed to make changes to wiki pages
- the admin may allow everybody to make changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3019 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better synchronization
- files are only deleted if they have been in the cache for at least 5 minutes
- hash-path for the HTCACHE is now default
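
The minimum-age rule for cache deletion can be sketched as below. This is only an illustration, not the HTCACHE code: the class name, method name and the millisecond timestamps are assumptions; only the 5-minute threshold comes from the note above.

```java
// Hypothetical sketch: a cache entry may be deleted only after a minimum age.
class CacheCleanupSketch {
    // 5 minutes, as stated in the commit note; the constant name is assumed.
    static final long MIN_AGE_MS = 5 * 60 * 1000L;

    // Returns true if an entry stored at storedAtMs is old enough at nowMs.
    static boolean mayDelete(long storedAtMs, long nowMs) {
        return nowMs - storedAtMs >= MIN_AGE_MS;
    }
}
```

A cleanup loop would call mayDelete(entryTime, System.currentTimeMillis()) and skip every entry for which it returns false.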
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3018 6c8d7289-2bf4-0310-a012-ef5d649a1542
Such constraints can express specific restrictions on web searches.
This is implemented by scraping constraint information from a web
page during parsing, and storing flags for the page within the web index.
In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class
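
How a flag bitfield can support constraint filtering, as a minimal sketch. This is not the kelondroBitfield implementation: the class name, method names and the flag position used in the usage note are assumptions made for illustration.

```java
// Hypothetical sketch of a byte-backed bitfield with a constraint check.
class BitfieldSketch {
    private final byte[] bits;

    BitfieldSketch(int byteLength) {
        bits = new byte[byteLength];
    }

    // Sets or clears the flag at the given bit position.
    void set(int pos, boolean value) {
        int slot = pos >> 3;            // byte that holds the bit
        int mask = 1 << (pos & 7);      // bit inside that byte
        if (value) bits[slot] |= mask; else bits[slot] &= ~mask;
    }

    // Reads the flag at the given bit position.
    boolean get(int pos) {
        return (bits[pos >> 3] & (1 << (pos & 7))) != 0;
    }

    // True if every flag set in the constraint is also set here.
    // Assumes both bitfields have the same byte length.
    boolean matchesAll(BitfieldSketch constraint) {
        for (int i = 0; i < bits.length; i++) {
            if ((bits[i] & constraint.bits[i]) != constraint.bits[i]) return false;
        }
        return true;
    }
}
```

A search filter could build a constraint with a single flag set (say, an assumed position 0 for "index of" pages) and drop every result whose stored flags do not satisfy matchesAll(constraint).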
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
*)Updated language files to the new standard, especially German
*)Wrote language highlighting definition for Notepad++
*)Corrected News.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2685 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added switch to show or hide surftipps
- more news contribute to surftipps
- added voting system for surftipps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2638 6c8d7289-2bf4-0310-a012-ef5d649a1542
- serverFileUtils.java:
-- adding methods to copy from streams to writers and from readers to writers
-- moving httpc writeX methods into serverFileUtils class
- serverCharBuffer.java: removing inheritance from Writer class
- replacing htmlFilterOutputStream by htmlFilterWriter class which handles
content as char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file
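
A reader-to-writer copy of the kind described above could look roughly like this. It is a sketch built only on java.io; the class name, method name and buffer size are assumptions and do not reproduce the serverFileUtils signatures.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical sketch: copy a character stream without converting to bytes.
class CharCopySketch {
    // Copies all chars from source to dest and returns the number copied.
    static long copy(Reader source, Writer dest) throws IOException {
        char[] buffer = new char[4096];   // assumed buffer size
        long total = 0;
        int n;
        while ((n = source.read(buffer)) != -1) {
            dest.write(buffer, 0, n);
            total += n;
        }
        dest.flush();
        return total;
    }
}
```

Working on chars keeps multi-byte characters intact, which a byte-oriented filter cannot guarantee when a character is split across buffer boundaries.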
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads
- moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
- the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher);
to crawl a page and wait for the result, use the function plasmaCrawlLoader.loadSync([...])
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
for indexing, the plasmaWordIndex.
The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This should be done for testing purposes.
The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for the crawler in the future.
Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become unusable.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
A new port forwarding method for UPnP was added.
If this method is enabled, yacy automatically detects a UPnP-capable
internet gateway and configures the gateway's port forwarding
settings accordingly.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
It is a layer beneath the servlets: #[page]# will be replaced by the servlet code, and the rest can be set by you.
(TODO: if we use this for layout, we need to read "TITLE" from the servlet's tp, to set it outside of the servlet.)
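
The placeholder substitution can be pictured as follows. This is a sketch under assumptions: only the #[page]# marker is taken from the note above, while the class name, method name and the plain string replacement are illustrative and not the actual template engine.

```java
// Hypothetical sketch: wrap servlet output in a surrounding layout.
class LayoutTemplateSketch {
    static final String PAGE_PLACEHOLDER = "#[page]#";

    // Replaces the placeholder in the layout with the servlet's output.
    static String wrap(String layout, String servletOutput) {
        return layout.replace(PAGE_PLACEHOLDER, servletOutput);
    }
}
```

For example, wrap("<div id=\"main\">#[page]#</div>", "<p>hello</p>") puts the servlet markup inside the div and leaves the rest of the layout untouched.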
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2302 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Removed unused init value
- Set default upload value to "none", which avoids a warning on new installations that the upload method '' is unknown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2295 6c8d7289-2bf4-0310-a012-ef5d649a1542
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
- upper bound can be configured via indexDistribution.dhtReceiptLimit
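
Reading the two properties could be sketched like this. The property names are taken from the lines above; the class name, the default values, and the idea that the check compares some received count against the limit are assumptions for illustration only.

```java
import java.util.Properties;

// Hypothetical sketch: a configurable upper-bound check that can be switched off.
class DhtReceiptLimitSketch {
    // Returns true if a transfer of receivedCount entries would be accepted.
    static boolean accept(int receivedCount, Properties config) {
        // Assumed default: the check is enabled.
        boolean enabled = Boolean.parseBoolean(
                config.getProperty("indexDistribution.dhtReceiptLimitEnabled", "true"));
        if (!enabled) return true;
        // Assumed default limit of 1000; the real default is not stated here.
        int limit = Integer.parseInt(
                config.getProperty("indexDistribution.dhtReceiptLimit", "1000"));
        return receivedCount <= limit;
    }
}
```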
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added counter for cache delete to distinguish between flush and delete
- changed some default parameters for cache size settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2143 6c8d7289-2bf4-0310-a012-ef5d649a1542
instead of creating a new one.
Notes:
This import is done automatically on startup if the following properties
are set in the config file:
pkcs12ImportFile =
pkcs12ImportPwd =
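
A startup import driven by these two properties might look like the sketch below. It uses the standard java.security.KeyStore API; the class name, method name and the rule "skip the import when the file property is empty" are assumptions, not the actual startup code.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.util.Properties;

// Hypothetical sketch: load a PKCS12 keystore if the config names one.
class Pkcs12ImportSketch {
    // Returns the loaded keystore, or null if no import file is configured.
    static KeyStore importIfConfigured(Properties config) throws Exception {
        String file = config.getProperty("pkcs12ImportFile", "").trim();
        if (file.isEmpty()) return null;   // nothing to import
        char[] password = config.getProperty("pkcs12ImportPwd", "").toCharArray();
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream(file)) {
            keyStore.load(in, password);
        }
        return keyStore;
    }
}
```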
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2139 6c8d7289-2bf4-0310-a012-ef5d649a1542
There was a misunderstanding of the meaning of these values:
this is not the time that the process may take; instead it is the time
that the process pauses after each loop.
Increased the busysleep pause from 2 seconds to 10 seconds.
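
The corrected meaning of the value, as a minimal sketch: the pause is applied after each loop pass and does not limit how long the pass itself may run. The class name, method name and the fixed iteration count are assumptions for illustration.

```java
// Hypothetical sketch: a worker loop that pauses after every pass.
class BusySleepSketch {
    // Runs the job the given number of times, sleeping busySleepMs after each pass.
    static int runLoop(Runnable job, int iterations, long busySleepMs)
            throws InterruptedException {
        int done = 0;
        for (int i = 0; i < iterations; i++) {
            job.run();                  // the job may take as long as it needs
            done++;
            Thread.sleep(busySleepMs);  // pause after the pass, e.g. 10000 ms
        }
        return done;
    }
}
```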
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2094 6c8d7289-2bf4-0310-a012-ef5d649a1542