domainlist is saved locally, if none of the given urls in network.unit.domainlist
could be retrieved, the file from the last boot is used instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7289 6c8d7289-2bf4-0310-a012-ef5d649a1542
* crawler and search allow only urls matching one in domainlist (if list is provided)
* this may be useful to prevent dedicated networks from being "polluted"
* FilterEngine is improved Backlist-object, Blacklist may inherit from FilterEngine in the future
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7285 6c8d7289-2bf4-0310-a012-ef5d649a1542
- yacybot user agent now includes the yacy network name (not the peer name!)
- refactoring and clean-up (mostly turned tab into spaces)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7266 6c8d7289-2bf4-0310-a012-ef5d649a1542
- faster generation of index abstract compression during remote search
- less synchronization in IO record reading
- request index abstract generation only if necessary and faster time-out in remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7239 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed read cache from Records data structure because the read cache had no cache hit during search operation
- copied old read-cache class to CachedRecords and the old, now new Records class does not have the cache any more and a code review checked that data structures and synchronization is clean
- removed unnecessary synchronization from Table class during get()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7237 6c8d7289-2bf4-0310-a012-ef5d649a1542
this makes YaCy search results VERY fast for all verify=false search cases
and it enhances the search speed also for all other snippet-fetch cases.
With this change my peer performed 100 Queries Per Second (!!!) while doing 10 queries simultanously (!!!)
in an intranet index of 20000 URLs on my 16-core Mac
Check this yourself by doing:
cd bin
./searchtestmulti.sh
after finishing the run, divide 1000 by the given time per query (which is the qps for one thread)
and then multiply again by 10 (because 10 search threads has been started)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7231 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better crawl star for files paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blockings
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixes sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
- migrated the 'yacy' user agent to 'yacybot' in many client methods since the 'yacy' user agent is only used for the proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7199 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better control of number of running search threads
- no time-out waiting time when no ranking feeding takes place
- local search queries by a remote peer may be faster up to 300 milliseconds
- a local search may even be faster
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7176 6c8d7289-2bf4-0310-a012-ef5d649a1542
This class requests a YaCy peer remotely and produces search result objects.
The class was implemented in such a way that it is as short as possible. To get a
better integration of search results, use the cora package.
This class is fully stand-alone, it does not need any other external library other than already contained in JRE.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7173 6c8d7289-2bf4-0310-a012-ef5d649a1542