yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	70dd26ec95	added the new crawl scheduling function to the crawl start menu: - the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because that was just a criterion for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting and people will have the choice between no re-crawl, re-crawl as was possible so far and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected - removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and did therefore not support special crawling starts i.e. for forums and wikis - since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters - removed the busy thread that was used to trigger the bookmark-based scheduler - removed the crontab for the bookmark-based scheduler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
mikeworks	0f3a3e34e1	Updated German translation de.lng and fixed typos in html files (english) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6915 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
mikeworks	d915deaa2b	Fixed typo in CrawlStart_p.html Changes to German language file: - Updated Crawl Start Page - Added section for indexing MediaWikis - Fixed some more start and end tags so that syntax highlighting works correctly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6617 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	d126d6c1b5	renamed the servlet WatchCrawler_p to Crawler_p this was done because that servlet may be used for wget/cronjob triggered crawl starts and it appears to be confusing that the name of the crawl start servlet looks like a pure monitoring tool. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6568 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	c0e17de2fb	- fixes for some problems with the new crawling/caching strategies - speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer - fixed some deadlock- and 100% CPU problems in the balancer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c6c97f23ad	- added cache usage properties to crawl start - added special rule to balancer to omit forced delays if cache is used exclusively - extended the htCache size by default to 32GB git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	94f3d90af2	added a hint about regular expressions in crawl start git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6021 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	9ab009b16b	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1890#p13476 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5755 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
auron_x	03a16f6c20	- more XHTML-validation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5580 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	7e011de34e	hint for recrawls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5537 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	10f5ec1040	reverted last commit (more testing needed) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dba7ef5144	extended crawling constraints: - removed never-used secondary crawl depth - added a must-not-match filter that can be used to exclude urls from a crawl - added stub for crawl tags which will be used to identify search results that had been produced from specific crawls please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'. Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty string (match-never) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	4745e89451	auto-choose crawl type git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5331 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	47f0c3b002	replaced the cacheAdmin with the ViewFile servlet, because the cacheAdmin was an interface to the old HTCACHE data structure which does not exist any more. Changed links to point to the ViewFile servlets. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5289 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7b35d54c6c	fixed some problems with network switching (was not completely 'clean') git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5200 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
daburna	992635c074	translation update git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5107 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	8d1bedfc3a	- added bookmarkTitle to CrawlStart_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5068 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	e1574fe02e	- added autoReCrawl folders to bookmarks (DATA/SETTINGS/autoReCrawl.conf) - the serverBusyThread checks folders every 60 min. (==> autoReCrawl_idlesleep in yacy.conf) - added option to create bookmarks from CrawlStart URL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5033 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	8e179f6588	removed option to do a re-crawl with a period of minutes. Such a short time does not make sense and it may cause endless indexing loops. The removing of the option will ensure that a misuse is prevented. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4964 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	40d7f485f3	- fixed several NPE bugs - fixed losing of own seed hash (hopefully) - fixed a bug with crawl starts beginning with (bookmark) files - added better IP recognition during hello process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4882 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2f381b8d7a	- fixed at least two causes for a NPE after a use case switch. A large refactoring was necessary - added another crawl start option: automatic restriction to sub-path - removed crawlStartSimple and renamed crawl start expert to crawl start (without expert) - some changes to texts in crawl start - added some more deletions when a web index is deleted: delete also queues and robots cache git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4881 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago

21 Commits (6b06e94c8c83a8081eab4cfdfb37d64ec6aaa87b)