boxtec/yacy_search_server - yacy_search_server - BOXTEC Source Repository

Author	SHA1	Message	Date
Michael Peter Christen	d9603039ff	automatically set the Q flag for smb/ftp start urls (split pdf support)	10 years ago
Michael Peter Christen	9fce8bf2a5	crawling of multi-page pdfs with an artificial post part on smb or ftp shares is not possible with the disabled setting; this is now temporarily disabled until a better solution is at hand.	10 years ago
orbiter	4177c9cf05	fix for crawl start check	11 years ago
orbiter	f8b8c82421	- refactoring of getpageinfo_p.xml (moved out of util) - added more logging in getpageinfo_p.xml git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	ff32469272	added a link to /api/util/getpageinfo_p.xml as API to crawl start info and to ViewFile.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8035 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c36da90261	added a very fast ftp file list generator to site crawler: - when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once - the harvester runs concurrently and feeds into the normal crawl queue also in this: - fixed the 'start from file' crawl function - added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags. - this causes that a crawl start is now also possible from clear text link files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
mikeworks	70576e88d2	de.lng: Added some more untranslated strings I found and uncommented old ones that were removed terminal_p.html: Put back the old ID which was really easy to find IndexCreate.js: Because XHTML 1.0 Strict does not allow name tags for some elements rewrote most element access functions to use getElementById Table_API_p.html and all other html pages: Some XHTML 1.0 Strict fixes, changed checkAll javascript, marked the first row with checkboxes as unsortable where applicable Table_API_p.java and all other java pages: URLencoded lines with possible ampersands & -> &amp; for validation XHTML 1.0 Strict sourcecode --> All Index Create pages should validate now. Hope I did not break anything else (too much :-) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7225 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2c549ae341	fixed a number of small bugs: - better crawl start for file paths and smb paths - added time-out wrapper for dns resolving and reverse resolving to prevent blockings - fixed intranet scanner result list check boxes - prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available) - fixed rss feed loader - fixed sitemap loader which had not been restricted to single files (crawl-depth must be zero) - clearing of crawl result lists when a network switch was done - higher maximum file size for crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f6eebb6f99	replaced auto-dom filter with easy-to-understand Site Link-List crawler option - nobody understood the auto-dom filter without a lengthy introduction about the function of a crawler - nobody ever used the auto-dom filter other than with a crawl depth of 1 - the auto-dom filter was buggy since the filter did not survive a restart and then a search index contained waste - the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain - the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives transparent feedback about what it does before it can be used - the new option also fits into the easy site-crawl start menu git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	d126d6c1b5	renamed the servlet WatchCrawler_p to Crawler_p this was done because that servlet may be used for wget/cronjob triggered crawl starts and it appears to be confusing that the name of the crawl start servlet looks like a pure monitoring tool. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6568 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	c6c97f23ad	- added cache usage properties to crawl start - added special rule to balancer to omit forced delays if cache is used exclusively - extended the htCache size by default to 32GB git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	187ee4d06e	another IE fix (also same names in html and js) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6116 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6663365720	adopted many calls to new api path git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5498 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	8d1bedfc3a	- added bookmarkTitle to CrawlStart_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5068 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	76eac114ed	* define global javascript-variable with var to get rid of warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4624 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
theli	e75ca857c3	*) Bugfix for problem with ajax graphic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3815 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a3ecfe0a45	replaced failed-icon by new 'bad'-icon git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3680 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	6f46245a51	*) Bookmarks: Ajax icon is displayed while loading title *) First version of a sitemap parser added - currently only autodetection of sitemap files is supported *) DB-Import restructured - pause/resume should work again now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	91b78d9f04	missing File for IndexCreate git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1694 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago