Commit Graph

189 Commits (60dc1241a3e69561f28689c6cdfc0ba8a76c1939)

Author SHA1 Message Date
orbiter 43e1660512 fix/enhancement in Crawler: do not generate domain match pattern if crawl depth is 0
14 years ago
low012 2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 ae10ed5613 *) added a Set to which filter elements are written before mustmatch-filter is created to avoid huge lists of double elements in mustmatch-filter when starting a crawl from a "Link-List of URL" on CrawlStartSite_p.html
14 years ago
orbiter c93f4dda72 - cleaned up yacy news
14 years ago
orbiter 0769f4caa6 added search suggestions for interactive search: is only shown if there are no search results
14 years ago
orbiter 58b59f9bc8 - a collection of bug fixes and some redesign of the Scanner class
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
14 years ago
low012 e7552bd719 *) cleaning up the code a little bit
14 years ago
low012 38fdf43587 *) renamed classes according to standard Java coding conventions
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter fcd40cd30f - disabled domZones (buggy, must think about better solution)
14 years ago
orbiter c3bf17a3a1 fixed must-match filter for smb crawling
14 years ago
orbiter 574346f8ce better must-match pattern for intranet file-crawls
14 years ago
sixcooler aa6075402a smal fix for crawling from 'sitelist' at changes from 7214
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
orbiter 114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 933dc1a600 removed old rss parser (will be replaced with parser from cora package)
14 years ago
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
14 years ago
orbiter dcd01698b4 added a 'transition feature' that shall lower the barrier to move from g**gle to yacy (yes!):
15 years ago
orbiter 56ff9d5fd4 - extended news size from 512 to 1024 characters
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 1defd580bc - added option to localization search to distinguish between a search for a location according to the search word only or for the relation between a web search results and locations found in the metadata fields
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 90c3e5d6f6 - cleanup, removed unused imports
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 308a973503 refactoring of tables data organisation
15 years ago
orbiter ada0ce9de3 refactoring of bookmarks: there is a big performance problem in the bookmarks code and furthermore the bookmarks
15 years ago
orbiter 24060885b6 - added Tables abstraction in data.Tables.java
15 years ago
orbiter 8ce936bcdd added an api recording function: it shall be possible to record
15 years ago
orbiter 82f57f79e5 more PMD enhancements
15 years ago
orbiter d126d6c1b5 renamed the servlet WatchCrawler_p to Crawler_p
15 years ago