Commit Graph

24 Commits (d53c33e4ef0f867e92f7b64ab001250dcfb60076)

Author SHA1 Message Date
luccioman cced94298a Added a new crawler document filter type using Solr syntax
7 years ago
luccioman fb3032c530 Added a crawl filtering possibility on documents Media Type (MIME)
7 years ago
Michael Peter Christen 187075b878 added nav filter
7 years ago
luccioman 7c644090ff Fixed CrawlStartExpert.html HTML validation errors
7 years ago
luccioman 519fc9a600 Issue #156 : new option to clean up (or not) search cache on crawl start
7 years ago
luccioman eb20589e29 Fixed issue #158 : completed div CSS class ignore in crawl
7 years ago
luccioman 79a2ba306a Updated links to Java Regular Expressions documentation to version 8
7 years ago
Michael Peter Christen 25573bd5ab added a crawl filter based on <div> tag class names
7 years ago
luccioman 0f80c978d6 Limit the number of initially previewed links in crawl start pages.
8 years ago
luccioman 62f75417ef Updated Pattern JavaDoc links to current minimum (1.7) JDK version.
8 years ago
luccioman 812abfc868 Converted one more set of URLs to pure relative ones.
8 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen 1309619a71 remove remote indexing option in crawl start if not in p2p mode
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
Michael Peter Christen 6f0167fac1 get cloned crawl start parameter for snapshots
10 years ago
Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
10 years ago
orbiter f642cfbe30 added hint to the regular expression tester
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen 1b279d7a7e fixed external link
11 years ago
reger 89e2c5e884 fix: allow enable of CrawlStartExpert.html #file
11 years ago
Michael Peter Christen a2fba6584f use submitted default userAgent if cloning a crawl
11 years ago
orbiter d29b6db270 made crawl start pages public since they do not reveal individual
11 years ago