be transcoded into jpg for image previews. To create such pdfs, first
install wkhtmltopdf and imagemagick on your OS:
On a Mac, download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from
http://wkhtmltopdf.org/downloads.html and download
http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip
In Debian, run "apt-get install wkhtmltopdf imagemagick"
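To confirm that both tools are reachable after installation, a small check
like the following can help. This is only a sketch: the class and method
names are hypothetical and not part of YaCy, and it assumes the ImageMagick 6
command is named "convert" and that both tools are found through PATH.

```java
import java.io.File;

public class ToolCheck {

    // Returns true if an executable file named `tool` exists in one of the
    // directories listed in `pathValue` (a PATH-style string).
    static boolean isOnPath(String tool, String pathValue) {
        if (pathValue == null) return false;
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) continue;
            File candidate = new File(dir, tool);
            if (candidate.isFile() && candidate.canExecute()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // "convert" is assumed to be the ImageMagick command used for transcoding
        for (String tool : new String[] {"wkhtmltopdf", "convert"}) {
            System.out.println(tool + ": " + (isOnPath(tool, path) ? "found" : "missing"));
        }
    }
}
```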
Then, in /Settings_p.html?page=ProxyAccess, check "Transparent Proxy" and
"Always Fresh". wkhtmltopdf uses the YaCy proxy to fetch web pages, and
with "Always Fresh" enabled it can get all pages from the proxy cache.
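For illustration, fetching a page through the proxy could look roughly like
the sketch below. It is not the invocation YaCy itself uses, which this
message does not show. The class name, the proxy address
http://localhost:8090 (assuming the peer runs on its default port) and the
example URL are assumptions; "--proxy" is a standard wkhtmltopdf option.

```java
import java.util.Arrays;
import java.util.List;

public class SnapshotCommand {

    // Builds a wkhtmltopdf command line that loads `url` through the given
    // proxy and writes the rendered page to `outputPdf`.
    static List<String> build(String proxy, String url, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", "--proxy", proxy, url, outputPdf);
    }

    public static void main(String[] args) {
        // Assumed values: a local YaCy peer as proxy and an example URL
        List<String> cmd = build("http://localhost:8090", "http://example.org/", "example.pdf");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires wkhtmltopdf to be installed):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```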
Finally, you will see a new option when starting an expert web crawl:
you can set the maximum crawl depth up to which a pdf is generated for
each loaded page. The resulting pdfs are then available in
DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf
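To see which snapshots a crawl has produced, the directory tree above can be
walked for pdf files. The following is a minimal sketch, assuming it is run
from the YaCy installation directory; the class and method names are
hypothetical and not part of YaCy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListSnapshots {

    // Recursively collects all regular files ending in ".pdf" below `root`.
    // Returns an empty list if `root` is not an existing directory.
    static List<Path> findPdfs(Path root) {
        if (!Files.isDirectory(root)) return Collections.emptyList();
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".pdf"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Path relative to the YaCy installation directory (assumption)
        Path root = Paths.get("DATA", "HTCACHE", "SNAPSHOTS");
        for (Path pdf : findPdfs(root)) {
            System.out.println(pdf);
        }
    }
}
```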
public static final String LOADPREVIEWMAXDEPTH = "loadpreviewmaxdepth"; // if previews shall be loaded, this is positive and denotes the maximum depth; if not this is -1
public static final String SNAPSHOTS_MAXDEPTH = "snapshotsMaxDepth"; // if previews shall be loaded, this is positive and denotes the maximum depth; if not this is -1
public static final String SNAPSHOTS_REPLACEOLD = "snapshotsReplaceOld"; // if this is set to true, only one version of a snapshot per day is stored, otherwise we store also different versions per day
this.log.info("SNAPSHOT - " + (snapshotFile == null ? "could not generate snapshot for " + entry.url().toNormalform(true) : "wrote " + snapshotFile + " for " + entry.url().toNormalform(true)));