yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy, a high-precision detection of date and time of day is necessary. That requires detecting both the time zone of the document content and the time zone of the user doing a search. The time zone of the search request is determined automatically from the browser's time zone offset, which is delivered with the search request automatically and invisibly to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone as an offset in minutes. All parsers must get a time zone offset passed, so this required a change of the parser java api. Many other changes were made to correct the wrong handling of dates in YaCy, which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are in the UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
reger	ff18129def	ViewFile servlet: update index if newer, so viewed text and metadata (stored) info is similar. To achieve this, use request with profile to allow indexing (defaultglobaltext) and update index (the resource is loaded and parsed anyway, so it's not an expensive operation). Request: remove 2 unused init parameters (number of anchors of the parent; forkfactor sum of anchors of all ancestors)	10 years ago
Michael Peter Christen	97f6089a41	YaCy can now create web page snapshots as pdf documents which can later be transcoded into jpg for image previews. To create such pdfs, first add wkhtmltopdf and imagemagick to your OS: on a Mac, download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from http://wkhtmltopdf.org/downloads.html and download http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip; in Debian, run "apt-get install wkhtmltopdf imagemagick". Then check "Transparent Proxy" and "Always Fresh" in /Settings_p.html?page=ProxyAccess; this is used by wkhtmltopdf to fetch web pages using the YaCy proxy. Using "Always Fresh" it is possible to get all pages from the proxy cache. Finally, you will see a new option when starting an expert web crawl: you can set a maximum crawl depth up to which pdf generation is triggered. The resulting pdfs are then available in DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf	10 years ago
Michael Peter Christen	ba6ffddefc	refactoring	11 years ago
Michael Peter Christen	b08375da33	fix for bad/missing values of size_i	11 years ago
Michael Peter Christen	7005ecdabd	cleanup	11 years ago
reger	4c38bceafc	handle http connect for proxy; refactor header cleanup (reuse existing code)	11 years ago
reger	bc6ebb3c06	adjust to DigestURI changes from master to DigestURL	11 years ago
reger	f7f86d8a5d	update to Jetty 9 jars - include javax.servlet 3.0	11 years ago
reger	105cf8f593	changes to adjust jetty to recent code changes	11 years ago
Michael Peter Christen	65f56b1fd4	Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty. Conflicts: .classpath, build.xml, htroot/Status.java, source/de/anomic/http/server/HTTPDProxyHandler.java, source/net/yacy/yacy.java	13 years ago
Florian Richter	965aac5ebb	* proxy works almost	14 years ago
Florian Richter	13724ddd43	* caching in proxy	14 years ago

13 Commits (572cfe8fd4ba9d51a1a4df29fc8468a9f705107b)
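
The time zone handling described in commit fed26f33a8 (an offset in minutes supplied per crawl start, with all stored dates normalized to UTC) can be illustrated with a minimal sketch. This is not YaCy's code: the function name `to_utc` is hypothetical, and the sign convention (offset counted in minutes east of UTC) is an assumption, since the commit message does not state which direction the offset runs.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive: datetime, offset_minutes: int) -> datetime:
    """Interpret a naive local timestamp with a fixed offset and return it in UTC.

    Assumption: offset_minutes is counted east of UTC (local = UTC + offset).
    The commit message does not specify YaCy's actual sign convention.
    """
    tz = timezone(timedelta(minutes=offset_minutes))
    # Attach the fixed-offset zone to the parsed value, then convert to UTC.
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


# A date parsed from page content, crawled with an assumed offset of +60 minutes.
parsed = datetime(2015, 3, 1, 12, 30)
print(to_utc(parsed, 60).isoformat())  # 2015-03-01T11:30:00+00:00
```

Storing only the normalized value means no server-local correction is needed later, which matches the commit's stated goal of one time zone for all peers.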