yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luc	7aa1a29e33	Return more accurate HTTP status 400 with detail message when some error occurs on ViewImage : - missing required parameters - url licence invalid	9 years ago
luc	0076f9f97d	Updated documented sample url	9 years ago
reger	c91e712178	further refactor using standard java / (one) utf-8 charset variable extending initiative of commit `9a25751850`	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
reger	e9539b1086	reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart - add filename to parameter fieldname - add filecontent to special parameter fieldname$file (some servlets use this $file parameter) fix for http://mantis.tokeek.de/view.php?id=542	9 years ago
reger	9da1712a31	increase http header EXPIRES for css and images in DefaultServlet to increase browser cache hits for not changing content	9 years ago
reger	d5fd031449	fix reading of ippattern config array in URLProxy	9 years ago
reger	b7e8358645	make use of header.getContentType where possible (mime is normalized afterwards) otherwise use header.mime() differentiated in prev. commit.	9 years ago
luc	2a67d2ba6f	Corrected error management for unsupported image formats, parsing errors, and unavailable resources : avoid logging too many exceptions, as these errors easily occur when searching images.	9 years ago
luc	1565559df8	Refactoring : extracted write InputStream method.	9 years ago
luc	07437986e7	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	97cc03ef6a	start using a template for urlproxy header. It is included as iframe /proxmsg/urlproxyheader.html to allow full servlet functionality and flexibility to display some index/meta data in future.	9 years ago
luc	4e673ffc9a	Ensure closing of InputStream even when an exception occurs.	9 years ago
luc	745e97a575	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
reger	a60b1fb6c2	differentiate api call getLocalPort() from getConfigInt()	9 years ago
luc	aa70ff4ff6	Corrected images alpha channel rendering	9 years ago
reger	367fe388b9	fix exception throw after sendError in DefaultServlet - reduce debug exception logs in crawler	9 years ago
reger	206883f80d	fix: Preserve protocol in url proxy to connect to http/https. Display warning if https target is viewed over http	9 years ago
reger	dbe2594c38	replace deprecated myPublicLocalIP() in AbstractRemoteHandler	9 years ago
sixcooler	e427efbe54	Next try at a fix for upload-connection staying in blocked state. This was caused by reading via GZIP from close-wait connection and caused high cpu- and system-loads. Instead of implementing handling of the RedListener, I have now found a time-limited 'get' that "really" solves this problem.	10 years ago
sixcooler	ef6a64b2a4	Fix for upload-connection staying in blocked state. This was caused by reading via GZIP from close-wait connection and caused high cpu- and system-loads. Solved by implementing handling of the RedListener.	10 years ago
reger	572cfe8fd4	improve character encoding for urlproxy servlet for non-utf-8 pages	10 years ago
reger	6bc8a9b11e	make Quality of Service Servlet available to prioritize requests from local host This assigns priorities to incoming requests. Higher priority numbers are served before lower. (disabled by default in defaults/web.xml, uncomment or copy entry to DATA/Settings/web.xml)	10 years ago
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user doing a search are detected. The time zone of the search request is detected automatically using the browser's time zone offset, which is delivered to the search request automatically and invisibly to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required a change of the parser java api. A lot of other changes have been made which correct the wrong handling of dates in YaCy, which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
Michael Peter Christen	f5a032f293	split query into filter query and text query to get better ranking results and faster results	10 years ago
reger	5f4cd8d6f5	replace deprecated getIP with getIPs in AbstractRemoteHandler	10 years ago
Michael Peter Christen	f9ba50379d	added an expansion option to search facets on result page: - if less or equal of 8 facet options are present, they are shown by default - if more facet options are present, they are hidden To view or hide all facets, just click on the facet header bar	10 years ago
reger	de56d934b2	apply query parameter getQueryFields() to GSA servlet	10 years ago
reger	9b0de2de64	introduce getQueryFields to return default query fields (query parameter QF) calculated from boostfields config, making sure title, description, keywords and content are always searched. - apply change to solrServlet makes sure every remote query uses at least all locally defined boost fields for search - apply to local solr search - simplify select query by using QF defaults	10 years ago
reger	23924348e2	url with semicolon or comma handling in proxy request apply patch supplied with bugreport http://mantis.tokeek.de/view.php?id=540	10 years ago
reger	9025fe3518	upd error message for proxy fix http://mantis.tokeek.de/view.php?id=539	10 years ago
Michael Peter Christen	b5ac29c9a5	added a html field scraper which reads text from html entities of a given css class and extends a given vocabulary with a term consisting with the text content of the html class tag. Additionally, the term is included into the semantic facet of the document. This allows the creation of faceted search to documents without the pre-creation of vocabularies; instead, the vocabulary is created on-the-fly, possibly for use in other crawls. If any of the term scraping for a specific vocabulary is successful on a document, this vocabulary is excluded for auto-annotation on the page. To use this feature, do the following: - create a vocabulary on /Vocabulary_p.html (if not existent) - in /CrawlStartExpert.html you will now see the vocabularies as column in a table. The second column provides text fields where you can name the class of html entities where the literal of the corresponding vocabulary shall be scraped out - when doing a search, you will see the content of the scraped fields in a navigation facet for the given vocabulary	10 years ago
Michael Peter Christen	bee5ee7cce	removed some warnings	10 years ago
Michael Peter Christen	4c9d2a7c64	reverted 'do not show all options' strategy. This is actually confusing new users. Will be activated maybe again if there is an optional tutorial mode which can be switched on for this special purpose of running a tutorial.	10 years ago
reger	4eb89d7f15	revert clickservlet (default was indeed a mistake)	10 years ago
Michael Peter Christen	c9e2128260	please commit new files under your own name, this file was not created by me.	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option. This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks). The option check-boxes are placed in ConfigPortal.html	10 years ago
reger	6a04563578	Init Jetty using setDefaultDescriptor (web.xml) to defaults/web.xml so web.xml in defaults dir is applied first and optional DATA/SETTINGS/web.xml loaded on top. By using this Jetty feature (default web.xml) we assure that changes to the default are applied to existing installations and individual addition/changes are still respected.	10 years ago
reger	1f9389396a	fix NPE related 500 (Bad Request) response of UrlProxy on blacklisted urls, by adding parameter HTTPDeamon and removing unused hostAddress lookup code in sendRespondError	10 years ago
reger	f856edecb6	fix proxy redirect (http status 302) response fixes http://mantis.tokeek.de/view.php?id=517 The url given in bug report uses a gzip input stream which causes the HTTPClient.writeto() to throw an IOException due to incomplete input stream. This in turn prevents the 302 response to the client browser. Limiting the serving of target content to httpstatus=200 will proxy the header response, so that client browsers' redirect settings can be honored.	10 years ago
Michael Peter Christen	28683530cd	fixes to usage of no-cache: use and recognize also the no-store directive	10 years ago
Michael Peter Christen	c9c700b510	reduction of http requests to YaCy using the correct cache-control, expires and last-modified headers in http response.	10 years ago
Michael Peter Christen	1cfddea578	added (very experimental) Solr response writer for snapshot image results	10 years ago
Michael Peter Christen	3354cd63be	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	63846ddb89	add final SolrQueryRequest.close to SolrServlet	10 years ago
Michael Peter Christen	578ae29f1e	added a note that the servlet is linked using web.xml	10 years ago
reger	6c3f36def1	- fix path to default heuristic.cfg - deprecate unused ProxyServlet	10 years ago
reger	ff18129def	ViewFile servlet: update index if newer, so viewed text and metadata (stored) info is similar - to achieve it, use request with profile to allow indexing (defaultglobaltext) and update index (the resource is loaded, parsed anyway, so it's not an expensive operation) Request: remove 2 unused init parameters - number of anchors of the parent - forkfactor sum of anchors of all ancestors	10 years ago
Michael Peter Christen	226aea5914	added a servlet which can create preview images, preview thumbnails and preview pdfs from web pages, i.e.: http://localhost:8090/api/snapshot.png?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.jpg?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.pdf?url=http://yacy.net/en/ This also supports an on-the-fly generation of the preview documents if the user is an administrator. Otherwise, the servlet fails. To enable this, you must add wkhtmltopdf, imagemagick and (on headless servers) xvfb to your operating system. For detailed instructions, see `97f6089a41`	10 years ago
Michael Peter Christen	97f6089a41	YaCy can now create web page snapshots as pdf documents which can later be transcoded into jpg for image previews. To create such pdfs you must do: Add wkhtmltopdf and imagemagick to your OS, which you can do: On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from http://wkhtmltopdf.org/downloads.html and download http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip In Debian do "apt-get install wkhtmltopdf imagemagick" Then check in /Settings_p.html?page=ProxyAccess: "Transparent Proxy" and "Always Fresh" - this is used by wkhtmltopdf to fetch web pages using the YaCy proxy. Using "Always Fresh" it is possible to get all pages from the proxy cache. Finally, you will see a new option when starting an expert web crawl. You can set a maximum depth for crawling which should cause a pdf generation. The resulting pdfs are then available in DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf	10 years ago
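
The snapshot servlet URLs quoted in commit 226aea5914 follow a simple pattern: `/api/snapshot.<format>` with a `url` parameter and, for images, optional `width` and `height`. The sketch below shows one way a client script might assemble such URLs. The helper name `snapshot_url` and its arguments are hypothetical, and percent-encoding the target URL is an assumption on my part (the commit message shows the target unencoded); the servlet's actual parameter handling is not documented in this log.

```python
from urllib.parse import urlencode

# Formats named in the commit message: png, jpg and pdf.
SNAPSHOT_FORMATS = ("png", "jpg", "pdf")


def snapshot_url(target, fmt="png", width=None, height=None,
                 base="http://localhost:8090"):
    """Build a snapshot request URL (hypothetical helper, not YaCy code).

    `base` defaults to the host and port used in the commit's examples.
    """
    if fmt not in SNAPSHOT_FORMATS:
        raise ValueError(f"unsupported snapshot format: {fmt}")
    params = {"url": target}
    # width/height appear only in the image examples of the commit message.
    if width is not None:
        params["width"] = width
    if height is not None:
        params["height"] = height
    # Assumption: the target URL is percent-encoded as a query value.
    return f"{base}/api/snapshot.{fmt}?{urlencode(params)}"


if __name__ == "__main__":
    print(snapshot_url("http://yacy.net/en/", "png", 128, 128))
    print(snapshot_url("http://yacy.net/en/", "pdf"))
```

According to the same commit, on-the-fly generation only works for administrators and requires wkhtmltopdf and imagemagick on the host, so a request built this way may fail on a peer without those tools.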


248 Commits (b4adbcbd35ce24aaa964393bdd5f3ba8a2b3cf25)