yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	d90b001e1b	Improved previous merge "Show ranking in HTML UI". - added the new setting as configurable in the "Debug/Analysis" settings page. Debug/analysis is its main purpose for now as there is currently no nice and "understansable" ranking score info servlet (see forum discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) - render in the "Search Page Layout" page preview when enabled - added constants	8 years ago
luccioman	efe1232d90	Merge branch 'html-show-ranking' of https://github.com/JeremyRand/yacy_search_server Conflicts: defaults/yacy.init	8 years ago
luccioman	09e72eb0a4	Set Config Portal as a private administration page. Consistently with its required action from submission credentials, and because external unauthenticated users do not need to access these settings.	8 years ago
reger	3dd23c178b	Introduce the option to configure a shutdown port. A port value of -1 will disable this option. If set to a value greater 0, YaCy listens on this of on the local loopback address (127.0.0.1) for a shutdown or restart signal. E.g. connect to http://localhost:8005/shutdown will stop the YaCy server. http://localhost:8005/restart will restart it. This option allows to stop YaCy locally independant from the web web frontend (which might be configured for password protected remote access).	8 years ago
luccioman	cdcd923375	Privacy enhancement : added settings to control referrer policy. HTTP "Referer" header sent by the browser when using YaCy can now be controlled either with the referrer meta tag as a global policy, or only for search result links by adding the attribute rel="noreferrer". To improve privacy with the less possible regressions, the default is set as meta tag with value "origin-when-cross-origin" : internal YaCy links behavior is not affected, but when visiting external websites referrer url is not empty but stripped from query parameters and path. Older browsers, Safari, MS IE and Edge do not support the referrer meta tag, so the standard but less flexible noreferrer link type can also be enabled as an alternative. User-friendly settings page to be implemented.	8 years ago
luccioman	1857651988	Added a new Debug/Analysis advanced settings subsection. As discussed in PR #93 with @JeremyRand and @reger24 this new advanced settings page includes: - a new setting to control remote Solr responses encoding - some existing debug settings which could not be set through the admin user interface	8 years ago
luccioman	826e5bbadd	Documented /HostBrowser.html related configuration settings	8 years ago
luccioman	aa9ddf3c23	Added control over Robots.txt active threads maximum number. When starting a crawl from a file containing thousands of links, configuration setting "crawler.MaxActiveThreads" is effective to prevent saturating the system with too many outgoing HTTP connections threads launched by the crawler. But robots.txt was not affected by this setting and was indefinitely increasing the number of concurrently loading threads until most ot the connections timed out. To improve performance control, added a pool of threads for Robots.txt, consistently used in its ensureExist() and massCrawlCheck() methods. The Robots.txt threads pool max size can now be configured in the /PerformanceQueus_p.html page, or with the new "robots.txt.MaxActiveThreads" setting, initialized with the same default value as the crawler.	8 years ago
reger	08a0acc35d	make a YearNavigator availabel, useable as SearchEvent.naviator plugin. It can take any Date field of the index and displays a list of year strings in reverse order by the year (not the score/count). To allow to define the index field to use, the fieldname (and title can be appended to the navi's name "year" e.g. year:load_date_dt:LoadDate It works also with dates_in_content_dts field (from the graphical date navigator). Here the query parameter from: to: are used on selection as Query modifier (for other dates currently no query parameter available, so selection won't work to filter search results). Not included in the UI Searchpage layout config so far (for experiment with it manual change to conf needed).	8 years ago
reger	bad8f87998	remove old/obsolete clear text "adminAccount" credential entry from init and setConfig (.,empty) from servlets/code	8 years ago
luccioman	7296e3884f	Switched even more URLs to pure relative ones. Thus a YaCy peer can run behind a reverse proxy subfolder without need for the reverse proxy to rewrite HTML links (a CPU costly operation). Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	84b81c1af0	Switched more URLs to relative ones when possible. This permits an easier and more flexible reverse proxy configuration. Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
reger	af39a76bf6	Reduce number of default max. search navigator lines (from 10000) to 100 + make it configurable	8 years ago
JeremyRand	4963ecb0a0	Add preference (disabled by default) to show the ranking for each result on the HTML UI.	8 years ago
luccioman	b3b75b0498	Accessibility : add a customizable alternative text to YaCy log Applied W3C recommendations : https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	8 years ago
Marc Nause	1f7013a1e3	removed unused properties in default config (CGI capabilities of YaCy's HTTPd have been removed many moons ago)	8 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	8410536f75	keep svnRevision in .init for convert of .conf until release >1.83	9 years ago
reger	726ebee65a	include Version config string in yacy.init (replacing svnRevision)	9 years ago
reger	16724c1283	remove unused proxyCookieWhiteList from yacy.init	9 years ago
Michael Peter Christen	5d635879f8	Merge pull request #40 from Scarfmonster/autocrawl Automatic crawling	9 years ago
Ryszard Goń	7d6e0d8470	Add missing settings to autocrawl settings page	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
reger	4765e374e6	altered clac. of search result items per page to display taking the existing limits into account but make it consistent with search option screen for admin and public user changes: - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit) - otherwise requests are limited to 100 results per page ( = search option, index.html) (this basically is the major change, inc. limit from 20 to 100 for public user) P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being) see http://mantis.tokeek.de/view.php?id=627	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
reger	a5faf73afa	remove obsolete yacy.init entries interaction.* (related to removed triplestore)	9 years ago
luc	8c4ab9c76b	Added an option to eventually limit size of remote solr documents put to local index. See mantis #626.	9 years ago
reger	f0b5bc93a3	remove obsolete yacy.init entry "secureHttps" not used anywhere	9 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-definied bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every categroy. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	9 years ago
Michael Peter Christen	e1cd9c0dba	added another default network / commented out	9 years ago
Michael Peter Christen	9c12555be5	added link to Snapshots in search results if the snapshot exists and option is set in ConfigSearchPage_p (this is a stub: we also need a visualization of pdf files!)	10 years ago
Michael Peter Christen	535f1ebe3b	added a new way of content browsing in search results: - date navigation The date is taken from the CONTENT of the documents / web pages, NOT from a date submitted in the context of metadata (i.e. http header or html head form). This makes it possible to search for documents in the future, i.e. when documents contain event descriptions for future events. The date is written to an index field which is now enabled by default. All documents are scanned for contained date mentions. To visualize the dates for a specific search results, a histogram showing the number of documents for each day is displayed. To render these histograms the morris.js library is used. Morris.js requires also raphael.js which is now also integrated in YaCy. The histogram is now also displayed in the index browser by default. To select a specific range from a search result, the following modifiers had been introduced: from:<date> to:<date> These modifiers can be used separately (i.e. only 'from' or only 'to') to describe an open interval or combined to have a closed interval. Both dates are inclusive. To select a specific single date only, use the 'to:' - modifier. The histogram shows blue and green lines; the green lines denot weekend days (saturday and sunday). Clicking on bars in the histogram has the following reaction: 1st click: add a from:<date> modifier for the date of the bar 2nd click: add a to:<date> modifier for the date of the bar 3rd click: remove from and date modifier and set a on:<date> for the bar When the on:<date> modifier is used, the histogram shows an unlimited time period. This makes it possible to click again (4th click) which is then interpreted as a 1st click again (sets a from modifier). The display feature is NOT switched on by default; to switch it on use the /ConfigSearchPage_p.html servlet.	10 years ago
reger	ba276d3e64	add description_txt to default query fields, Dublin Core Metadata field extracted by most parsers.	10 years ago
Michael Peter Christen	97ba5ddbb7	configuration option for maxload limit for remote search	10 years ago
Michael Peter Christen	ac19690d30	refactoring with CommonPattern.COMMA	10 years ago
Michael Peter Christen	cf9b22ca5c	do not reindex based on vocabulary fields (there are meanwhile many of them) and some default settings	10 years ago
reger	4eb89d7f15	revert clickservlet (default was indeed a mistakenly)	10 years ago
Michael Peter Christen	61ae9d2d11	do not use the clickservlet by default. From my personal view, this technique should not be used at all! This project is about privacy, the existence of a click servlet is one example why people should NOT use a search portal if such exists.	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not effected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the users browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks) The option check-boxes are placed in ConfigPortal.html	10 years ago
reger	e177d69387	remove obsolete config footer option (ConfigPortal user.login) no footer or footer-option in use remove unused yacy.init item allowUnlimitedReceiveIndexFrom	10 years ago
Michael Peter Christen	eb78388a98	changed prefer strategy for http unique in such a way that http is preferred over https. While this is a bad idea from the standpoint of security it is more common applicable for environments where http and https mix and for some domains https is not available. Then the double-check is possible even if no postprocessing is performed.	10 years ago
Michael Peter Christen	d14114697c	the miss cache does not seem to work, it sometimes contains urlhashes from documents which actually are inside the index. This can be reproduced using the crawl result table at http://localhost:8090/CrawlResults.html?process=5 The cache is temporary disabled to remove the bad behaviour, however a later reactivation of that feater may be possible.	10 years ago
reger	446f374ba9	fix yacy.init comment http://mantis.tokeek.de/view.php?id=513	10 years ago
Michael Peter Christen	60f27bdf49	added the property timeoutrequests to configuration to disable TimeoutRequests. The purpose is to test if YaCy runs better on VMs where there is a limitation of concurrent processes; see /proc/user_beancounters in row numproc; this value is limited and should be low. Try to set timeoutrequests to keep this low. (works only after restart)	10 years ago
Michael Peter Christen	1d45d9405a	security bugfix	10 years ago
Michael Peter Christen	c0f9f6ac66	added option to change the navbar-default, i.e. usable for dark skins	10 years ago
Michael Peter Christen	84763126e0	added option to make the YaCy proxy act as the cache is never stale. If set to 'Always Fresh' the cache is always used if the entry in the cache exist. This is a good way to archive web content and access it without going online again in case the documents exist. To do so, open /Settings_p.html?page=ProxyAccess and check the "Always Fresh" checkbox. This is set do false which behave as set before. If you set this to true, then you have your web archive in DATA/HTCACHE. Copy this to carry around your private copy of the internet!	10 years ago
Michael Peter Christen	68e8039fd1	added high-precision scheduler for API processes. This allows also to make the execution in dependency of available RAM or CPU load. The default value for CPU load is 4.0 and the check runs once a minute.	10 years ago
Michael Peter Christen	26279b0993	added debug code for statistics about document attributes related to domains	10 years ago

1 2 3 4 5 ...

370 Commits (31ad043bb92e3bdf8e2ca5cfed1424b39c59c631)