yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Lukas Fülling	e8a00007f6	add setting for public facing port	3 years ago
Michael Peter Christen	96592a10cf	added option to set yacy configuration values using environment variables. To use that feature, set an environment variable with prefix "yacy." and suffix identical to the yacy configuration attribute name. Additionally, we implemented a way to set a peer name using the setting "network.unit.agent". This can therefore now be used to set a peer name with the java call parameter -Dyacy.network.unit.agent=anonymous The purpose for this feature is the ability to set peer names in mass-deployed kubernetes clusters to the same name to avoid flooding peer name statistics with auto-deployment-generated names.	4 years ago
luccioman	a5771b1f14	Made SNI extension user configurable without the need for server restart TLS Server Name Indication (SNI) extension activation can now be configured with the new Settings_p.html?page=httpClient administration page. SNI extension is also now enabled by default, as in 2019 the unrecognized_name(112) alert is more properly handled by major web servers TLS implementations, following the RFC 6066 standard. Related YaCy issues : #153 #189 and #272 JDK 1.7 bug : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374 Apache httpd issue : https://bz.apache.org/bugzilla/show_bug.cgi?id=56241 RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3	6 years ago
luccioman	08ea0b0397	Added a configurable timeout to wkhtmltopdf calls for pdf snapshots Necessary to prevent blocking the indexing workflow when some wkhtmltopdf renderings fail without terminating	6 years ago
luccioman	b5dc1f376f	Made outgoing pools max total connections user configurable For a finer control over the maximum simultaneously active outgoing connections.	7 years ago
luccioman	387d646c0e	Added gzip compression of responses returned to user-agents accepting it Enabled as default, but can be disabled using the "Server Access Settings" admin page.	7 years ago
luccioman	35826a3091	Added a search page customization setting to display or not favicons If not interested in displaying this on your search results and notably on a peer with limited resources this can help saving some CPU and outgoing network connections.	7 years ago
luccioman	a3ec7a7a5f	Added analysis optional setting to compute statistics on text snippets Thus producing some basic stats on processing times for snippets generation and counts on snippets per source type.	7 years ago
luccioman	d92b191942	Ensure no remote Solr is attached before "Shut Down and Re-Start Solr" Otherwise once this operation is applied, the remote Solr instances are disconnected and the embedded Solr is connected even if disabled by setting "core.service.fulltext". Also use constants for related default setting values.	7 years ago
luccioman	69690c13a0	Optionally allow external Solr server with self-signed certificate This is necessary when you want to attach to a dedicated external Solr server protected with basic http authentication and requested over https but having only a self-signed certificate.	7 years ago
luccioman	2fd4d05e2f	Added a shared Java constant for setting key server.servlets.called	7 years ago
luccioman	9412881230	Added basic support for autotagging microdata annotated item types. With the appropriate vocabulary settings in Vocabulary_p.html page, this can produce Vocabulary search facets displaying item types referenced in html documents by microdata annotation. Tested notably, but not limited to, vocabulary classes/types defined by Schema.org and Dublin Core.	7 years ago
luccioman	e6907fdab3	Added optional search parameter/setting to control content domain filter Thus allowing to choose at configuration or per search request, whether extending or not results beyond strict content domain filter (image, video, audio or application). Related graphical controls to be added to user interface.	7 years ago
luccioman	17e004599d	Started implementing optional https preference for protocol operations Introduced through the new configurable setting network.unit.protocol.https.preferred, defaulting to false for now. Lets you prefer https, when available on remote peers, for YaCy protocol operations including notably hello or transferRWI. Not yet implemented for every YaCy protocol operation.	7 years ago
luccioman	f01aac31fd	Made possible to use https for remote search on peers with SSL enabled. Default is still http to prevent any regressions, but a new setting is available to choose https as the preferred protocol to perform remote searches. New configuration setting 'remotesearch.https.preferred' is manually editable in yacy.conf file or in Advanced Properties page (/ConfigProperties_p.html). Should be enabled as default in the future for improved privacy. Https could also eventually be used for other peers communications.	7 years ago
luccioman	af198b990b	Added an optional login link/status to the search public top nav bar. Thus allowing a more convenient way (without the need to go to the admin section) to login when searching on your remote or password protected peer and benefit from extended search features such as Heuristics, Bookmarking or JavaScript resorting. Can be disabled using the ConfigSearchPage_p.html.	7 years ago
luccioman	dbff7b14fc	Add a configurable limit to tags initially displayed in search results When the limit is reached, a button allows expanding/collapsing remaining tags. When this feature is activated without a limit to the number of displayed tags, when encountering search results with a very large number of keywords, the results page can become almost unusable (very long vertical scrollbar)	7 years ago
luccioman	ef8aea7f8d	Made the dates navigator max elements number user configurable. Also used object properties on QueryParams instances, rather than using mutable class (static) properties.	7 years ago
luccioman	9e86d183b8	Disable manual search results resorting when resorting is done with JS Also added a constant for the js resorting setting key.	8 years ago
luccioman	4eba88f2ff	Removed some unnecessary uses of java.lang.reflect api. This improves code browsing and readability, making search by references or call hierarchy IDE features more accurate.	8 years ago
luccioman	dcc56318bb	Made remote search max system load limits configurable from UI. As reported by davide on YaCy forums ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ) when the system is on high load, unless reading carefully YaCy configuration file, it could be difficult to understand why remote search results are not fetched.	8 years ago
luccioman	28b451a0b3	Made Cache compression level and lock timeout user configurable	8 years ago
luccioman	d90b001e1b	Improved previous merge "Show ranking in HTML UI". - added the new setting as configurable in the "Debug/Analysis" settings page. Debug/analysis is its main purpose for now as there is currently no nice and "understandable" ranking score info servlet (see forum discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) - render in the "Search Page Layout" page preview when enabled - added constants	8 years ago
reger	3dd23c178b	Introduce the option to configure a shutdown port. A port value of -1 will disable this option. If set to a value greater than 0, YaCy listens on this port on the local loopback address (127.0.0.1) for a shutdown or restart signal. E.g. connecting to http://localhost:8005/shutdown will stop the YaCy server. http://localhost:8005/restart will restart it. This option allows stopping YaCy locally, independently of the web frontend (which might be configured for password protected remote access).	8 years ago
reger	a2afb4bae0	add switchboardconstants for server ports config keys	8 years ago
luccioman	0173b0bc32	Added an advanced settings page for referrer policy settings. Feedback will be welcome, notably on the descriptive content of this page.	8 years ago
luccioman	cdcd923375	Privacy enhancement : added settings to control referrer policy. HTTP "Referer" header sent by the browser when using YaCy can now be controlled either with the referrer meta tag as a global policy, or only for search result links by adding the attribute rel="noreferrer". To improve privacy with the fewest possible regressions, the default is set as meta tag with value "origin-when-cross-origin" : internal YaCy links behavior is not affected, but when visiting external websites referrer url is not empty but stripped of query parameters and path. Older browsers, Safari, MS IE and Edge do not support the referrer meta tag, so the standard but less flexible noreferrer link type can also be enabled as an alternative. User-friendly settings page to be implemented.	8 years ago
luccioman	1857651988	Added a new Debug/Analysis advanced settings subsection. As discussed in PR #93 with @JeremyRand and @reger24 this new advanced settings page includes: - a new setting to control remote Solr responses encoding - some existing debug settings which could not be set through the admin user interface	8 years ago
luccioman	68afe900d0	Added user-friendly controls over disk usage configuration settings. As mentioned in issue #103, control settings over YaCy disk usage already existed but lacked a user-friendly way to set them. I added it to the Performance_p.html administration page with a little refactoring on the "Resource Observer" fieldset for improved accessibility and HTML standards respect. Also added the possibility to enable/disable the autoregulation function from this page.	8 years ago
luccioman	6a4d51d8f9	Cleaned up some Javadoc warnings.	8 years ago
luccioman	86dc198698	Fixed some JavaDocs broken links.	8 years ago
luccioman	aa9ddf3c23	Added control over Robots.txt active threads maximum number. When starting a crawl from a file containing thousands of links, configuration setting "crawler.MaxActiveThreads" is effective to prevent saturating the system with too many outgoing HTTP connections threads launched by the crawler. But robots.txt was not affected by this setting and was indefinitely increasing the number of concurrently loading threads until most of the connections timed out. To improve performance control, added a pool of threads for Robots.txt, consistently used in its ensureExist() and massCrawlCheck() methods. The Robots.txt threads pool max size can now be configured in the /PerformanceQueues_p.html page, or with the new "robots.txt.MaxActiveThreads" setting, initialized with the same default value as the crawler.	8 years ago
reger	811cf637f8	fix Jetty9YaCySecurityHandler, length check of Basic credential, add comment to SwitchboardConstants.AdminAccount const	8 years ago
reger	af39a76bf6	Reduce number of default max. search navigator lines (from 10000) to 100 + make it configurable	8 years ago
luccioman	b3b75b0498	Accessibility : add a customizable alternative text to YaCy logo Applied W3C recommendations : https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	8 years ago
reger	f10ea3c155	clean-out unused SwitchboardConstants	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
reger	a58d34a4e8	check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant	9 years ago
reger	cd26717ba2	fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619	9 years ago
luc	8c4ab9c76b	Added an option to eventually limit size of remote solr documents put to local index. See mantis #626.	9 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-defined bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every category. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	10 years ago
reger	3e742d1e34	Init remote crawler on demand If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of queue and idling thread. Deploy of the remoteCrawlJob deferred on activation of the option.	10 years ago
Michael Peter Christen	6f4fe4b175	revert of `8a7c68e4c7` keeping surrogates after processing is essential for some users. If the space they are taking is too high, please set up an automatic deletion process (like a cronjob).	10 years ago
Michael Peter Christen	97ba5ddbb7	configuration option for maxload limit for remote search	10 years ago
reger	42b0672be3	Let auto-disabled crawls recover if low resource condition vanished. Analogous to auto-disabled DHT, switch auto-disabled crawls back on once memory is ok, by remembering the auto-disable in a conf parameter.	10 years ago
reger	4eb89d7f15	revert clickservlet (default was indeed a mistake)	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only available remotely or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks) The option check-boxes are placed in ConfigPortal.html	10 years ago

107 Commits (5acd98f4da615428eb2c03f34fab25d9484516f5)