yacy_search_server


Author	SHA1	Message	Date
luccioman	e6907fdab3	Added an optional search parameter/setting to control the content domain filter. This allows choosing, in the configuration or per search request, whether or not to extend results beyond the strict content domain filter (image, video, audio or application). Related graphical controls are still to be added to the user interface.	7 years ago
luccioman	17e004599d	Started implementing an optional https preference for protocol operations. Introduced through the new configurable setting network.unit.protocol.https.preferred, defaulting to false for now. Lets users choose to prefer https, when available on remote peers, to perform YaCy protocol operations, notably hello and transferRWI. Not yet implemented for every YaCy protocol operation.	7 years ago
luccioman	d95b288f19	Removed use of the deprecated Jetty IPAccessHandler for client filtering. Upgraded to InetAccessHandler. Added an InetPathAccessHandler extension to InetAccessHandler to keep the path pattern capability that IPAccessHandler offered but InetAccessHandler lacks. Filtering on IPv6 addresses is now supported. Support for deprecated pattern formats such as "192.168." and "192.168.1.1/path" has been removed, but an automated migration at startup should convert any such patterns present in serverClient.	7 years ago
luccioman	f01aac31fd	Made it possible to use https for remote search on peers with SSL enabled. The default is still http to prevent any regressions, but a new setting is available to choose https as the preferred protocol for remote searches. The new configuration setting 'remotesearch.https.preferred' can be edited manually in the yacy.conf file or in the Advanced Properties page (/ConfigProperties_p.html). It should be enabled by default in the future for improved privacy. Https could also eventually be used for other peer communications.	7 years ago
luccioman	bab5f0485f	Added signing key to developer releases location.	7 years ago
luccioman	af198b990b	Added an optional login link/status to the public search top nav bar. This provides a more convenient way (without the need to go to the admin section) to log in when searching on your remote or password-protected peer and benefit from extended search features such as Heuristics, Bookmarking or JavaScript resorting. Can be disabled on the ConfigSearchPage_p.html page.	7 years ago
luccioman	dbff7b14fc	Add a configurable limit to tags initially displayed in search results. When the limit is reached, a button allows expanding/collapsing the remaining tags. Without such a limit, when tag display is enabled, search results with a very large number of keywords can make the results page almost unusable (very long vertical scrollbar).	7 years ago
luccioman	ef8aea7f8d	Made the maximum number of dates navigator elements user-configurable. Also used instance properties on QueryParams objects rather than mutable class (static) properties.	7 years ago
JeremyRand	d37df75afa	(WIP) Optionally sort HTML search items via JavaScript. TODO: Expose a GUI setting for this.	7 years ago
reger	b6a41df4f7	Remove deprecated YaCyProxyServlet (replaced by UrlProxyServlet)	7 years ago
reger	41616de0b8	Add SolrConfig ClassicIndexSchemaFactory to prevent a Solr startup warning. This overrides the Solr default of using a managed schema. As we don't use programmatic schema changes, this directs Solr to use schema.xml, eliminating the warning.	7 years ago
reger	9220ccbec7	remove reference to velocityresponsewriter in solrconfig.xml: it is no longer part of the solr-core API http://lucene.apache.org/solr/6_6_0/index.html	8 years ago
reger	4be4bfbba6	remove sample path setting in solrconfig.xml: it is not valid in YaCy and resulted in an exception stopping startup after a fresh switch to 1.921	8 years ago
luccioman	f6e8d71718	Prevent high CPU load at startup, caused by the Solr suggester build. Reported by Collision on mantis 758 ( http://mantis.tokeek.de/view.php?id=758 ). Introduced by the new YaCy Solr configuration for Solr 6.6.0 (see commit `6fe735945d`), including now Suggester configuration.	8 years ago
luccioman	28b451a0b3	Made Cache compression level and lock timeout user configurable	8 years ago
luccioman	73ab4a7b3a	Prevent log pollution from unwanted Solr warnings. Many non-blocking "java.nio.file.NoSuchFileException" traces with warning log level can be logged by Solr, especially when heavily crawling. This issue has been known since Solr 5.x but is still unresolved in Solr 6.x ( https://issues.apache.org/jira/browse/SOLR-9120 ). Consequently raised the default log level of the related internal Solr class to "SEVERE". See also mantis 727 ( http://mantis.tokeek.de/view.php?id=727 )	8 years ago
Michael Peter Christen	6fe735945d	migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8 Also: now Version 1.921	8 years ago
reger	a814f3d885	Introduce keyword query parameter. This enables the keyword navigator to filter on keywords. Added search page output and layout config for keywords, allowing e.g. intranet setups to display the keywords. No styling or links are applied to the keyword text yet (though this may be desirable, possibly in combination with bootstrap-tagsinput, for future/intranet use).	8 years ago
luccioman	d90b001e1b	Improved previous merge "Show ranking in HTML UI". - added the new setting as configurable in the "Debug/Analysis" settings page. Debug/analysis is its main purpose for now, as there is currently no nice and "understandable" ranking score info servlet (see forum discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) - render in the "Search Page Layout" page preview when enabled - added constants	8 years ago
luccioman	efe1232d90	Merge branch 'html-show-ranking' of https://github.com/JeremyRand/yacy_search_server Conflicts: defaults/yacy.init	8 years ago
luccioman	09e72eb0a4	Set Config Portal as a private administration page. This is consistent with the credentials already required by its form submission action, and external unauthenticated users do not need to access these settings.	8 years ago
reger	1ccc44e681	fix default/httpd.mime Z file extension to lower case + test case	8 years ago
reger	44a9a580e3	remove seedlist bootstrap target (not working for quite some time)	8 years ago
reger	3dd23c178b	Introduce the option to configure a shutdown port. A port value of -1 disables this option. If set to a value greater than 0, YaCy listens on this port on the local loopback address (127.0.0.1) for a shutdown or restart signal. E.g. connecting to http://localhost:8005/shutdown will stop the YaCy server, and http://localhost:8005/restart will restart it. This option allows stopping YaCy locally, independently of the web frontend (which might be configured for password-protected remote access).	8 years ago
reger	f7fce1baad	make digest default authentication in defaults/web.xml	8 years ago
luccioman	9d9f86dcdd	Updated Archive-It heuristics URL. The archive-it OpenSearch URL requested without restriction on collections ("i" parameter) almost always times out or fails.	8 years ago
luccioman	cdcd923375	Privacy enhancement: added settings to control the referrer policy. The HTTP "Referer" header sent by the browser when using YaCy can now be controlled either with the referrer meta tag as a global policy, or only for search result links by adding the attribute rel="noreferrer". To improve privacy with the fewest possible regressions, the default is set as a meta tag with value "origin-when-cross-origin": internal YaCy link behavior is not affected, but when visiting external websites the referrer URL is not empty but is stripped of query parameters and path. Older browsers, Safari, MS IE and Edge do not support the referrer meta tag, so the standard but less flexible noreferrer link type can also be enabled as an alternative. A user-friendly settings page is still to be implemented.	8 years ago
luccioman	13c5c09518	Fixed datacite.org heuristics base url. The datacite Solr search http URL was returning http status 301 in order to redirect to its https version, thus making that YaCy heuristic always fail.	8 years ago
luccioman	ac766327d3	Switched a few more Solr fields from strictly mandatory to optional	8 years ago
luccioman	cdc7f3e431	Switched some Solr fields from mandatory to optional. These fields are enabled by default but are clearly not strictly mandatory with the current code base. As reported by @reger24, the split between essential mandatory and optional fields still needs to be improved to reflect current YaCy needs.	8 years ago
luccioman	c68a8be2d9	Refactored and enforced Solr mandatory fields for proper operation - Added a new method to check activation of mandatory fields on Collection Configuration commit, consistent with the checks previously performed at Switchboard startup and with mandatory fields in the default schema. - Reorganized default schema and CollectionConfiguration enumeration: moved fields that are no longer mandatory to a specific section, and moved fields enabled at startup to the mandatory section. - Marked mandatory fields as required and with a bolder font in the IndexSchema_p.html page	8 years ago
reger	6ec6ab55ba	removed faroo news from default opensearch config. As @luccioman pointed out, it is only usable with a free API key http://www.faroo.com/hp/api/api.html http://blog.faroo.com/2013/06/30/faroo-introduces-an-api-key/	8 years ago
reger	f85aaa7c76	update opensearch conf - remove suche.sueddeutsche.de: apparently they have withdrawn from the opensearch initiative.	8 years ago
luccioman	bf16de29c1	Added support for HTML OpenSearch results. Many OpenSearch systems do not provide results as standard RSS/Atom feeds but only as HTML. This modification adds some support for custom OpenSearch HTML results through mapping files (as already done for federated Solr search) relying on CSS-like selectors to retrieve information from HTML content. An example mapping file is provided to map results from the www.npmjs.com OpenSearch URL.	8 years ago
luccioman	1857651988	Added a new Debug/Analysis advanced settings subsection. As discussed in PR #93 with @JeremyRand and @reger24 this new advanced settings page includes: - a new setting to control remote Solr responses encoding - some existing debug settings which could not be set through the admin user interface	8 years ago
luccioman	826e5bbadd	Documented /HostBrowser.html related configuration settings	8 years ago
Michael Peter Christen	dbd34befc0	added luccioman development release builds as discussed in http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5906	8 years ago
Michael Peter Christen	204e507b2b	updated seed-list bootstrap locations	8 years ago
luccioman	aa9ddf3c23	Added control over the maximum number of Robots.txt active threads. When starting a crawl from a file containing thousands of links, the configuration setting "crawler.MaxActiveThreads" effectively prevents saturating the system with too many outgoing HTTP connection threads launched by the crawler. But robots.txt was not affected by this setting and kept increasing the number of concurrently loading threads indefinitely until most of the connections timed out. To improve performance control, added a thread pool for Robots.txt, used consistently in its ensureExist() and massCrawlCheck() methods. The Robots.txt thread pool max size can now be configured in the /PerformanceQueues_p.html page, or with the new "robots.txt.MaxActiveThreads" setting, initialized with the same default value as the crawler.	8 years ago
reger	08a0acc35d	make a YearNavigator available, usable as a SearchEvent.navigator plugin. It can take any Date field of the index and displays a list of year strings in reverse order by year (not by score/count). To define the index field to use, the field name (and title) can be appended to the navigator's name "year", e.g. year:load_date_dt:LoadDate. It also works with the dates_in_content_dts field (from the graphical date navigator). Here the query parameters from: and to: are used on selection as query modifiers (for other dates no query parameter is currently available, so selection won't filter search results). Not included in the UI Searchpage layout config so far (to experiment with it, a manual change to the conf is needed).	8 years ago
reger	bad8f87998	remove old/obsolete clear text "adminAccount" credential entry from init and setConfig (.,empty) from servlets/code	8 years ago
luccioman	7296e3884f	Switched even more URLs to pure relative ones. Thus a YaCy peer can run behind a reverse proxy subfolder without need for the reverse proxy to rewrite HTML links (a CPU costly operation). Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	84b81c1af0	Switched more URLs to relative ones when possible. This permits an easier and more flexible reverse proxy configuration. Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
reger	af39a76bf6	Reduce the default max. number of search navigator lines from 10000 to 100 + make it configurable	8 years ago
luccioman	6e1959f469	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Conflicts: htroot/yacysearchitem.java source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java source/net/yacy/search/schema/CollectionConfiguration.java source/net/yacy/server/serverObjects.java	8 years ago
JeremyRand	4963ecb0a0	Add preference (disabled by default) to show the ranking for each result on the HTML UI.	8 years ago
luccioman	b3b75b0498	Accessibility: add a customizable alternative text to the YaCy logo. Applied W3C recommendations: https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	8 years ago
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0); the update should not require a reindex	8 years ago
Marc Nause	1f7013a1e3	removed unused properties in default config (CGI capabilities of YaCy's HTTPd have been removed many moons ago)	8 years ago
luccioman	893a40995a	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	634e48309b	another peer list update	8 years ago
Michael Peter Christen	16420e5507	added another principal peer	8 years ago
luccioman	6e96c7341a	Merge remote-tracking branch 'origin/master' Conflicts: htroot/Load_MediawikiWiki.java htroot/Load_PHPBB3.java htroot/ViewImage.java	8 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	8410536f75	keep svnRevision in .init for conversion of .conf until release >1.83	9 years ago
reger	726ebee65a	include Version config string in yacy.init (replacing svnRevision)	9 years ago
Michael Peter Christen	f4591b1b51	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	1ce38fdaed	0n - added experimental zeronet network which supports intranet peers (still needs work)	9 years ago
Michael Peter Christen	d05ffa1c51	update to seed list	9 years ago
reger	16724c1283	remove unused proxyCookieWhiteList from yacy.init	9 years ago
luc	3cc5619d93	Improved HTML icons indexing and rendering in search results. See http://mantis.tokeek.de/view.php?id=629	9 years ago
Michael Peter Christen	5d635879f8	Merge pull request #40 from Scarfmonster/autocrawl Automatic crawling	9 years ago
Ryszard Goń	7d6e0d8470	Add missing settings to autocrawl settings page	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
reger	4765e374e6	altered calculation of search result items per page to display, taking the existing limits into account but making it consistent with the search option screen for admin and public users. changes: - the configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit) - otherwise requests are limited to 100 results per page ( = search option, index.html) (this basically is the major change, incl. raising the limit from 20 to 100 for public users) P.S. - the older grant of more (1000), if there is no online snippet calculation, is kept (for the time being) see http://mantis.tokeek.de/view.php?id=627	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
reger	e8256bb3b1	remove blekko from opensearch config (not available) see https://blekko.com/ http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633	9 years ago
reger	a5faf73afa	remove obsolete yacy.init entries interaction.* (related to removed triplestore)	9 years ago
sixcooler	dce1cb65c4	Merge remote-tracking branch 'choose_remote_name/master'	9 years ago
reger	e84d94f8ca	fix mime table for ms office / open office documents (causing wrong parser detect in intranet mode)	9 years ago
reger	15e46b2bad	exclude in/outboundlinksnofollowcount_i from default schema fields (not used in any function)	9 years ago
luc	8c4ab9c76b	Added an option to limit the size of remote Solr documents added to the local index. See mantis #626.	9 years ago
luc	55a4d15775	Added a note on deprecated default search field and operator.	9 years ago
reger	b2c8bc0ae6	remove md5_s from default index fields: it is not assigned a value / not used. For the same reason, also excluded from the transfer protocol.	9 years ago
sixcooler	f5a9948860	do not store subfield *_coordinate	9 years ago
sixcooler	fca353e5eb	set startuptype of most solr handlers to lazy	9 years ago
reger	c720b4c249	remove override of dynamicField coordinate_p in solr schema (coordinate_p is not a mandatory field as such doesn't need to be declared as schema.field)	9 years ago
reger	f0b5bc93a3	remove obsolete yacy.init entry "secureHttps" not used anywhere	9 years ago
reger	5e45f1a460	enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field	9 years ago
sixcooler	87e4abe393	fight the fieldcache by using DocValues: in Solr 5.x the fieldcache has moved and is no longer cleared. This results in a huge fieldcache. (http://lucene.apache.org/#highlights-of-the-lucene-release-include https://issues.apache.org/jira/browse/LUCENE-5666) Here I try to use DocValues wherever possible. For this I used the API schema as the new basis for the Solr schema. This requires at least a complete optimization of the Solr index to get a smaller FieldCache. Everything indexed with these settings will not use the FieldCache at all.	9 years ago
reger	250f6457f0	remove expired domain titan.deep-one.in from bootstrap.seedlist	9 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using Bayesian filters. This can be used to classify documents at indexing time using a pre-defined Bayesian filter. New terminology: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories: at least one with a custom name (things you want to discover) and one with the exact name "negative". To use this: - for each context, create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, create a text file for every category, containing one document per line. One of these category files MUST be named 'negative.txt'. Then, each new document is classified into one of the given categories for each context.	9 years ago
Michael Peter Christen	e1cd9c0dba	added another default network / commented out	9 years ago
reger	00d2062813	Remove deprecated AdminHandlers in solrconfig.xml to avoid warning log W org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/" class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore	10 years ago
Michael Peter Christen	694b22f165	migration to Solr 5.2: huge benefits - this is a lot faster! This is a very complex migration: many classes had been renamed or removed, dependencies changed and the solr index type is now aligned to be a solr cloud repository. Together with the Solr 5.2 library update, one other dependent library has been updated as well: httpclient 4.4->4.4.1 Older indexes are migrated from 4_10 to 5_2. However, the new index structure is more efficient and we recommend re-indexing everything. Before updating, please export the index to a large surrogate XML file. After the update, start with an empty index and then initialize it with your dump.	10 years ago
Michael Peter Christen	9c12555be5	added link to Snapshots in search results if the snapshot exists and option is set in ConfigSearchPage_p (this is a stub: we also need a visualization of pdf files!)	10 years ago
reger	6bc8a9b11e	make Quality of Service Servlet available to prioritize requests from local host. This assigns priorities to incoming requests. Higher priority numbers are served before lower ones. (disabled by default in defaults/web.xml, uncomment or copy entry to DATA/Settings/web.xml)	10 years ago
Michael Peter Christen	b060ba900d	added parsing of contentprop attribute in html tags for content='startDate' and content='endDate'. The values of these fields are now written to the new solr fields startDates_dts and endDates_dts.	10 years ago
Michael Peter Christen	4cb4f67f38	added parsing of dd, dt and article html fields. The parsed result is written to special solr fields which are deactivated by default.	10 years ago
Michael Peter Christen	36e9cdb376	testing switching off cold searchers; maybe this brings performance enhancements when using large facets	10 years ago
Michael Peter Christen	535f1ebe3b	added a new way of content browsing in search results: - date navigation The date is taken from the CONTENT of the documents / web pages, NOT from a date submitted in the context of metadata (i.e. http header or html head form). This makes it possible to search for documents in the future, i.e. when documents contain event descriptions for future events. The date is written to an index field which is now enabled by default. All documents are scanned for contained date mentions. To visualize the dates for a specific search result, a histogram showing the number of documents for each day is displayed. To render these histograms the morris.js library is used. Morris.js also requires raphael.js, which is now also integrated in YaCy. The histogram is now also displayed in the index browser by default. To select a specific range from a search result, the following modifiers have been introduced: from:<date> to:<date> These modifiers can be used separately (i.e. only 'from' or only 'to') to describe an open interval or combined to have a closed interval. Both dates are inclusive. To select a specific single date only, use the 'to:' - modifier. The histogram shows blue and green lines; the green lines denote weekend days (Saturday and Sunday). Clicking on bars in the histogram has the following effect: 1st click: add a from:<date> modifier for the date of the bar 2nd click: add a to:<date> modifier for the date of the bar 3rd click: remove the from and to modifiers and set an on:<date> modifier for the bar When the on:<date> modifier is used, the histogram shows an unlimited time period. This makes it possible to click again (4th click), which is then interpreted as a 1st click again (sets a from modifier). The display feature is NOT switched on by default; to switch it on use the /ConfigSearchPage_p.html servlet.	10 years ago
reger	ba276d3e64	add description_txt to default query fields, Dublin Core Metadata field extracted by most parsers.	10 years ago
reger	fe6f5a395d	fix Umlaut handling in blekko heuristic search term http://mantis.tokeek.de/view.php?id=169 observation: blekko seems to block xxxbot agents (=0 results)	10 years ago
Michael Peter Christen	97ba5ddbb7	configuration option for maxload limit for remote search	10 years ago
Michael Peter Christen	ac19690d30	refactoring with CommonPattern.COMMA	10 years ago
Michael Peter Christen	cf9b22ca5c	do not reindex based on vocabulary fields (there are meanwhile many of them) and some default settings	10 years ago
reger	24f68a4eb7	refactor opensearch heuristic: introduce FederateSearchManager, which handles search heuristics to external systems via specific FederateSearchConnectors. These provide the query() functionality, the translation to the YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in the Abstract connector. The manager now enforces a min. 15s delay between calls to external systems. Besides the OpensearchConnector, a SolrFederateSearchConnector is available. It uses an additional config file for field name translation. default heuristicopensearch.conf: - openbdb.com removed - seems to no longer deliver results - config via solrconnector to datacite.org added (large technical library archive)	10 years ago
reger	4eb89d7f15	revert clickservlet (the default was indeed a mistake)	10 years ago
Michael Peter Christen	61ae9d2d11	do not use the clickservlet by default. From my personal view, this technique should not be used at all! This project is about privacy, the existence of a click servlet is one example why people should NOT use a search portal if such exists.	10 years ago


696 Commits (6d388bb7bfa79454ef63a1e227924fdab105b807)