yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Ian Smirlis	53518a91ab	In case of reload404, load only failed documents	3 years ago
Michael Peter Christen	e6a87e0426	enhanced crawler a main problem when crawling is long waiting time cuased by crawl-delay values from robots.txt entries. that attribute is not supported by google and interpreted by yandex and bing in different ways. In large crawls there is always one host which blocks the whole crawl with extreme large values. YaCy now still obeys crawl-delay but limits them to 10 seconds. Additionally the blocking logic when loading new robots.txt was analyzed and a deadlock was removed. Furthermore the construction of new queue lists was redesigned and it was ensured that always a large list of different hosts for host-balancing is provided for the loader.	3 years ago
Michael Peter Christen	9182b3dfca	enhanced default value	3 years ago
Michael Peter Christen	3959d43a5c	fixed doku link	3 years ago
Michael Peter Christen	15b7461bc7	removed Xms java memory startup parameter We will use the default value for now on. This is much better for resource economy and fits better into a container/docker/kubernetes strategy. Furthermore, a small memory footprint is essential for the usage on small devices like RaspberryPi.	3 years ago
Michael Peter Christen	4377bd2b70	fix for wrong crawlName construction	3 years ago
Michael Peter Christen	e81b770f79	enabled crawl starts with very large sets of start urls i.e. 10MB large url list with approx 0.5 million start points	3 years ago
Michael Peter Christen	dbd211a1ad	removed/replaced reflection in memory tool	4 years ago
Michael Peter Christen	1cdb21592b	added hazelcast and some modifications to align legacy YaCy with YaCyGrid	4 years ago
sgaebel	7fecd859e5	fixes showing metadata from Searchresult, by removing defType=edismax also removes defType=edismax from IndexBrowser, but still does not show dates	4 years ago
sgaebel	f16cd154f7	removes unused imports and variables	4 years ago
sgaebel	c69c462a15	replaces a expensive getLoadTimeURL() by exists() refactors urlExists to getHarvestProcess as that is what it does	4 years ago
sgaebel	26223dc25a	replaces getLoadTime() by exists() with a simpler query since solr-8.8.1 getLoadTime() causes a high cpu usage	4 years ago
Michael Peter Christen	b46513f4a1	added stub of rc3assembly style a little bit late but whatever	4 years ago
Michael Peter Christen	3da7628117	use environment variables to overwrite configuration variables you can i.e. do: export YACY_PORT=8092 && ./startYACY.sh Just append "YACY_" to uppercase version of environment variables and replace all "." with "_".	4 years ago
Michael Peter Christen	13a2e6dc6e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	0ae8ccf657	Make it possible to set an empty password disabling the authentication protocol completely If you set now an empty password, then the http server will not ask to authentify. This is required for environment where we attach an outside authentification service like keycloak or similar using authentication in an ingress proxy. This change is part of the approach to run YaCy inside of a kubernetes cluster where we do not want individual authentication of peers and want to apply a ingress authentication.	4 years ago
Michael Peter Christen	96592a10cf	added option to set yacy configuration values using environment variables To use that feature, set an environment variable with prefix "yacy." and suffix identical to the yacy configuration attribute name. Additionaly we implemented a way to set a peer name using the setting "network.unit.agent". This can therefore now be used to set a peer name with the java call parameter -Dyacy.network.unit.agent=anonymous The purpose for this feature is the ability to set peer names in mass-deployed kubernetes clusters to the same name to prevent that we are flooding peer name statistics with auto-deployment-generated names.	4 years ago
Michael Peter Christen	198826c362	added network scanner process to discover all YaCy peers in the intranet this will be used to wire YaCy peers in a kubernetes cluster	4 years ago
Michael Peter Christen	d9602e8325	Implemented a new syntax in the template engine to simplify json APIs Added also an example for one of the existing APIs. The problem is the comma separator between objects which must not be there for the last entry in a sequence. The new syntax adds the separator symbol automatically.	4 years ago
Michael Peter Christen	5a7f12a9c1	allow network scans for non-standard http/https ports	4 years ago
Michael Peter Christen	022fb15670	fix for https://github.com/yacy/yacy_search_server/issues/385	4 years ago
Michael Peter Christen	17672fcbb4	adding hint how to shrink the disk size after an index deletion. implements https://github.com/yacy/yacy_search_server/issues/360	4 years ago
Michael Peter Christen	907f121d0c	do not overwrite PW with random PW	4 years ago
Michael Peter Christen	256fa3d985	new limitation documentation just replaced two by four	4 years ago
Michael Peter Christen	7997836506	fixed lock image	4 years ago
Michael Peter Christen	d0abb0cedb	enabling all crawl profiles in all network modes also: increased default internet crawl speed to 4 urls/s/host	4 years ago
Michael Peter Christen	a9befbba5f	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	fed8bd6325	automatically refresh css cache when switching skin and setting of default skin to current skin in selector	4 years ago
Michael Peter Christen	9a5694261a	design update more space	4 years ago
Michael Peter Christen	4ec55289a8	using a lock symbol which looks also good in dark designs	4 years ago
Michael Peter Christen	43a9f4f574	updated solr 6.6.6 -> 7.7.3 dropped GSA support (GSA API is still in YaCy Grid) The 6.6.6 solr index works without migration also with 7.7.3	4 years ago
Michael Peter Christen	c0d9a3e9a7	turned HostBrowser into a admin-only page, now called IndexBrowser This was required because spiders and bots crawled through this page and created load on the peer without use for the user or the YaCy network.	4 years ago
Michael Peter Christen	d359d521a1	fixed warc importer The importer tried to import a gziped files as plain warc. It will now check the file extension and use a unzip automatically on-the-fly.	4 years ago
Michael Peter Christen	cef5fde343	adding message to UI to make port change transparent	4 years ago
Michael Peter Christen	22841ffbf1	creating a threaddump during every cleanup process to be able to find out what a peer did (not) last time before a crash	4 years ago
Michael Peter Christen	d7b2d82faa	showing MB instead of KB in PerformanceMemory	4 years ago
sgaebel	3431f91db9	removes unused 'unused' tokens	4 years ago
sgaebel	dd9d4b1188	replace org.junit.Assert.assertThat by org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid deprecation-warning	4 years ago
sgaebel	df9ea0a42a	removes some warnings: unused imports, params	4 years ago
sgaebel	80785b785e	adds deleting during recrawl	4 years ago
Michael Peter Christen	e0ad8ca9da	replaced json library from JSON.org with libandroid-json-java This fixes https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	6d7dc01670	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	5 years ago
Michael Peter Christen	0a7bda2a21	removed JSON-evil license line These classes had been my own creative work. Just the copyright line had been appeared possibly due to a bad copy-paste activity, unaware that the line is a non-free addition.	5 years ago
Michael Christen	57484eb1cc	xss protection	5 years ago
Michael Peter Christen	37827b6788	removed doubes from getpageinfo	5 years ago
Michael Peter Christen	f03e16d3df	enhanced crawl start url check experience urls are now urlencoded and a check is also performed in case that an url is copied into the url field using copy-paste	5 years ago
Michael Christen	41f9b8517f	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	5 years ago
Michael Christen	4ccd1ea3c0	new servlet path "p2p" with a test class. Call the class with http://localhost:8090/p2p/seeds.json	5 years ago
Michael Peter Christen	f7c97fd99e	scanner crawl starts wants non-parseable files	5 years ago
Michael Peter Christen	a20b61f5c0	fix for bad json	5 years ago
Michael Peter Christen	d62a8ec542	masking connects	5 years ago
Michael Peter Christen	5eb0033aef	typo	5 years ago
Michael Peter Christen	2c0742fc43	added json version of peer list	5 years ago
Michael Christen	cfa27d2fd5	fixed links	5 years ago
Michael Peter Christen	0bddf2d895	switched url and snippet position	5 years ago
Michael Peter Christen	2999f4b985	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	5 years ago
Michael Peter Christen	449780f762	enhanced search result design	5 years ago
Michael Christen	cdc7adedc2	added sponsor link	5 years ago
Michael Christen	f2d45ebb87	design updates + added link to new forum	5 years ago
Michael Peter Christen	789670bd8c	design changes - more space	5 years ago
Michael Christen	3a46b07603	fixed many links to old forum, now https://searchlab.eu	5 years ago
luccioman	6b45cd5799	New optional crawl filter on the URL a doc must match to crawl its links For finer control over which parsed documents can trigger an addition of their links to the crawl stack, complementary to the existing crawl depth parameter.	6 years ago
luccioman	d16bc99835	Added "Show Metadata" links to the ViewFile.html links mode To conveniently follow parsed links in the file viewer	6 years ago
luccioman	8c068a9c99	Better HTML text semantics for technical descriptions	6 years ago
luccioman	a5771b1f14	Made SNI extension user configurable without the need for server restart TLS Server Name Indication (SNI) extension activation can now be configured with the new Settings_p.html?page=httpClient administration page. SNI extension is also now enabled by default, as in 2019 the unrecognized_name(112) alert is more properly handled by major web servers TLS implementations, following the RFC 6066 standard. Related YaCy issues : #153 #189 and #272 JDK 1.7 bug : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374 Apache httpd issue : https://bz.apache.org/bugzilla/show_bug.cgi?id=56241 RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3	6 years ago
luccioman	42c8a251c8	Render a relevant message and status on blocked search requests When unauthenticated (or with insufficient rights) client is blocked either because blacklisted or excessive request rate, render an error message and a relevant HTTP status for API requests, instead of an empty response that appears broken.	6 years ago
luccioman	a8316c79da	Allow JS resorting of search results by unauthenticated users Acces rate limitations to this search mode by unauthenticated users are set low by default to prevent unwanted server overload but can be customized through the SearchAccessRate_p.html configuration page Fixes #291	6 years ago
luccioman	0ab2b49c31	Made /yacysearch access rate limitations user configurable With a new admin page at /SearchAccessRate_p.html in menu Network Access > Local Search > Access Rate Limitations	6 years ago
luccioman	630fa0015a	P2P/Privacy switch buttons support with JavaScript disabled	6 years ago
luccioman	74fd2f30fa	Support for search result switch buttons with JavaScript disabled	6 years ago
luccioman	ebc583cdb2	Properly render the href attribute of the active page button	6 years ago
luccioman	093ea9586c	Properly fill current page number to new server side pagination template When current page is automatically reset to zero because of a new search request.	6 years ago
luccioman	6e9d5f60ad	Server side initial pagination links rendering For better support of the search page usage with JavaScript disabled. Reduces also the number of initial refreshes of the paginations links. When JavaScript is enabled, pagination links are still regularly refreshed until all the search feeds are terminated on server side.	6 years ago
luccioman	4b9cc4746d	Upgraded Bootstrap dependency from v3.3.7 to v3.4.1 Non regressions tested on the following platforms : Linux Debian Stretch : - Firefox 60.5.1esr - Chromium 72.0.3626.96 Windows 10 : - Firefox 65.0.1 - Chrome 72.0.3626.109 - Edge 25.10586.672.0 - IE 11.1540.10586.0 Mac OS : - Safari 11.0	6 years ago
luccioman	c617ea58a0	Render additional embedded audios from links on extended audio search	6 years ago
luccioman	69f1971052	Added basic controls to play all audio results. Not displayed when JavaScript is disabled.	6 years ago
luccioman	9782a98a9c	Added the possibility to customize facets sort type and direction Previously search navigators/facets elements were sorted only by counts. Now from the ConfigSearchPage_p.html admin page, sort direction (ascending/descending) and type (on counts or labels) can be customized independently for each navigator.	6 years ago
sgaebel	c2398fd890	remove warnings: 'Statement unnecessarily nested within else clause'	6 years ago
sgaebel	8d2e7262d9	Recrawl: - set the chunksize to 100 to meet the max of the embedded solr - re-enable sorting (the case where we switched it of should be away) - enable recrawling on remote-solr	6 years ago
luccioman	60b520fb13	Cleaned up Spanish translation after merge of PR #238 * Fixed some indentation * Removed untranslated entries	6 years ago
luccioman	cd72515188	Merge pull request #238 from ivanhercaz/esLang [WIP] Spanish translation	6 years ago
luccioman	2f75e2d9c8	Fixed a case of NullPointerException on disconnected RWI data structure	6 years ago
luccioman	e85f231bdf	Fixed termination of Host browser and link structure Solr query threads On some conditions (especially when reaching timeout), concurrent Solr query tasks used by the /HostBrowser.html and /api/linkstructure.json never terminated, thus leaking resources, as reported by @Vort in issue #246	6 years ago
luccioman	260ac11c65	Limit length of initially visible text in link structure graph nodes To improve a bit readability of graphs having a large number of nodes.	6 years ago
luccioman	5a8d9abd8a	Upgraded d3js dependency from 3.4.4 to 5.7.0	6 years ago
luccioman	9f8e1994a4	Added missing CSS width units to some HostBrowser.html styling	6 years ago
luccioman	0b1d2cb0dd	Fixed "TypeError: table.tBodies[0] is undefined" host browser JS error Traced in browser console when a host details table is empty.	6 years ago
luccioman	fcf6b16db4	Added new crawler attribute for finer control over Media Type detection New "Media Type detection" section in the advanced crawl start page allow to choose between : - not loading URLs with unknown or unsupported file extension without checking the actual Media Type (relying Content-Type header for now). This was the old default behavior, faster, but not really accurate. - always cross check URL file extension against the actual Media Type. This lets properly parse URLs ending with an apparently odd file extension, but which have actually a supported Media Type such as text/html. Sample URLs with misleading file extensions added as documentation in the crawl start page. fixes issue #244	6 years ago
luccioman	88d0ed676c	Render http status instead of null responses on snapshot api errors	6 years ago
luccioman	92e10d7d1c	Added a crawl start hint message on availability or not of wkhtmltopdf As this tool is required to produce pdf snapshots	6 years ago
luccioman	8852c97cee	Added basic styling for cleaner rendering of missing image snapshots For the output of the Solr snapshots writer	6 years ago
luccioman	746e0e788d	Render a relevant HTTP status code on snapshot image rendering error Instead of a null response body which is not very helpful.	6 years ago
luccioman	753bda1409	Fixed remaining blacklist entries improper decoding of '+' character In the blacklist cleaner and import/export administration pages.	6 years ago
luccioman	61c337f29a	Decode blacklist entries for easier edition of non ascii chars Not using the JDK URLDecoder.decode() function, as it strips '+' characters when they occur after '?' (both characters having regular expression semantics when used in blacklist path patterns)	6 years ago
luccioman	ed93221fa1	Improved normalization of blacklist path patterns having non ascii chars Normalize blacklist path patterns using percent-encoding, at pattern edition in web interface and at loading from configuration files. Fixes issue #237	6 years ago
luccioman	d23578efc3	Merge pull request #240 from ivanhercaz/fixEnglishBookmarksPage Fix English Bookmarks.html	6 years ago
ivanhercaz	41684ba559	adding Spanish to the interface language list	6 years ago
ivanhercaz	1dafc85d33	typo fix in Bookmarks.html	6 years ago
luccioman	3d14fb51c5	Removed now unused Java import in addition to modification from PR #239	6 years ago

1 2 3 4 5 ...

6060 Commits (0f66c8bc35c996614abc8d1fead55b70a7c2fdda)