yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	ef5a71a592	enhanced crawl start response time for very very large crawl start lists	4 years ago
Michael Peter Christen	4cadd557dc	removed synchronization in table creation to avoid possible deadlocks when handling OnDemandOpenFileIndex which happens quite often during wide crawling	4 years ago
admin	9b7668fa58	reduced memory footprint during indexing/crawling	4 years ago
Michael Peter Christen	e6a87e0426	enhanced crawler: a main problem when crawling is the long waiting time caused by crawl-delay values from robots.txt entries. That attribute is not supported by Google and is interpreted differently by Yandex and Bing. In large crawls there is always one host which blocks the whole crawl with extremely large values. YaCy still obeys crawl-delay but now limits it to 10 seconds. Additionally, the blocking logic when loading new robots.txt files was analyzed and a deadlock was removed. Furthermore, the construction of new queue lists was redesigned to ensure that the loader is always provided with a large list of different hosts for host balancing.	4 years ago
Michael Peter Christen	e9c5e78868	replaced new Number(Number) with Number.instanceOf to remove deprecation warnings for Java 9	4 years ago
Michael Peter Christen	9e13d77de4	removed call to class.finalize() because of its deprecation in Java 9; next step: removal of the finalize() implementation after testing with assert false	4 years ago
Michael Peter Christen	9ef4503672	fixed some newInstance() warnings .. by adding .getDeclaredConstructor()	4 years ago
Michael Peter Christen	1d41380f0a	better support for mac-specific tray functions in java 9	4 years ago
Michael Peter Christen	e81b770f79	enabled crawl starts with very large sets of start URLs, e.g. a 10 MB URL list with approx. 0.5 million start points	4 years ago
Michael Peter Christen	c623a3252e	fix for jdk 14 bug	4 years ago
Michael Peter Christen	dbd211a1ad	removed/replaced reflection in memory tool	4 years ago
Michael Peter Christen	1cdb21592b	added hazelcast and some modifications to align legacy YaCy with YaCyGrid	4 years ago
Michael Christen	42ea2a1c6f	Merge pull request #405 from jfhs/jfhs/support-all-html-entities Improve HTML entities support	4 years ago
Michael Christen	b2af745dd6	Merge pull request #404 from lnceballosz/master NGI0 - Updating licensing aspects according to REUSE	4 years ago
jfhs	10bddc2c2d	Decode HTML entities in all property values by default	4 years ago
jfhs	2135d259e3	Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities	4 years ago
Michael Peter Christen	8f876a8c72	added concurrency to enhance indexing speed during json surrogate import	4 years ago
Michael Peter Christen	f8cbaeef93	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	a857e3d3d5	fix for json importer	4 years ago
sgaebel	1546232c94	adds ranking for multi document queries only	4 years ago
sgaebel	93b353d22d	does not boost or add fields for zero-row-queries (exists())	4 years ago
sgaebel	f16cd154f7	removes unused imports and variables	4 years ago
sgaebel	c69c462a15	replaces an expensive getLoadTimeURL() with exists(); refactors urlExists to getHarvestProcess, as that is what it does	4 years ago
sgaebel	a5488ac8f5	uses edismax queries on query counts > 1 only	4 years ago
sgaebel	26223dc25a	replaces getLoadTime() with exists(), which uses a simpler query, since with solr-8.8.1 getLoadTime() causes high CPU usage	4 years ago
sgaebel	8e4d014c06	removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the log	4 years ago
Lina Ceballos	a96752f5ab	adding SPDX license and copyright headers	4 years ago
Michael Peter Christen	e18d0ef544	trying to set a higher priority to the process that is involved in index export	4 years ago
Michael Peter Christen	8b4394a6c5	fixes for solr 8.8.1 migration - replace new guava 30 with older 25 because that is the correct dependency for solr 8.8.1. The newer one actually did not work! - index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1 subfolder. The older solr_6_6 index is not touched but also not migrated. The index starts with fresh (empty) content. - Older indexes must be migrated by hand (export/import) until a better solution is found. - Large schema adaptations for lucene 8.8.1	4 years ago
Michael Peter Christen	ed9789214e	fixed seed initialization problem	4 years ago
Al Sutton	8ade8b8775	Remove forced clear to match new behaviour in `2da71c2a40`	4 years ago
Al Sutton	09695fc6d3	Update exceptions to match updated API	4 years ago
Al Sutton	69014a701e	Update API Usage	4 years ago
Michael Peter Christen	3da7628117	use environment variables to overwrite configuration variables, e.g. export YACY_PORT=8092 && ./startYACY.sh (just prepend "YACY_" to the uppercase version of the configuration variable name and replace all "." with "_")	4 years ago
Michael Peter Christen	13a2e6dc6e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	0ae8ccf657	Make it possible to set an empty password, disabling the authentication protocol completely. If you now set an empty password, the HTTP server will not ask for authentication. This is required for environments where an outside authentication service like Keycloak or similar is attached through authentication in an ingress proxy. This change is part of the approach to run YaCy inside a Kubernetes cluster, where we do not want individual authentication of peers and want to apply ingress authentication instead.	4 years ago
Michael Peter Christen	96592a10cf	added option to set YaCy configuration values using environment variables. To use that feature, set an environment variable with prefix "yacy." and a suffix identical to the YaCy configuration attribute name. Additionally, we implemented a way to set a peer name using the setting "network.unit.agent". This can now be used to set a peer name with a Java call parameter such as -Dyacy.network.unit.agent=anonymous. The purpose of this feature is to allow peers in mass-deployed Kubernetes clusters to share the same name, so that peer name statistics are not flooded with auto-deployment-generated names.	4 years ago
Michael Peter Christen	198826c362	added network scanner process to discover all YaCy peers in the intranet; this will be used to wire YaCy peers in a Kubernetes cluster	4 years ago
Michael Peter Christen	d9602e8325	Implemented a new syntax in the template engine to simplify JSON APIs. Also added an example for one of the existing APIs. The problem is the comma separator between objects, which must not appear after the last entry in a sequence. The new syntax adds the separator symbol automatically.	4 years ago
Michael Peter Christen	5a7f12a9c1	allow network scans for non-standard http/https ports	4 years ago
sgaebel	b8d264f7ec	fixes logging	4 years ago
Michael Peter Christen	4c920d05b5	removed superfluous lines	4 years ago
Michael Peter Christen	907f121d0c	do not overwrite PW with random PW	4 years ago
Michael Peter Christen	3e6a1e0a49	fixed surrogate process counter	4 years ago
Michael Peter Christen	d3526c52af	fixed a problem in warc importer: do not fail if single WARC entries are faulty	4 years ago
Michael Peter Christen	3078b74e1d	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	01cc32217f	fixed apicall call method parameters and verification in transaction manager, which did not have an exception for localhost/basic authentication	4 years ago
Michael Peter Christen	63f58e4785	enhanced strategy in host browser: limit number of fresh hosts in round-robin hashes	4 years ago
Michael Peter Christen	9be36800a4	increased redirect depth by one; this makes sense if one redirect replaces http with https and another removes the www subdomain (or vice versa)	4 years ago
Michael Peter Christen	d0abb0cedb	enabled all crawl profiles in all network modes; also increased default internet crawl speed to 4 URLs/s/host	4 years ago
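Commit e6a87e0426 says that YaCy still obeys the robots.txt crawl-delay but caps it at 10 seconds. Below is a minimal sketch of such a cap. It assumes the delay has already been parsed into a java.time.Duration. The class and method names (CrawlDelayCap, effectiveDelay) are hypothetical illustrations, not YaCy's actual code.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: clamp a robots.txt Crawl-delay value to an upper bound.
 * The 10-second limit is taken from the commit message; the names are
 * illustrative assumptions, not YaCy's implementation.
 */
class CrawlDelayCap {

    // Upper bound on the per-host delay that is honored (10 s, per the commit message).
    static final Duration MAX_CRAWL_DELAY = Duration.ofSeconds(10);

    /**
     * Returns the delay to wait between two requests to the same host.
     * A missing (null) or negative value falls back to zero; values above the cap are clamped.
     */
    static Duration effectiveDelay(Duration robotsCrawlDelay) {
        if (robotsCrawlDelay == null || robotsCrawlDelay.isNegative()) {
            return Duration.ZERO;
        }
        return robotsCrawlDelay.compareTo(MAX_CRAWL_DELAY) > 0 ? MAX_CRAWL_DELAY : robotsCrawlDelay;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelay(Duration.ofSeconds(3)).getSeconds());    // 3
        System.out.println(effectiveDelay(Duration.ofSeconds(3600)).getSeconds()); // 10
        System.out.println(effectiveDelay(null).getSeconds());                     // 0
    }
}
```

Clamping rather than ignoring the value keeps the crawler polite toward hosts that request a delay. It also prevents a single host with an extreme value from stalling a large crawl.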

4372 Commits (ef5a71a592319f8f13633a1dda45add3179c4820)
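Commit 3da7628117 describes a naming rule for environment-variable overrides. The variable name is "YACY_" followed by the configuration key in upper case, with "." replaced by "_" (e.g. YACY_PORT=8092). Below is a minimal sketch of that rule. It assumes the configuration is held in a simple String-to-String map with a key named "port". The class and method names (EnvConfigOverride, envName, applyOverrides) are hypothetical, not YaCy's implementation. Map.of requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the override rule from the commit message:
 * "network.unit.agent" -> YACY_NETWORK_UNIT_AGENT, "port" -> YACY_PORT.
 * Not YaCy's actual code; names and the map-based config are assumptions.
 */
class EnvConfigOverride {

    /** Maps a configuration key to the environment variable name that overrides it. */
    static String envName(String configKey) {
        return "YACY_" + configKey.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    /**
     * Returns a copy of the configuration in which every key that has a
     * matching environment variable takes that variable's value.
     */
    static Map<String, String> applyOverrides(Map<String, String> config, Map<String, String> env) {
        Map<String, String> result = new HashMap<>(config);
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String override = env.get(envName(entry.getKey()));
            if (override != null) {
                result.put(entry.getKey(), override);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("port", "8090");
        config.put("network.unit.agent", "");
        // In a real deployment the second argument would be System.getenv().
        Map<String, String> env = Map.of("YACY_PORT", "8092");
        Map<String, String> effective = applyOverrides(config, env);
        System.out.println(envName("network.unit.agent")); // YACY_NETWORK_UNIT_AGENT
        System.out.println(effective.get("port"));         // 8092
    }
}
```

Passing the environment as a map, rather than calling System.getenv() inside the method, keeps the mapping testable. It also leaves the original configuration unchanged.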