yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	8eb0d490aa	migrated solr to 9.0. This is a major step because solr removed support for embedded solr instances in 9.0 and we want to keep it because we want to ship YaCy with an embedded solr. It was necessary to add parts of solr code into YaCy to make this migration possible. Furthermore, with Solr 9.1 they removed even more parts which are required for embedded operation, therefore we cannot migrate any further yet without big changes. If you are running a YaCy instance with Solr 8.x, the migration should be done automatically. If not, you first need to migrate to YaCy version 1.93 with Solr 8.x to migrate your data to Solr 8.	6 months ago
Michael Peter Christen	34a9fc1a07	bugfixes to zim reader:	1 year ago
Michael Peter Christen	7db0534d8a	Added a zim parser to the surrogate import option. You can now import zim files into YaCy by simply moving them to the DATA/SURROGATE/IN folder. They will be fetched and after parsing moved to DATA/SURROGATE/OUT. There are exceptions where the parser is not able to identify the original URL of the documents in the zim file. In that case the file is simply ignored. This commit also carries an important fix to the pdf parser and an increase of the maximum parsing speed to 60000 PPM which should make it possible to index up to 1000 files in one second.	1 year ago
Michael Peter Christen	70e29937ef	added a check in zim importer which tests if import URLs actually exist	1 year ago
Michael Peter Christen	5ba5fb5d23	upgraded pdfbox to 3.0.0	1 year ago
mchristen	8fc51f66c6	fixed a test class which prevented compilation on latest jvm	1 year ago
Michael Peter Christen	8285fe715a	tab to spaces for classes supporting the condenser. This is a preparation step to make changes in condenser and parser more visible; no functional changes so far.	1 year ago
Michael Peter Christen	5afcba162b	updated libraries	1 year ago
Michael Christen	9012fe4519	extended error message	2 years ago
Michael Christen	74104ff2d3	fix to timeout	2 years ago
Michael Peter Christen	ab3ef87abf	fixed exec start command where a path contains spaces	2 years ago
Michael Peter Christen	5ddc794bb9	code cleanup in http client	2 years ago
Michael Christen	61b27217b9	throttle number of DNS requests: as soon as the number of requests is > 50, there is a forced delay of (10 * (requests - 50)) milliseconds. That means that once the number of DNS requests reaches 150, there is a one second delay for each request. This should prevent a remote DNS from being flooded with requests and possibly getting damaged. This is also a fix/enhancement for https://github.com/yacy/yacy_search_server/issues/513	2 years ago
Michael Peter Christen	761dbdf06d	increases log history length to 10000; implements https://github.com/yacy/yacy_search_server/issues/512	2 years ago
Michael Peter Christen	0970a79bbf	attempt to fix https://github.com/yacy/yacy_search_server/issues/517	2 years ago
Michael Peter Christen	1893661ee4	removed/suppressed more warnings	2 years ago
Michael Christen	867f96a32b	removed warnings	2 years ago
Michael Christen	8a06beaf24	removed finalize() methods, deprecated	2 years ago
Michael Peter Christen	fc98ca7a9c	removed ContentControl servlet and functionality. This was not used at all (as far as I know) and was blocking a smooth integration of ivy in the context of an existing JSON parser.	2 years ago
reger24	141e86964e	Fix compile deprecation warning. warning: [removal] AccessControlException in java.security has been deprecated and marked for removal	3 years ago
Michael Peter Christen	39e7bbac13	removed deprecation warning for new Double()	3 years ago
Daleth Darko	3ced06c731	Various javadoc fixes	3 years ago
Michael Peter Christen	bd3f2483a1	replaced url and date retrieval by only url retrieval. This should prevent the search index from being used for freshness of the index entry.	3 years ago
Michael Peter Christen	163ba26d90	replaced check for load time method. Instead of loading the solr document, an index only for the last loading time was created. This prevents solr from having to fetch from its index while the index is being created. Excessive re-loading of documents while indexing has been shown to produce deadlocks, so this should now be prevented.	3 years ago
Michael Peter Christen	59777010dc	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	3 years ago
Michael Peter Christen	7898815c41	disabling concurrent logging (maybe temporary)	3 years ago
sgaebel	4bf6954474	uses clientBuilder not HttpClients.custom() to have these inside the Pool too	3 years ago
sgaebel	cdf901270c	always use HTTPClient by 'try with resources' pattern to free up resources	3 years ago
sgaebel	69adaa9f55	makes our HTTPClient closable	3 years ago
sgaebel	fc4275f901	handle all references for client, response, request to be able to close them	3 years ago
sgaebel	e7d3a363f2	refactor to use finish()	3 years ago
sgaebel	4fc876f4a3	revert back to use EntityUtils.consumeQuietly - as it simply closes the underlying stream	3 years ago
sgaebel	4f0392e93e	refactor use of AuthSchemeProvider	3 years ago
sgaebel	b74f337859	removes double setting of UserAgent	3 years ago
sgaebel	965748fefb	some refactoring using try with resources	3 years ago
sgaebel	90507c0fdc	comments out printing query params to std.out	3 years ago
Michael Peter Christen	be0aebad84	fixes https://github.com/yacy/yacy_search_server/issues/424	3 years ago
Michael Peter Christen	e6a87e0426	enhanced crawler. A main problem when crawling is long waiting time caused by crawl-delay values from robots.txt entries. That attribute is not supported by google and is interpreted by yandex and bing in different ways. In large crawls there is always one host which blocks the whole crawl with extremely large values. YaCy now still obeys crawl-delay but limits it to 10 seconds. Additionally, the blocking logic when loading new robots.txt was analyzed and a deadlock was removed. Furthermore, the construction of new queue lists was redesigned, and it was ensured that a large list of different hosts for host-balancing is always provided for the loader.	3 years ago
Michael Peter Christen	e9c5e78868	replaced new Number(Number) with Number.valueOf to remove deprecation warnings for Java 9	3 years ago
Michael Peter Christen	9ef4503672	fixed some newInstance() warnings .. by adding .getDeclaredConstructor()	3 years ago
Michael Peter Christen	c623a3252e	fix for jdk 14 bug	4 years ago
Michael Peter Christen	dbd211a1ad	removed/replaced reflection in memory tool	4 years ago
Michael Peter Christen	1cdb21592b	added hazelcast and some modifications to align legacy YaCy with YaCyGrid	4 years ago
Michael Peter Christen	f8cbaeef93	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	a857e3d3d5	fix for json importer	4 years ago
sgaebel	f16cd154f7	removes unused imports and variables	4 years ago
sgaebel	a5488ac8f5	uses edismax queries on query counts > 1 only	4 years ago
sgaebel	26223dc25a	replaces getLoadTime() by exists() with a simpler query since solr-8.8.1 getLoadTime() causes a high cpu usage	4 years ago
Michael Peter Christen	e18d0ef544	trying to set a higher priority to the process that is involved in index export	4 years ago
Michael Peter Christen	8b4394a6c5	fixes for solr 8.8.1 migration - replace new guava 30 with older 25 because that is the correct dependency for solr 8.8.1. The newer one actually did not work! - index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1 subfolder. The older solr_6_6 index is not touched but also not migrated. The index starts with fresh (empty) content. - Older indexes must be migrated by hand (export/import) so far until a better solution is found. - Large schema adaptations for lucene 8.8.1	4 years ago
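
The DNS throttle described in commit 61b27217b9 amounts to a small piece of arithmetic, sketched below. This is not the code from that commit: the class name, method name, and constants are hypothetical, and "requests" is assumed to be whatever request counter YaCy tracks, which the message does not specify. Only the threshold of 50 and the factor of 10 milliseconds come from the commit message.

```java
// Hypothetical sketch of the delay formula quoted in commit 61b27217b9.
// Names are illustrative and are not taken from the YaCy source.
final class DnsThrottleSketch {

    // Requests up to this count are not delayed (value from the commit message).
    static final int FREE_REQUESTS = 50;

    // Extra delay per request above the threshold (value from the commit message).
    static final long MILLIS_PER_EXTRA_REQUEST = 10L;

    // Returns the forced delay in milliseconds for the given request count.
    static long delayMillis(int requests) {
        if (requests <= FREE_REQUESTS) {
            return 0L;
        }
        return MILLIS_PER_EXTRA_REQUEST * (requests - FREE_REQUESTS);
    }

    public static void main(String[] args) {
        // Print the delay for a few sample request counts.
        for (int requests : new int[] {10, 50, 51, 100, 150}) {
            System.out.println(requests + " requests -> " + delayMillis(requests) + " ms");
        }
    }
}
```

At 150 requests the formula gives 10 * (150 - 50) = 1000 milliseconds, which matches the one second delay stated in the commit message.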

1385 Commits (fe4c0aa890599cbd60250d9476f527c24da38bb6)