yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	8eb0d490aa	migrated solr to 9.0. This is a major step because solr removed support for embedded solr instances in 9.0 and we want to keep it because we want to ship YaCy with an embedded solr. It was necessary to add parts of solr code into YaCy to make this migration possible. Furthermore, with Solr 9.1 they removed even more parts which are required for embedded operation, therefore we cannot migrate any further yet without big changes. If you are running a YaCy instance with Solr 8.x, the migration should be done automatically. If not, you first need to migrate to YaCy version 1.93 with Solr 8.x to migrate your data to Solr 8.	6 months ago
Michael Peter Christen	7db0534d8a	Added a zim parser to the surrogate import option. You can now import zim files into YaCy by simply moving them to the DATA/SURROGATE/IN folder. They will be fetched and after parsing moved to DATA/SURROGATE/OUT. There are exceptions where the parser is not able to identify the original URL of the documents in the zim file. In that case the file is simply ignored. This commit also carries an important fix to the pdf parser and an increase of the maximum parsing speed to 60000 PPM which should make it possible to index up to 1000 files in one second.	1 year ago
Michael Peter Christen	4a54b24703	fix for "negative seek offset" error during extension of heap files. This would have always happened when a heap file exceeds 2GB. should fix https://github.com/yacy/yacy_search_server/issues/372	1 year ago
Michael Peter Christen	0089f234f4	added npe protection	1 year ago
Michael Peter Christen	8285fe715a	tab to spaces for classes supporting the condenser. This is a preparation step to make changes in condenser and parser more visible; no functional changes so far.	1 year ago
Michael Peter Christen	ab3ef87abf	fixed exec start command where a path contains spaces	2 years ago
Michael Peter Christen	761dbdf06d	increases log history length to 10000 implements https://github.com/yacy/yacy_search_server/issues/512	2 years ago
Michael Peter Christen	1893661ee4	removed/suppressed more warnings	2 years ago
Michael Christen	8a06beaf24	removed finalize() methods, deprecated	2 years ago
Michael Peter Christen	60c9986a0e	new release file names with date and git hash ...without reference to 9000ish SVN	2 years ago
reger24	18dddb74c9	Harmonize loading/reading blacklist between init and servlet to use the same procedures -added BlacklistHelper.blacklistToSortedArray to simplify use in servlet	3 years ago
Daleth Darko	3ced06c731	Various javadoc fixes	3 years ago
Michael Peter Christen	bd3f2483a1	replaced url and date retrieval by only url retrieval. This should prevent the search index from being used to determine the freshness of the index entry.	3 years ago
Michael Peter Christen	d19872fd26	making sure that crawl queues are closed correctly to prevent data loss	3 years ago
Michael Peter Christen	63ad8ce6b2	removed ymarks, which had not been used for a long time	3 years ago
Michael Peter Christen	4cadd557dc	removed synchronization in table creation to avoid possible deadlocks when handling OnDemandOpenFileIndex which happens quite often during wide crawling	3 years ago
admin	9b7668fa58	reduced memory footprint during indexing/crawling	3 years ago
Michael Peter Christen	e9c5e78868	replaced new Number(Number) with Number.valueOf to remove deprecation warnings for Java 9	3 years ago
Michael Peter Christen	9e13d77de4	removed call to class.finalize() because of deprecation in java 9 next: removal of finalize() implementation after testing with assert false	3 years ago
Michael Peter Christen	9ef4503672	fixed some newInstance() warnings .. by adding .getDeclaredConstructor()	3 years ago
Michael Peter Christen	1cdb21592b	added hazelcast and some modifications to align legacy YaCy with YaCyGrid	4 years ago
jfhs	10bddc2c2d	Decode HTML entities in all property values by default	4 years ago
sgaebel	f16cd154f7	removes unused imports and variables	4 years ago
Michael Peter Christen	e18d0ef544	trying to set a higher priority to the process that is involved in index export	4 years ago
Michael Peter Christen	787fec0658	reduced complexity - removed concurrency in sort	4 years ago
Michael Peter Christen	22841ffbf1	creating a threaddump during every cleanup process to be able to find out what a peer did (not) last time before a crash	4 years ago
Michael Peter Christen	133440a7a6	some debug lines	4 years ago
sgaebel	df9ea0a42a	removes some warnings: unused imports, params	4 years ago
luccioman	08ea0b0397	Added a configurable timeout to wkhtmltopdf calls for pdf snapshots Necessary to prevent blocking the indexing workflow when some wkhtmltopdf renderings fail without terminating	6 years ago
luccioman	2bdd71de60	Added server-side column sorting on the Process Scheduler table. For easier usage of large tables in the Table_API_p.html page.	6 years ago
luccioman	bb51555830	Removed remaining unsafe accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Now prefer DateTimeFormatter when possible as it is thread-safe without a concurrent-access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	e97580dfc7	Fixed unsafe concurrent access to generic SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Now prefer DateTimeFormatter when possible as it is thread-safe without a concurrent-access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	fa4399d5d2	Small perf improvement: initialize thread names early when possible. Initializing Thread names using the Thread constructor parameter is faster as it already sets a thread name even if no customized one is given, while an additional call to the Thread.setName() function internally does synchronized access, possibly runs an access check on the security manager and performs a native call. Profiling a running YaCy server revealed that the total processing time spent on Thread.setName() for a typical p2p search was in the range of seconds.	7 years ago
luccioman	addd18c993	Removed some remaining uses of deprecated Seed.getIP()	7 years ago
luccioman	bcbd0ae1a4	Enabled partial parsing of audio resources.	7 years ago
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	6cd3847d0a	Fixed NullPointerException case on Table init with relative file path. Can occur for example when running dbtest with relative test table file name (without explicit parent folder).	7 years ago
luccioman	9ddf92d143	Removed unnecessary reflection usage for workflow tasks. This improves code readability and maintainability (call hierarchies are easier to read) and possibly performance.	7 years ago
luccioman	6425963cee	Fixed internal tables exact value match iterator	7 years ago
luccioman	36e9b1c5b3	Fixed occasional time-dependent failures of the SegmentTest test case. As highlighted by latest automated Travis builds.	7 years ago
luccioman	938d8a9731	Added some JavaDoc	7 years ago
luccioman	6e497241f7	Properly close resources (even on error) on OS and ThreadDump classes. Also updated some JavaDoc and main() function usage message on the same ones.	7 years ago
luccioman	dd9cb06d25	Fixed RWI distance calculation on multi words search queries. Distance was lost when storing/retrieving references to intermediate result container. Now all JUnit tests are again successfully passing!	7 years ago
luccioman	5d3ceb31b7	Improved search navigators counters accuracy and consistency. - added some missing increments from RWI results - decrement relevant navigator counts when solr or RWI results are evicted because of duplicate detection or constraints checked belatedly - do not compute facets when unnecessary to avoid unwanted CPU load - do not increment from facets when already done - do not rely on facets on remote solr peers requests, as most of the time only a limited part of their total results is fetched (thus also preventing unnecessary load on remote peers) - use a concurrency friendly score map for the dates navigators to prevent unwanted ConcurrentModificationExceptions This improves the situation for the most obvious inconsistencies in search navigators counts, but more has to be done for a true accuracy (notably when query modifiers constraints are applied belatedly - after the solr or RWI retrieval request - such as the content domain constraint)	7 years ago
luccioman	8e4f31bdc7	Updated internal ISO 639-1 language codes with latest standards. Includes 54 language code additions, some name modifications, and marking a few deprecated.	7 years ago
luccioman	4eba88f2ff	Removed some unnecessary uses of java.lang.reflect api. This improves code browsing and readability, making search by references or call hierarchy IDE features more accurate.	7 years ago
luccioman	b23a563065	Prevent search result failure on incomplete images information. Complements the recent modification related to images in commit `7f395ef`. Unfortunately, the metadata of many documents fetched from the freeworld p2p network has only partial information about embedded images. Without proper error handling, this caused many searches in p2p mode to fail completely.	7 years ago
Michael Peter Christen	7f395ef937	added image link in search results This should be a help to make a preview of search results. The image is computed from the list of embedded images, it is always the first image in that list. In rss-type results the image is presented like <media:content medium="image" url="https://abc.xyz/logo.png"/> as defined in http://www.rssboard.org/media-rss#media-content	7 years ago
luccioman	f369679d1c	Fixed read/copy on input streams reading sometimes less than expected.	7 years ago
luccioman	bf55f1d6e5	Started support of partial parsing on large streamed resources. This enables the getpageinfo_p API to return something in a reasonable amount of time on resources in the megabyte size range and above. Support added first with the generic XML parser, for other formats regular crawler limits apply as usual.	7 years ago


909 Commits (b8479430b693107b32ede29d22a6a35e9a7c63f8)
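
Commits bb51555830 and e97580dfc7 above move from SimpleDateFormat to DateTimeFormatter because the latter is thread-safe. The following is a minimal sketch of that idea, not code taken from YaCy: the class name SharedFormatterSketch, its method names, and the date pattern are all hypothetical and chosen only for illustration. It shares one DateTimeFormatter instance across a thread pool without any synchronization.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of the YaCy code base.
public class SharedFormatterSketch {

    // DateTimeFormatter is immutable and thread-safe, so a single shared
    // instance can be used by many threads without locks.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static String format(LocalDate day) {
        return DAY.format(day);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DAY);
    }

    // Formats and re-parses `tasks` different dates on a small thread pool
    // and returns how many round trips produced the original date.
    static int roundTripConcurrently(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final LocalDate day = LocalDate.of(2020, 1, 1).plusDays(i);
                Callable<Boolean> task = () -> parse(format(day)).equals(day);
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    ok++;
                }
            }
            return ok;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("round trip failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripConcurrently(1000) + " round trips succeeded");
    }
}
```

With a shared SimpleDateFormat in place of the formatter, the same loop would need external synchronization or one instance per thread, which is the bottleneck the commit messages describe.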