Michael Peter Christen
d7b17d8935
fixed missing thread name revert after balancer waiting
3 years ago
Michael Peter Christen
bd3f2483a1
replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to check the
freshness of the index entry.
3 years ago
Michael Peter Christen
163ba26d90
replaced check for load time method
Instead of loading the Solr document, an index holding only the last
loading time was created. This prevents Solr from having to fetch from its
index while the index is being created. Excessive reloading of documents
while indexing has been shown to produce deadlocks, so this should now be
prevented.
3 years ago
Michael Peter Christen
1ead7b85b5
remove compiler warning
"warning: [try] explicit call to close() on an auto-closeable resource"
3 years ago
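The warning quoted in the commit above is the one javac reports under -Xlint:try when close() is called by hand on a resource that a try-with-resources statement already manages. A minimal sketch of the warning-free form follows; the class and method names are hypothetical and are not taken from the YaCy code base.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class; not part of YaCy.
final class TryWithResourcesExample {

    // Returns the first line of the given text, or null if the text is empty.
    static String firstLine(String text) {
        // The reader is declared in the resource header, so it is closed
        // automatically when the block exits, normally or by exception.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // No reader.close() call here: an explicit call is redundant
            // and is what triggers the [try] lint warning.
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));
    }
}
```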
Michael Peter Christen
59777010dc
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
3 years ago
Michael Peter Christen
7898815c41
disabling concurrent logging
(maybe temporary)
3 years ago
sgaebel
4bf6954474
uses clientBuilder not HttpClients.custom() to have these inside the Pool too
3 years ago
sgaebel
cdf901270c
always use HTTPClient with the 'try with resources' pattern to free up resources
3 years ago
sgaebel
69adaa9f55
makes our HTTPClient closable
3 years ago
sgaebel
fc4275f901
handle all references for client, response, request to be able to close them
3 years ago
sgaebel
e7d3a363f2
refactor to use finish()
3 years ago
sgaebel
4fc876f4a3
revert to using EntityUtils.consumeQuietly, as it simply closes the underlying stream
3 years ago
sgaebel
4f0392e93e
refactor use of AuthSchemeProvider
3 years ago
sgaebel
b74f337859
removes double setting of UserAgent
3 years ago
sgaebel
965748fefb
some refactoring using try with resources
3 years ago
Michael Peter Christen
552ab7051b
fix for warc importer
3 years ago
Michael Peter Christen
3c86b7b780
attempt to make a Mac Release using gradle
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh
The build is then inside build/mac/YaCy.app
Right now this works, but the build does not yet contain the correct
release number.
The goal is to make this work for Windows releases as well and to embed
the JRE entirely.
3 years ago
Michael Peter Christen
999c819e3e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
3 years ago
Michael Peter Christen
fd770e90e2
spike to identify paths for YaCy within mac application bundles
3 years ago
Michael Peter Christen
d19872fd26
making sure that crawl queues are closed correctly to prevent data loss
3 years ago
sgaebel
90507c0fdc
comments out printing query params to std.out
3 years ago
Michael Peter Christen
be0aebad84
fixes https://github.com/yacy/yacy_search_server/issues/424
3 years ago
Michael Peter Christen
63ad8ce6b2
removed ymarks
had not been used for a long time
3 years ago
Michael Peter Christen
ef5a71a592
enhanced crawl start response time
for very large crawl start lists
3 years ago
Michael Peter Christen
4cadd557dc
removed synchronization in table creation
to avoid possible deadlocks when handling OnDemandOpenFileIndex
which happens quite often during wide crawling
3 years ago
admin
9b7668fa58
reduced memory footprint during indexing/crawling
3 years ago
Michael Peter Christen
e6a87e0426
enhanced crawler
A main problem when crawling is long waiting times caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted by Yandex and Bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally, the blocking logic when loading a new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned to ensure that the loader is always provided with a
large list of different hosts for host balancing.
4 years ago
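The crawl-delay cap described in the commit above can be sketched as a simple clamp. This is only an illustration under stated assumptions: the class, constant, and method names are hypothetical, and treating a missing or negative value as zero is an assumption rather than documented YaCy behaviour. Only the 10-second limit comes from the commit message.

```java
import java.time.Duration;

// Hypothetical example class; not part of YaCy.
final class CrawlDelayLimiter {

    // Upper bound taken from the commit message: crawl-delay is capped at 10 seconds.
    static final Duration MAX_CRAWL_DELAY = Duration.ofSeconds(10);

    // Returns the delay a crawler would actually wait for a host.
    // Assumption: a missing (null) or negative robots.txt value means no delay.
    static Duration effectiveDelay(Duration robotsCrawlDelay) {
        if (robotsCrawlDelay == null || robotsCrawlDelay.isNegative()) {
            return Duration.ZERO;
        }
        // Obey the host's value, but never wait longer than the cap.
        return robotsCrawlDelay.compareTo(MAX_CRAWL_DELAY) > 0
                ? MAX_CRAWL_DELAY
                : robotsCrawlDelay;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelay(Duration.ofSeconds(3)));
        System.out.println(effectiveDelay(Duration.ofMinutes(5)));
    }
}
```

With such a clamp, a single host that announces an extreme crawl-delay can slow its own queue by at most the cap instead of stalling the whole crawl.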
Michael Peter Christen
e9c5e78868
replaced new Number(Number) with Number.instanceOf
to remove deprecation warnings for Java 9
4 years ago
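For context on the commit above: Java 9 deprecated the constructors of the boxed number types (new Integer(int), new Long(long), and so on), and the replacement named in the JDK documentation is the static valueOf factory, which is presumably what "Number.instanceOf" refers to. The sketch below uses a hypothetical class name and is not taken from the YaCy sources.

```java
// Hypothetical example class; not part of YaCy.
final class BoxingExample {

    // Deprecated since Java 9: new Integer(v), new Long(v), new Double(v).
    // The static valueOf factories below are the JDK's recommended replacement.
    static Integer box(int v) {
        return Integer.valueOf(v);
    }

    static Long box(long v) {
        return Long.valueOf(v);
    }

    static Double box(double v) {
        return Double.valueOf(v);
    }

    public static void main(String[] args) {
        // Integer.valueOf caches values from -128 to 127, so small values
        // are served from the cache instead of allocating a new object.
        System.out.println(box(42) == box(42));
    }
}
```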
Michael Peter Christen
9e13d77de4
removed call to class.finalize() because of deprecation in java 9
next: removal of finalize() implementation
after testing with assert false
4 years ago
Michael Peter Christen
9ef4503672
fixed some newInstance() warnings
.. by adding .getDeclaredConstructor()
4 years ago
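The change named in the commit above follows the standard migration away from Class.newInstance(), which is deprecated since Java 9. A minimal sketch, assuming a no-argument constructor and using hypothetical names that do not come from the YaCy code base:

```java
// Hypothetical example class; not part of YaCy.
final class ReflectiveFactory {

    // Creates an instance of the given type through its no-argument constructor.
    static <T> T create(Class<T> type) {
        try {
            // Replaces the deprecated type.newInstance(): the constructor is
            // looked up explicitly, and exceptions thrown by the constructor
            // arrive wrapped in InvocationTargetException.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers NoSuchMethodException, InstantiationException,
            // IllegalAccessException and InvocationTargetException.
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = create(StringBuilder.class);
        sb.append("created reflectively");
        System.out.println(sb);
    }
}
```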
Michael Peter Christen
1d41380f0a
better support for mac-specific tray functions in java 9
4 years ago
Michael Peter Christen
e81b770f79
enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approx. 0.5 million start points
4 years ago
Michael Peter Christen
c623a3252e
fix for jdk 14 bug
4 years ago
Michael Peter Christen
dbd211a1ad
removed/replaced reflection in memory tool
4 years ago
Michael Peter Christen
1cdb21592b
added hazelcast and some modifications to align legacy YaCy with YaCyGrid
4 years ago
Michael Christen
42ea2a1c6f
Merge pull request #405 from jfhs/jfhs/support-all-html-entities
Improve HTML entities support
4 years ago
Michael Christen
b2af745dd6
Merge pull request #404 from lnceballosz/master
NGI0 - Updating licensing aspects according to REUSE
4 years ago
jfhs
10bddc2c2d
Decode HTML entities in all property values by default
4 years ago
jfhs
2135d259e3
Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities
4 years ago
Michael Peter Christen
8f876a8c72
added concurrency to enhance indexing speed during json surrogate import
4 years ago
Michael Peter Christen
f8cbaeef93
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
4 years ago
Michael Peter Christen
a857e3d3d5
fix for json importer
4 years ago
sgaebel
1546232c94
adds ranking for multi document queries only
4 years ago
sgaebel
93b353d22d
does not boost or add fields for zero-row-queries (exists())
4 years ago
sgaebel
f16cd154f7
removes unused imports and variables
4 years ago
sgaebel
c69c462a15
replaces an expensive getLoadTimeURL() with exists()
refactors urlExists to getHarvestProcess as that is what it does
4 years ago
sgaebel
a5488ac8f5
uses edismax queries on query counts > 1 only
4 years ago
sgaebel
26223dc25a
replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
4 years ago
sgaebel
8e4d014c06
removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the log
4 years ago
Lina Ceballos
a96752f5ab
adding SPDX license and copyright headers
4 years ago