Michael Peter Christen
bd3f2483a1
replaced url and date retrieval by only url retrieval
...
This should prevent the search index from being used to check the
freshness of the index entry.
3 years ago
Michael Peter Christen
163ba26d90
replaced check for load time method
...
Instead of loading the solr document, an index holding only the last
loading time was created. This prevents solr from having to fetch from
its index while the index is being created. Excessive re-loading of
documents while indexing has been shown to produce deadlocks, so this
should now be prevented.
3 years ago
Michael Peter Christen
59777010dc
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
3 years ago
Michael Peter Christen
7898815c41
disabling concurrent logging
...
(maybe temporary)
3 years ago
sgaebel
4bf6954474
uses clientBuilder not HttpClients.custom() to have these inside the
...
Pool too
3 years ago
sgaebel
cdf901270c
always use HTTPClient by 'try with resources' pattern to free up
...
resources
3 years ago
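The 'try with resources' pattern named in the commit above can be sketched as follows. This is a minimal illustration only: DemoClient and load() are hypothetical stand-ins, not YaCy's actual HTTPClient class or its API.

```java
import java.io.Closeable;

final class TryWithResourcesSketch {

    // Hypothetical stand-in for a closable HTTP client wrapper.
    static final class DemoClient implements Closeable {
        private boolean closed = false;

        String fetch(String url) {
            if (closed) throw new IllegalStateException("client already closed");
            return "response for " + url;
        }

        @Override
        public void close() {
            // A real client would release connections and streams here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Remembered only so the demo can inspect the client after use.
    static DemoClient lastClient;

    static String load(String url) {
        // close() runs automatically when the block exits,
        // whether by return or by exception.
        try (DemoClient client = new DemoClient()) {
            lastClient = client;
            return client.fetch(url);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("http://example.org/"));
        System.out.println("closed after use: " + lastClient.isClosed());
    }
}
```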
sgaebel
69adaa9f55
makes our HTTPClient closable
3 years ago
sgaebel
fc4275f901
handle all references for client, response, request to be able to close
...
them
3 years ago
sgaebel
e7d3a363f2
refactor to use finish()
3 years ago
sgaebel
4fc876f4a3
revert back to use EntityUtils.consumeQuietly - as it simply closes the
...
underlying stream
3 years ago
sgaebel
4f0392e93e
refactor use of AuthSchemeProvider
3 years ago
sgaebel
b74f337859
removes double setting of UserAgent
3 years ago
sgaebel
965748fefb
some refactoring using try with resources
3 years ago
sgaebel
90507c0fdc
comments out printing query params to std.out
3 years ago
Michael Peter Christen
be0aebad84
fixes https://github.com/yacy/yacy_search_server/issues/424
3 years ago
Michael Peter Christen
e6a87e0426
enhanced crawler
...
A main problem when crawling is the long waiting time caused by
crawl-delay values from robots.txt entries. That attribute is not
supported by google and is interpreted by yandex and bing in different
ways. In large crawls there is always one host which blocks the whole
crawl with extremely large values. YaCy still obeys crawl-delay but now
limits it to 10 seconds.
Additionally, the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned, and it was ensured that a large list of different
hosts for host-balancing is always provided for the loader.
4 years ago
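The crawl-delay limit described in the commit above amounts to clamping the value read from robots.txt. A minimal sketch, assuming the delay is held in milliseconds; the class and method names are hypothetical and not taken from the YaCy code base, and only the 10 second limit comes from the commit message.

```java
import java.util.concurrent.TimeUnit;

final class CrawlDelayCap {

    // Upper bound from the commit message: 10 seconds.
    static final long MAX_CRAWL_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(10);

    // Returns the delay to actually wait: the robots.txt value is obeyed,
    // but never beyond the cap. Missing or negative values mean no delay.
    static long effectiveDelayMillis(long robotsCrawlDelayMillis) {
        if (robotsCrawlDelayMillis <= 0) {
            return 0L;
        }
        return Math.min(robotsCrawlDelayMillis, MAX_CRAWL_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMillis(3_000L));     // below the cap
        System.out.println(effectiveDelayMillis(3_600_000L)); // one hour, capped
    }
}
```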
Michael Peter Christen
e9c5e78868
replaced new Number(Number) with Number.valueOf
...
to remove deprecation warnings for Java 9
4 years ago
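The wrapper-class constructors such as new Integer(int) are deprecated since Java 9 in favour of the static valueOf factories. A minimal sketch, assuming the commit above refers to those factories; the class name is hypothetical.

```java
final class BoxingSketch {

    // Deprecated form: new Integer(v). The factory below may reuse
    // cached instances instead of always allocating a new object.
    static Integer box(int v) {
        return Integer.valueOf(v);
    }

    public static void main(String[] args) {
        Integer a = box(42);
        Long b = Long.valueOf(42L);
        Double c = Double.valueOf(4.2d);
        System.out.println(a + " " + b + " " + c);
    }
}
```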
Michael Peter Christen
9ef4503672
fixed some newInstance() warnings
...
.. by adding .getDeclaredConstructor()
4 years ago
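Class.newInstance() is deprecated since Java 9; the replacement named in the commit above goes through the no-argument constructor object. A minimal sketch of that pattern, with a hypothetical helper name:

```java
final class ReflectiveInstantiation {

    // Deprecated form: type.newInstance()
    // Replacement: look up the no-argument constructor, then invoke it.
    // All checked exceptions thrown here are subclasses of
    // ReflectiveOperationException.
    static <T> T create(Class<T> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        StringBuilder sb = create(StringBuilder.class);
        sb.append("created reflectively");
        System.out.println(sb);
    }
}
```

Unlike the deprecated call, this form wraps exceptions thrown by the constructor in an InvocationTargetException instead of propagating them unchecked.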
Michael Peter Christen
c623a3252e
fix for jdk 14 bug
4 years ago
Michael Peter Christen
dbd211a1ad
removed/replaced reflection in memory tool
4 years ago
Michael Peter Christen
1cdb21592b
added hazelcast and some modifications to align legacy YaCy with
...
YaCyGrid
4 years ago
Michael Peter Christen
f8cbaeef93
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
4 years ago
Michael Peter Christen
a857e3d3d5
fix for json importer
4 years ago
sgaebel
f16cd154f7
removes unused imports and variables
4 years ago
sgaebel
a5488ac8f5
uses edismax queries on query counts > 1 only
4 years ago
sgaebel
26223dc25a
replaces getLoadTime() by exists() with a simpler query
...
since solr-8.8.1 getLoadTime() causes a high cpu usage
4 years ago
Michael Peter Christen
e18d0ef544
trying to set a higher priority to the process that is involved in index
...
export
4 years ago
Michael Peter Christen
8b4394a6c5
fixes for solr 8.8.1 migration
...
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did not actually work!
- index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) for now, until
a better solution is found.
- Large schema adaptations for lucene 8.8.1
4 years ago
Al Sutton
8ade8b8775
Remove forced clear to match new behaviour in 2da71c2a40
4 years ago
Al Sutton
09695fc6d3
Update exceptions to match updated API
4 years ago
Al Sutton
69014a701e
Update API Usage
4 years ago
Michael Peter Christen
198826c362
added network scanner process to discover all YaCy peers in the intranet
...
this will be used to wire YaCy peers in a kubernetes cluster
4 years ago
Michael Peter Christen
5a7f12a9c1
allow network scans for non-standard http/https ports
4 years ago
Michael Peter Christen
d0abb0cedb
enabling all crawl profiles in all network modes
...
also: increased default internet crawl speed to
4 urls/s/host
4 years ago
Michael Peter Christen
43a9f4f574
updated solr 6.6.6 -> 7.7.3
...
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index also works with 7.7.3 without migration
4 years ago
Michael Peter Christen
eea2d71851
prevent creation of auth schema factories every time a servlet is called
4 years ago
Michael Peter Christen
787fec0658
reduced complexity - removed concurrency in sort
4 years ago
Michael Peter Christen
36e616271b
do better documentation on how to set a default password
4 years ago
Michael Peter Christen
df2bf9ef28
try to fix maven build error
4 years ago
Michael Peter Christen
7947baeb49
removed all remaining deprecation warnings
4 years ago
sgaebel
4a495df63a
removes some deprecation-warnings
5 years ago
sgaebel
df9ea0a42a
removes some warnings: unused imports, params
5 years ago
Michael Peter Christen
e0ad8ca9da
replaced json library from JSON.org with libandroid-json-java
...
This fixes https://github.com/yacy/yacy_search_server/issues/347
5 years ago
Michael Christen
25227676ae
removed some warnings
5 years ago
luccioman
d16bc99835
Added "Show Metadata" links to the ViewFile.html links mode
...
To conveniently follow parsed links in the file viewer
6 years ago
luccioman
a5771b1f14
Made SNI extension user configurable without the need for server restart
...
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
The SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is handled more properly by the TLS
implementations of major web servers, following the RFC 6066 standard.
Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
6 years ago
luccioman
5b7e41202a
Added Solr GSA writer support for responses from remote instances
6 years ago
luccioman
4d8a948455
Properly close PDF snapshots loaded with pdfbox library
6 years ago
luccioman
74e6d6e984
Added Solr GrepHTML writer support for responses from remote instances
6 years ago
luccioman
5e6501974d
Added Solr snapshots writer support for responses from remote instances
6 years ago