sgaebel
b74f337859
removes double setting of UserAgent
3 years ago
sgaebel
965748fefb
some refactoring using try with resources
3 years ago
Michael Peter Christen
552ab7051b
fix for warc importer
3 years ago
Michael Peter Christen
3c86b7b780
attempt to make a Mac Release using gradle
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh
The build is then inside build/mac/YaCy.app
This works so far, but the build does not yet contain the correct release
number.
The target is to make this work for Windows releases and to embed the JRE
entirely.
3 years ago
Michael Peter Christen
999c819e3e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
3 years ago
Michael Peter Christen
fd770e90e2
spike to identify paths for YaCy within mac application bundles
3 years ago
Michael Peter Christen
d19872fd26
making sure that crawl queues are closed correctly to prevent data loss
3 years ago
sgaebel
90507c0fdc
comments out printing query params to std.out
3 years ago
Michael Peter Christen
be0aebad84
fixes https://github.com/yacy/yacy_search_server/issues/424
3 years ago
Michael Peter Christen
63ad8ce6b2
removed ymarks
had not been used for a long time
4 years ago
Michael Peter Christen
ef5a71a592
enhanced crawl start response time
for very very large crawl start lists
4 years ago
Michael Peter Christen
4cadd557dc
removed synchronization in table creation
to avoid possible deadlocks when handling OnDemandOpenFileIndex
which happens quite often during wide crawling
4 years ago
admin
9b7668fa58
reduced memory footprint during indexing/crawling
4 years ago
Michael Peter Christen
e6a87e0426
enhanced crawler
A main problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted by Yandex and Bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally, the blocking logic when loading a new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned to ensure that the loader is always given a large
list of different hosts for host-balancing.
4 years ago
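A minimal sketch of the crawl-delay cap described in e6a87e0426, not the actual YaCy code. The class name, constant and method are hypothetical; only the 10-second limit comes from the commit message.

```java
public class CrawlDelaySketch {

    // Upper bound taken from the commit message: crawl-delay is limited to 10 seconds.
    static final long MAX_CRAWL_DELAY_MS = 10_000L;

    // Hypothetical helper: honours the robots.txt crawl-delay but clamps it
    // into the range [0, MAX_CRAWL_DELAY_MS].
    static long effectiveDelayMillis(long robotsDelayMillis) {
        return Math.max(0L, Math.min(robotsDelayMillis, MAX_CRAWL_DELAY_MS));
    }

    public static void main(String[] args) {
        // A host asking for a 60 second delay is only waited on for 10 seconds.
        System.out.println(effectiveDelayMillis(60_000L));
        // A moderate delay is obeyed unchanged.
        System.out.println(effectiveDelayMillis(2_000L));
    }
}
```

With a cap like this, a single host with an extreme crawl-delay can no longer stall the whole crawl, while reasonable values are still respected.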
Michael Peter Christen
e9c5e78868
replaced new Number(Number) with Number.valueOf
to remove deprecation warnings for Java 9
4 years ago
Michael Peter Christen
9e13d77de4
removed call to class.finalize() because of deprecation in java 9
next: removal of finalize() implementation
after testing with assert false
4 years ago
Michael Peter Christen
9ef4503672
fixed some newInstance() warnings
.. by adding .getDeclaredConstructor()
4 years ago
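For illustration, the kind of change described in 9ef4503672 might look like the sketch below. Class.newInstance() has been deprecated since Java 9; going through getDeclaredConstructor() is the usual replacement. The class and method names here are hypothetical and not taken from YaCy.

```java
public class NewInstanceSketch {

    // Hypothetical helper: creates an instance through the no-argument
    // constructor instead of the deprecated Class.newInstance().
    static <T> T create(Class<T> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers missing, inaccessible or throwing constructors.
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = create(StringBuilder.class);
        sb.append("ok");
        System.out.println(sb);
    }
}
```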
Michael Peter Christen
1d41380f0a
better support for mac-specific tray functions in java 9
4 years ago
Michael Peter Christen
e81b770f79
enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approx. 0.5 million start points
4 years ago
Michael Peter Christen
c623a3252e
fix for jdk 14 bug
4 years ago
Michael Peter Christen
dbd211a1ad
removed/replaced reflection in memory tool
4 years ago
Michael Peter Christen
1cdb21592b
added hazelcast and some modifications to align legacy YaCy with YaCyGrid
4 years ago
Michael Christen
42ea2a1c6f
Merge pull request #405 from jfhs/jfhs/support-all-html-entities
Improve HTML entities support
4 years ago
Michael Christen
b2af745dd6
Merge pull request #404 from lnceballosz/master
NGI0 - Updating licensing aspects according REUSE
4 years ago
jfhs
10bddc2c2d
Decode HTML entities in all property values by default
4 years ago
jfhs
2135d259e3
Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities
4 years ago
Michael Peter Christen
8f876a8c72
added concurrency to enhance indexing speed during json surrogate import
4 years ago
Michael Peter Christen
f8cbaeef93
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
4 years ago
Michael Peter Christen
a857e3d3d5
fix for json importer
4 years ago
sgaebel
1546232c94
adds ranking for multi document queries only
4 years ago
sgaebel
93b353d22d
does not boost or add fields for zero-row-queries (exists())
4 years ago
sgaebel
f16cd154f7
removes unused imports and variables
4 years ago
sgaebel
c69c462a15
replaces an expensive getLoadTimeURL() by exists()
refactors urlExists to getHarvestProcess as that is what it does
4 years ago
sgaebel
a5488ac8f5
uses edismax queries on query counts > 1 only
4 years ago
sgaebel
26223dc25a
replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
4 years ago
sgaebel
8e4d014c06
removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the log
4 years ago
Lina Ceballos
a96752f5ab
adding SPDX license and copyright headers
4 years ago
Michael Peter Christen
e18d0ef544
trying to set a higher priority for the process that is involved in index export
4 years ago
Michael Peter Christen
8b4394a6c5
fixes for solr 8.8.1 migration
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one actually did not work!
- the index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adaptations for lucene 8.8.1
4 years ago
Michael Peter Christen
ed9789214e
fixed seed initialization problem
4 years ago
Al Sutton
8ade8b8775
Remove forced clear to match new behaviour in 2da71c2a40
4 years ago
Al Sutton
09695fc6d3
Update exceptions to match updated API
4 years ago
Al Sutton
69014a701e
Update API Usage
4 years ago
Michael Peter Christen
3da7628117
use environment variables to overwrite configuration variables
For example, you can do:
export YACY_PORT=8092 && ./startYACY.sh
Just prepend "YACY_" to the uppercase version of the configuration variable
name and replace all "." with "_".
4 years ago
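The naming rule from 3da7628117 (prefix "YACY_", uppercase, dots replaced by underscores) can be sketched as follows. This is an illustration under assumptions, not the YaCy implementation: the class name, the method names and the lookup through a plain map are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EnvKeySketch {

    // Hypothetical helper: derives the environment variable name for a
    // configuration key, e.g. "port" -> "YACY_PORT".
    static String toEnvName(String configKey) {
        return "YACY_" + configKey.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    // Hypothetical lookup: an environment value, if present, overrides the
    // value from the configuration file.
    static String resolve(String configKey, String configValue, Map<String, String> env) {
        String override = env.get(toEnvName(configKey));
        return override != null ? override : configValue;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("YACY_PORT", "8092"); // stands in for: export YACY_PORT=8092
        System.out.println(toEnvName("network.unit.agent"));
        System.out.println(resolve("port", "8090", env));
    }
}
```

In a real process the map would be System.getenv(); it is passed in here only so the sketch can be run without touching the environment.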
Michael Peter Christen
13a2e6dc6e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
4 years ago
Michael Peter Christen
0ae8ccf657
Make it possible to set an empty password disabling the authentication protocol completely
If you now set an empty password, the http server will not ask you to
authenticate. This is required for environments where we attach an outside
authentication service like keycloak or similar using authentication
in an ingress proxy.
This change is part of the approach to run YaCy inside of a kubernetes
cluster where we do not want individual authentication of peers and want
to apply an ingress authentication.
4 years ago
Michael Peter Christen
96592a10cf
added option to set yacy configuration values using environment variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose for this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name to prevent that we
are flooding peer name statistics with auto-deployment-generated names.
4 years ago
Michael Peter Christen
198826c362
added network scanner process to discover all YaCy peers in the intranet
this will be used to wire YaCy peers in a kubernetes cluster
4 years ago
Michael Peter Christen
d9602e8325
Implemented a new syntax in the template engine to simplify json APIs
Added also an example for one of the existing APIs. The problem is the
comma separator between objects which must not be there for the last
entry in a sequence. The new syntax adds the separator symbol
automatically.
4 years ago
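The separator problem mentioned in d9602e8325 is the usual one when emitting JSON arrays: a comma belongs between entries but not after the last one. The sketch below shows that idea in plain Java. It is not the YaCy template syntax, which the commit message does not spell out, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class JsonArraySketch {

    // Hypothetical helper: joins already rendered JSON objects into an array.
    // StringJoiner places the separator only between entries, so the last
    // entry is never followed by a comma.
    static String toJsonArray(List<String> renderedObjects) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String object : renderedObjects) {
            joiner.add(object);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray(Arrays.asList("{\"a\":1}", "{\"b\":2}")));
    }
}
```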
Michael Peter Christen
5a7f12a9c1
allow network scans for non-standard http/https ports
4 years ago
sgaebel
b8d264f7ec
fixes logging
4 years ago
Michael Peter Christen
4c920d05b5
removed superfluous lines
4 years ago
Michael Peter Christen
907f121d0c
do not overwrite PW with random PW
4 years ago
Michael Peter Christen
3e6a1e0a49
fixed surrogate process counter
4 years ago
Michael Peter Christen
d3526c52af
fixed a problem in warc importer: do not fail if single WARC entries are faulty
4 years ago
Michael Peter Christen
3078b74e1d
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
4 years ago
Michael Peter Christen
01cc32217f
fixed apicall call method parameters
and verification in transaction manager
which did not have an exception for localhost/basic authentication
4 years ago
Michael Peter Christen
63f58e4785
enhanced strategy in host browser
limit number of fresh hosts in round robin hashes
4 years ago
Michael Peter Christen
9be36800a4
increased redirect depth by one
this makes sense if one redirect replaces http with https and another
adds or removes the www subdomain
4 years ago
Michael Peter Christen
d0abb0cedb
enabling all crawl profiles in all network modes
also: increased default internet crawl speed to
4 urls/s/host
4 years ago
Michael Peter Christen
baad56d83d
beautified default peer names
4 years ago
Michael Peter Christen
43a9f4f574
updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
4 years ago
Michael Peter Christen
c0d9a3e9a7
turned HostBrowser into a admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
4 years ago
Michael Peter Christen
d359d521a1
fixed warc importer
The importer tried to import gzipped files as plain WARC.
It will now check the file extension and unzip automatically
on the fly.
4 years ago
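The extension check described in d359d521a1 could look roughly like this. It is a sketch only: the class and method names are hypothetical, and treating every ".gz" suffix as gzip is an assumption rather than the importer's documented rule.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

public class WarcOpenSketch {

    // Hypothetical check: a file name ending in ".gz" is treated as gzip-compressed.
    static boolean looksGzipped(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    // Opens the file and decompresses on the fly when the extension says so.
    static InputStream open(File file) throws IOException {
        InputStream raw = new BufferedInputStream(new FileInputStream(file));
        if (!looksGzipped(file.getName())) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksGzipped("crawl.warc.gz"));
        System.out.println(looksGzipped("crawl.warc"));
    }
}
```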
Michael Peter Christen
e54ab39958
Going back to basic authentication for console/shell commands
This does not affect security because:
- it is going to localhost only
- only users who already have access to the pw hash can do this
- no clear text pw is transmitted because that is not stored anywhere
The switch to basic is required because these commands are required
in the context of hosting on root servers and docker containers
where a password change must be done. But the password shell command
was not working without password which made the concept unusable.
This deficit made it virtually impossible for root server operators
to use YaCy because they had been unable to set up a proper password.
4 years ago
Michael Peter Christen
6271e9122c
javadoc fix
4 years ago
Michael Peter Christen
e0f4e3fd9a
enhanced ability to debug the code
4 years ago
Michael Peter Christen
eea2d71851
prevent creation of auth schema factories every time a servlet is called
4 years ago
Michael Peter Christen
fcc9386ed3
enhanced the (already fast!) png exporter
4 years ago
Michael Peter Christen
4e9b425f98
missing fix for latest commit
4 years ago
Michael Peter Christen
3213d9db37
updated jetty from 9.4.17 to 9.4.35
and fixed a bug in ServerSideIncludes that appeared only in that recent
version of jetty
4 years ago
Michael Peter Christen
787fec0658
reduced complexity - removed concurrency in sort
4 years ago
Michael Peter Christen
cef5fde343
adding message to UI to make port change transparent
4 years ago
Michael Peter Christen
52228cb6be
added a gc to cleanup process (once every 10 minutes)
4 years ago
Michael Peter Christen
22841ffbf1
creating a threaddump during every cleanup process
to be able to find out what a peer did (not) last time before a crash
4 years ago
Michael Peter Christen
36e616271b
do better documentation on how to set a default password
4 years ago
Michael Peter Christen
df2bf9ef28
try to fix maven build error
4 years ago
Michael Peter Christen
264bab6700
trying to fight the UI unavailability
this patch addresses a possible issue with too many open connections to
remote peers
4 years ago
Michael Peter Christen
7947baeb49
removed all remaining deprecation warnings
4 years ago
Michael Peter Christen
c0f6d6e11d
removed one deprecation warning for jetty library initializing ssl server port
4 years ago
Michael Peter Christen
133440a7a6
some debug lines
4 years ago
sgaebel
3431f91db9
removes unused 'unused' tokens
5 years ago
sgaebel
fc03c4b4fe
removes some warnings and unused objects
5 years ago
sgaebel
4a495df63a
removes some deprecation-warnings
5 years ago
sgaebel
dd9d4b1188
replace org.junit.Assert.assertThat by org.hamcrest.MatcherAssert.assertThat
from hamcrest 2.2 to avoid deprecation-warning
5 years ago
sgaebel
df9ea0a42a
removes some warnings: unused imports, params
5 years ago
sgaebel
9bc2297161
fixes deleting during recrawl
5 years ago
sgaebel
80785b785e
adds deleting during recrawl
5 years ago
Michael Peter Christen
e0ad8ca9da
replaced json library from JSON.org with libandroid-json-java
This fixes https://github.com/yacy/yacy_search_server/issues/347
5 years ago
Michael Peter Christen
ea8df27e95
modified org.json.* library to fit into the YaCy environment
as drop-in replacement.
Also made some fixes and enhancements to the library.
5 years ago
Michael Peter Christen
60dc1241a3
added org.json.* library
from https://android.googlesource.com/platform/libcore/+/refs/heads/master/json/src/main/java/org/json
as a preparation step for
https://github.com/yacy/yacy_search_server/issues/347
5 years ago
Michael Peter Christen
053e54a2c7
grant CORS for json files
5 years ago
Michael Christen
cfa27d2fd5
fixed links
5 years ago
Michael Christen
cb20aa7e54
removed donation message in search result column
5 years ago
Michael Christen
25227676ae
removed some warnings
6 years ago
luccioman
6b45cd5799
New optional crawl filter on the URL a doc must match to crawl its links
For finer control over which parsed documents can trigger an addition of
their links to the crawl stack, complementary to the existing crawl
depth parameter.
6 years ago
luccioman
d16bc99835
Added "Show Metadata" links to the ViewFile.html links mode
To conveniently follow parsed links in the file viewer
6 years ago
luccioman
a5771b1f14
Made SNI extension user configurable without the need for server restart
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is more properly handled by major web
servers TLS implementations, following the RFC 6066 standard.
Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
6 years ago
luccioman
e90405b6f0
Support parsing audio URLs without file extension
Also added a JUnit test for the audio tag parser
6 years ago
luccioman
a8316c79da
Allow JS resorting of search results by unauthenticated users
Access rate limitations to this search mode by unauthenticated users are
set low by default to prevent unwanted server overload but can be
customized through the SearchAccessRate_p.html configuration page
Fixes #291
6 years ago