- added multiproject in libbuild to compile and install all needed jars (reproduceing old maven build)
- adjusted GitComInf (use a project version, build script with less hardcoded strings)
- adjusted J7Zip-modified
- include common version number for output jar
- add task installJarToRoot to copy output jar to yacycore /lib
- adjust main build with updated jar names
as files are the same - updated also old build.xml
And the Eclipse specific .classpath (with shall be deleted until complete move to gradle)
a main problem when crawling is long waiting time cuased by crawl-delay
values from robots.txt entries. that attribute is not supported by
google and interpreted by yandex and bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extreme large values. YaCy now still obeys crawl-delay but limits them
to 10 seconds.
Additionally the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore the construction of new queue
lists was redesigned and it was ensured that always a large list of
different hosts for host-balancing is provided for the loader.
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did actually not work!
- index will be crated in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adoptions for lucene 8.8.1