The https://reproducible-builds.org project invests a lot of work
to make builds reproducible. This is a security property: it allows
binaries built on different builder machines to be compared.
If they are identical, it means that either the builds have not
been manipulated or an attacker managed to attack all builder
machines in exactly the same way.
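The comparison described above can be sketched in Java as follows. This is
only an illustration, not part of this commit: the class name BuildCompare
and its methods are hypothetical, and SHA-256 is an assumed choice of digest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares two build artifacts by their SHA-256 digest.
class BuildCompare {

    // Returns the SHA-256 digest of the given artifact bytes.
    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // True if both artifacts have the same digest, i.e. are bit-for-bit identical.
    static boolean sameArtifact(byte[] a, byte[] b) {
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }

    // Same check for two files, e.g. one jar from each builder machine.
    static boolean sameArtifact(Path a, Path b) throws IOException {
        return sameArtifact(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BuildCompare <artifact-a> <artifact-b>");
            return;
        }
        boolean same = sameArtifact(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "builds are identical" : "builds differ");
    }
}
```

A single embedded timestamp is enough to make the two digests differ, which
is why the build date is a problem for this kind of check.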
One problem that the reproducible-builds project often sees is
that projects include the build time in their binaries. This
makes builds unreproducible for no apparent reason. The build
date should not be of interest since binaries built on different
dates but from the same source code should not be different.
Thus I decided to remove the build date instead of re-implementing
the functionality without the GitRev task. In any case, the reported
date was not the build date but the date of the last git commit,
which is even less informative. The git commit ID would carry some
information but should only be relevant for "nightly builds".
PKGMANAGER is always false, thus the Java code wrapped in
if statements for this property is dead code and can also
be removed.
The Debian packaging removed in c4659f0fb0
did set the PKGMANAGER property to true. When we do distro
packages again, we can revisit this commit and redo it with
property files instead.
RESTARTCMD is only used inside that dead code.
DESTDIR is never used, not even in build.xml.
*.epub files are zip files containing XHTML content files and other artifact files,
which the zipParser can already feed to the index
- extension "epub"
- mime "epub+zip"
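To illustrate why a zip parser is sufficient here, the following sketch lists
the XHTML entries of an epub using only java.util.zip. It is not the zipParser
code itself; the class EpubEntries, its methods and the sample entry names are
hypothetical and chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: treats an epub as a plain zip archive.
class EpubEntries {

    // Returns the names of all (X)HTML entries found in the epub bytes.
    static List<String> contentEntries(byte[] epub) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(epub))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String lower = entry.getName().toLowerCase();
                if (!entry.isDirectory()
                        && (lower.endsWith(".xhtml") || lower.endsWith(".html"))) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a tiny in-memory archive shaped like an epub (made-up content).
    static byte[] sampleEpub() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            put(zip, "mimetype", "application/epub+zip");
            put(zip, "OEBPS/chapter1.xhtml", "<html><body><p>Hello</p></body></html>");
            put(zip, "OEBPS/cover.png", "not really a png");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Writes one text entry into the archive.
    private static void put(ZipOutputStream zip, String name, String text) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(text.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) {
        System.out.println(contentEntries(sampleEpub()));
    }
}
```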
to be able to use/reuse Ant targets where the task has not been implemented in the Gradle build.
- use the import to include the compile of htroot as first important task
! it is possible that the first build fails on the compile of GitRevTask.jar !
! solution/workaround -> use "ant all" once to compile GitRevTask.jar !
- adjusted build.xml a little
- split compile-core into compile-core and compile-htroot to have a target for htroot compilation only
- set build-path to reuse Gradle's build directory
- (fix javadoc failure)
- changed the filtered-copy of yacyBuildProperties.java to ! the build path :-(
as the current approach (copy,delete,exclude) is complicated and not worth migrating,
used a simple/straightforward approach (using a yacyBuildProperties.java.template file as copy source)
instead of loading the solr document, an index holding only the last
loading time was created. This avoids that solr has to fetch from its
index while the index is being built. Excessive re-loading of documents
while indexing has been shown to produce deadlocks, so this should now be
prevented.
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh
The build is then inside build/mac/YaCy.app
Right now this works, but the build does not have the correct release
number inside.
The goal is to make this work for Windows releases as well and to embed
the JRE entirely.
a main problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
google and is interpreted by yandex and bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy now still obeys crawl-delay but limits it
to 10 seconds.
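The limit can be pictured as a simple clamp. The sketch below is not the
actual YaCy code; the class CrawlDelayCap, the method name and the use of
milliseconds are assumptions made for illustration. Only the 10 second
upper bound is taken from the text above.

```java
// Hypothetical helper: caps the crawl-delay taken from robots.txt.
class CrawlDelayCap {

    // Upper bound from the commit message: 10 seconds.
    static final long MAX_CRAWL_DELAY_MILLIS = 10_000L;

    // Obeys the robots.txt value but never waits longer than the cap.
    // Missing or negative values are treated as "no delay" (an assumption).
    static long effectiveDelayMillis(long robotsCrawlDelayMillis) {
        if (robotsCrawlDelayMillis <= 0L) {
            return 0L;
        }
        return Math.min(robotsCrawlDelayMillis, MAX_CRAWL_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMillis(3_000L));     // below the cap, kept as-is
        System.out.println(effectiveDelayMillis(3_600_000L)); // one hour, reduced to the cap
    }
}
```

With such a clamp a single host asking for an hour between requests can
delay its own queue by at most 10 seconds per fetch.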
Additionally, the blocking logic when loading a new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned, and it is now ensured that the loader is always
provided with a large list of different hosts for host-balancing.