Michael Peter Christen
c4082c4ff2
refactoring of ZIM reader, simplification, removed unnecessary code
1 year ago
Michael Peter Christen
c2b6b6e7b9
Fixed a large number of problems in the ZIM reader.
This library was not prepared for large data because it was missing long
data types for pointers. I had to modify the code-base in a fundamental
way:
- proofreading,
- unclustering,
- refactoring,
- adapting names to https://wiki.openzim.org/wiki/ZIM_file_format ,
- changing the exception handling,
- extending to more attributes as defined in the spec (bugfix for MIME type
loading),
- bugfix for long parsing (which prevented reading of large files).
The code is furthermore very inefficient and requires more attention.
However the format is very useful for YaCy as there are numerous data
sources for ZIM-Files.
1 year ago
Michael Peter Christen
5ba5fb5d23
upgraded pdfbox to 3.0.0
1 year ago
Michael Peter Christen
1fefae9baf
integrated the source code of an openzim file format reader. These are
the raw format reader files with no integration in YaCy yet; that may
follow as a next step. The ZIM file format is documented at
https://openzim.org and the reader code was taken from the archived,
non-maintained repository at https://github.com/openzim/zimreader-java
1 year ago
Michael Peter Christen
4308aa5415
removed the concept of an empty password meaning "no password used",
because we now start YaCy with a default password (yacy).
This has an impact on all functions that check the current state of
password protection and handled the empty-password situation,
including the warnings to set a password in case none is set (which
can no longer be the case).
1 year ago
Michael Peter Christen
2c60ff14bb
fixed default pw comparison
1 year ago
Michael Peter Christen
4da320bebf
added a warning message in ConfigBasic in case that the default password
was not changed.
1 year ago
Michael Peter Christen
7830268be1
fix 756c817b5a
must be applied to all code where a transaction token is generated.
1 year ago
Michael Peter Christen
756c817b5a
fix for https://github.com/yacy/yacy_search_server/issues/544
1 year ago
Michael Peter Christen
03bf259601
fix for https://github.com/yacy/yacy_search_server/issues/363
We still need to set the load in the process because a demand for higher
crawl speed may require increasing the maximum load limit. However,
following the criticism in the bug report, we never reduce the load limit
again.
1 year ago
mchristen
8fc51f66c6
fixed a test class which prevented compilation on latest jvm
1 year ago
Joel Strasser
53bafa1544
consistent formatting in string concatenation
1 year ago
Joel Strasser
22c4188001
additionally match release stub for YaCy version
1 year ago
Michael Peter Christen
ff8fe7b6a4
fix for ',' or '.' appearing within a word or number. The query is no
longer tokenized into parts around that character, which makes it possible
to search for numbers or version numbers.
1 year ago
Michael Peter Christen
0689f4f0ae
Check if the character is a minus sign and is followed by a letter or a
digit. Treat it as part of the word/number.
1 year ago
Michael Peter Christen
5db97a8928
parser can now separate numbers from words even when they are not
separated by a space, e.g. 4.7Ohm
1 year ago
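The number/word separation described in 5db97a8928 can be illustrated with a small regular-expression sketch. This is not the YaCy tokenizer; the class name `NumberWordSplitSketch`, the method `tokenize` and the pattern are assumptions made for illustration only, and the sketch treats a number as digits optionally joined by '.' or ','.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual YaCy implementation.
final class NumberWordSplitSketch {

    // Either a number (digit groups joined by '.' or ',') or a run of letters.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:[.,]\\d+)*|\\p{L}+");

    // Returns the number and word tokens found in the text, in order.
    static List<String> tokenize(final String text) {
        final List<String> tokens = new ArrayList<>();
        final Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(final String[] args) {
        // "4.7Ohm" is split into the number "4.7" and the word "Ohm".
        System.out.println(tokenize("a 4.7Ohm resistor"));
    }
}
```
Because the two alternatives of the pattern never overlap, a digit run always ends where the first letter begins, so no whitespace is needed between the number and the unit.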
Michael Peter Christen
e3797de7de
enhanced the word tokenizer to recognize numbers in a proper way
1 year ago
Michael Peter Christen
88cd17ea57
migrated solr from 8.9.0 to 8.11.2; activated also migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon.
1 year ago
Michael Peter Christen
0089f234f4
added npe protection
1 year ago
Michael Peter Christen
8285fe715a
converted tabs to spaces in classes supporting the condenser.
This is a preparation step to make changes in condenser and parser more
visible; no functional changes so far.
1 year ago
Michael Peter Christen
195bd2e444
extended the maximum header size to 16k to prevent http error 431
1 year ago
Michael Peter Christen
92dad3ed49
removed 7Zip parser because the old library could not be replaced by a maven repository
1 year ago
Michael Peter Christen
5afcba162b
updated libraries
1 year ago
Michael Christen
a348146d8f
setting connect host to 0.0.0.0
1 year ago
Michael Peter Christen
1c0f50985c
fixed documentation and some details of handling of keywords
2 years ago
Michael Christen
3472bcb4d3
patched a 'java.lang.NoSuchMethodError: com.twelvemonkeys.imageio.util.IIOUtil.lookupProviderByName' problem which occurred only on ARM
2 years ago
Michael Christen
f7b6e98ed7
Merge pull request #562 from thkoch2001/fix-warnings
Fix warnings
2 years ago
Michael Peter Christen
a157d01bb5
increased network image size limit for linuxtage poster
2 years ago
Thomas Koch
6bca836f49
fix 3 javac warnings: redundant cast
see GitHub issue #561 for context
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigAccounts_p.java:85: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer)sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:156: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:167: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
2 years ago
Michael Christen
9012fe4519
extended error message
2 years ago
Michael Christen
74104ff2d3
fix to timeout
2 years ago
Michael Peter Christen
9fcd8f1bda
added canonical filter
attention: this is on by default!
(it should do the right thing)
2 years ago
Michael Peter Christen
5a52b01c09
front-end integration of tag valency
2 years ago
Michael Peter Christen
7f728bb4b4
crawl profile storage extension for tag valency
2 years ago
Michael Christen
4304e07e6f
crawl profile adaptation to new tag valency attribute
2 years ago
Michael Peter Christen
5acd98f4da
introduction of tag-to-indexing relation TagValency
2 years ago
Michael Peter Christen
ab3ef87abf
fixed exec start command where a path contains spaces
2 years ago
Michael Peter Christen
17eec667fb
better release number representation
2 years ago
Michael Peter Christen
b1199e97f8
enabling new update location release.yacy.net
with new version numbers
2 years ago
Michael Peter Christen
66169d1aad
default build properties to remove barriers to developing in IDE
environments
2 years ago
Michael Peter Christen
309adb814e
fixed jsonlist import from searchlab.eu using a direct URL
2 years ago
Michael Peter Christen
5ddc794bb9
code cleanup in http client
2 years ago
Michael Peter Christen
62d177bf59
stub for jsonlist index importer web page
2 years ago
Michael Peter Christen
efa0425f00
refactoring: moved jsonlist importer to importer class
2 years ago
Michael Peter Christen
49daa32a88
yacy can now read searchlab export dump files
using the surrogate input process:
- copy the searchlab export file to DATA/SURROGATE/in
- the file is processed automatically and then moved to
DATA/SURROGATE/OUT
2 years ago
Michael Peter Christen
6042dd99c6
reduced danger that Tray does not initialize
2 years ago
Michael Christen
61b27217b9
throttle number of DNS requests:
as soon as the number of requests is > 50, there is a forced delay
of (10 * (requests - 50)) milliseconds. That means that once the number
of DNS requests reaches 150, there is a one-second delay for each request.
This is meant to prevent a remote DNS server from being flooded with requests
and possibly being damaged.
This is also a fix/enhancement for
https://github.com/yacy/yacy_search_server/issues/513
2 years ago
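The delay rule stated in 61b27217b9 can be written down as a short sketch. Only the threshold of 50 and the factor of 10 ms come from the commit message; the class name `DnsThrottleSketch`, the method `delayMillis` and the way requests are counted are assumptions for illustration, not the actual YaCy code.
```java
// Hypothetical sketch of the delay formula, not the actual YaCy implementation.
final class DnsThrottleSketch {

    // Values taken from the commit message.
    static final int FREE_REQUESTS = 50;
    static final long MILLIS_PER_EXCESS_REQUEST = 10L;

    // Forced delay in milliseconds for the given number of DNS requests.
    static long delayMillis(final int requests) {
        if (requests <= FREE_REQUESTS) {
            return 0L; // at or below the threshold: no throttling
        }
        return MILLIS_PER_EXCESS_REQUEST * (requests - FREE_REQUESTS);
    }

    public static void main(final String[] args) {
        for (final int requests : new int[] {10, 50, 51, 100, 150}) {
            System.out.println(requests + " requests: " + delayMillis(requests) + " ms");
        }
    }
}
```
With these values, 150 requests give 10 * (150 - 50) = 1000 ms, which matches the one-second delay mentioned in the commit message.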
Michael Christen
99174282d8
try to shut down in a bit more ordered way
inspired by https://github.com/yacy/yacy_search_server/issues/518
2 years ago
Michael Peter Christen
482f507e65
upgraded solr from 8.8.1 to 8.9.0
should hopefully fix
https://github.com/yacy/yacy_search_server/issues/496
because it includes https://issues.apache.org/jira/browse/SOLR-13034
2 years ago
Michael Peter Christen
d49f937b98
added iso,apk,dmg to extension-deny list
see also https://github.com/yacy/yacy_search_server/issues/510
zip is not on the list because it can be parsed
2 years ago