pr0vieh
35620762ac
bring defaults for recrawlindex to init config
11 months ago
Michael Christen
d097a642c2
Merge pull request #615 from okybaca/logging2
...
Logging unclutter
12 months ago
Michael Christen
6d5e9ff53f
Merge pull request #616 from okybaca/logging3
...
changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine
12 months ago
Michael Christen
d5d4e8fe3a
Merge pull request #617 from pr0vieh/master
...
Add setting for DHT receive loadprereq instead of hardcoded load < 2.0
12 months ago
pr0vieh
dfb2b79609
Add setting for DHT receive loadprereq instead of hardcoded load < 2.0
12 months ago
okybaca
5dee8dbcbd
changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine
12 months ago
Michael Christen
4c603e23f0
Merge pull request #610 from okybaca/cr-text
...
UI: added a more descriptive message, CitationRank instead of cr
12 months ago
Michael Christen
040cd8be6d
Merge pull request #612 from okybaca/sitemap-fix
...
updated apache libs
12 months ago
Michael Christen
0233ecd481
Merge pull request #614 from okybaca/logging
...
added some logging prefixes to yacy.logging
12 months ago
okybaca
7831f294a9
changed regular peerping messages to level fine
12 months ago
okybaca
553c859703
logging: moved some log-cluttering DHT messages to level 'fine'
12 months ago
okybaca
1c5fca9a58
changed network operation log category from YACY to NETWORK
12 months ago
okybaca
2f44fc0257
added some logging prefixes to yacy.logging
12 months ago
Michael Peter Christen
3d3bdb0f5f
added zim importer rule for mdwiki
1 year ago
Michael Peter Christen
4a611ac6a3
another possible fix for
...
https://github.com/yacy/yacy_search_server/issues/500
1 year ago
okybaca
9c59c6814b
updated apache libs
1 year ago
sgaebel
d72cd7916c
Merge branch 'master' of https://github.com/yacy/yacy_search_server
1 year ago
sgaebel
0663ae3c99
adds synchronized dumplog
1 year ago
okybaca
cba84632ee
UI: added a more descriptive message, CitationRank instead of cr
1 year ago
Michael Peter Christen
cff0991d85
test if this is helpful for https://github.com/yacy/yacy_search_server/issues/500
1 year ago
Michael Peter Christen
ceb07a5218
fixed problem with zim importer which crashed when non-valid urls appeared
1 year ago
Michael Peter Christen
656b3e3e77
updated guava to latest and added missing library for failureaccess
1 year ago
Michael Peter Christen
3268a93019
added a 'minified' option to YaCy dumps
1 year ago
Michael Peter Christen
c20c4b8a21
modified export: added maximum number of docs per chunk
...
The export file can now be many files, called chunks.
By default still only one chunk is exported.
This function is required in case that the exported files shall be
imported to an elasticsearch/opensearch index. The bulk import function
of elasticsearch/opensearch is limited to 100MB. To make it possible to
import YaCy files, those must be split into chunks. Right now we
cannot estimate the chunk size as bytes, only as number of documents.
The user must do experiments to find out the optimum chunk max size,
like 50000 docs per chunk. Try this as first attempt.
1 year ago
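The chunking described in commit c20c4b8a21 above can be sketched as follows. This is a minimal illustration in Python, not YaCy's export code: the names `chunk_documents`, `chunk_size_bytes` and `BULK_LIMIT_BYTES` are hypothetical, each document is assumed to be one JSON string per line, and the 100MB limit is taken from the commit message (interpreted here as 100 * 1024 * 1024 bytes, which is an assumption).

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: "100MB" from the commit message read as 100 MiB.
BULK_LIMIT_BYTES = 100 * 1024 * 1024


def chunk_documents(docs: Iterable[str], max_docs_per_chunk: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most max_docs_per_chunk documents."""
    if max_docs_per_chunk < 1:
        raise ValueError("max_docs_per_chunk must be at least 1")
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, max_docs_per_chunk))
        if not chunk:
            return
        yield chunk


def chunk_size_bytes(chunk: List[str]) -> int:
    """Size of a chunk when written as newline-terminated UTF-8 lines."""
    return sum(len(line.encode("utf-8")) + 1 for line in chunk)


if __name__ == "__main__":
    sample = ['{"id": %d}' % i for i in range(5)]
    for number, chunk in enumerate(chunk_documents(sample, 2)):
        size = chunk_size_bytes(chunk)
        status = "ok" if size <= BULK_LIMIT_BYTES else "too large"
        print(number, len(chunk), size, status)
```

Since the chunk size can only be set as a number of documents, measuring the byte size of each resulting chunk is one way to run the experiment the commit recommends, starting from 50000 docs per chunk.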
Michael Peter Christen
655d8db802
detailed directions in index export to explain how the export can be
...
imported again using elasticsearch/opensearch
1 year ago
Michael Peter Christen
24011dcbcc
more file name extensions for json list surrogate files
1 year ago
Michael Peter Christen
34a9fc1a07
bugfixes to zim reader:
1 year ago
Michael Peter Christen
7db0534d8a
Added a zim parser to the surrogate import option.
...
You can now import zim files into YaCy by simply moving them
to the DATA/SURROGATE/IN folder. They will be fetched and after
parsing moved to DATA/SURROGATE/OUT.
There are exceptions where the parser is not able to identify the
original URL of the documents in the zim file. In that case the file
is simply ignored.
This commit also carries an important fix to the pdf parser and an
increase of the maximum parsing speed to 60000 PPM which should make it
possible to index up to 1000 files in one second.
1 year ago
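The import step described in commit 7db0534d8a above amounts to moving a file into a folder. A minimal Python sketch, assuming a YaCy installation directory passed in as `yacy_home`; the function name `queue_zim_for_import` is hypothetical, and only the DATA/SURROGATE/IN path comes from the commit message:

```python
import shutil
from pathlib import Path


def queue_zim_for_import(zim_file: Path, yacy_home: Path) -> Path:
    """Move a zim file into the surrogate inbox and return its new path."""
    # Folder name taken from the commit message; yacy_home is an assumption.
    inbox = yacy_home / "DATA" / "SURROGATE" / "IN"
    if not inbox.is_dir():
        raise FileNotFoundError(f"surrogate inbox not found: {inbox}")
    target = inbox / zim_file.name
    if target.exists():
        raise FileExistsError(f"already queued: {target}")
    shutil.move(str(zim_file), str(target))
    return target
```

According to the commit message, YaCy then picks the file up and moves it to DATA/SURROGATE/OUT after parsing, so the sketch does nothing beyond the move itself.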
Michael Peter Christen
70e29937ef
added a check in zim importer which tests if import URLs actually exist
1 year ago
Michael Peter Christen
496f768c44
modified cache strategy for zim clusters
1 year ago
Michael Peter Christen
fdc6311dc7
added parsing rules for wikibooks and wikinews in zim reader
1 year ago
Michael Peter Christen
2ea54b3503
fixed blob iterator in zim cluster definition
1 year ago
Michael Peter Christen
54fa5d3c2e
added a cluster cache but it requires more testing
1 year ago
Michael Peter Christen
53b01dbf2e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
1 year ago
Michael Peter Christen
41856e9f34
added an optimized zim file entry iterator
1 year ago
Michael Peter Christen
1c0df28bfb
added a zim importer that can be used for surrogate imports.
...
Can not be used yet because it requires some security additions
to verify that the given urls actually work.
1 year ago
Michael Peter Christen
b9912ff50d
repaired dockerfiles for aarch64 and armv7
1 year ago
Michael Peter Christen
33b6878ded
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
1 year ago
Michael Christen
68554cea07
Merge pull request #605 from okybaca/readme-docker-link
...
added a link to docker build guide
1 year ago
Michael Christen
06bfd5802f
Merge pull request #603 from okybaca/dark-green-css
...
fine tuned the dark-green color scheme
1 year ago
Michael Christen
43d5cd101e
Merge pull request #607 from okybaca/wikilinks
...
replaced all the links to legacy legacy wiki to legacy wiki
1 year ago
okybaca
4add1f6bc7
replaced all the links to legacy legacy wiki to legacy wiki
1 year ago
Michael Peter Christen
e2c86a8eba
added a ZIM cluster pointer cache
1 year ago
Michael Peter Christen
4a54b24703
fix for "negative seek offset" error during extension of heap files.
...
This would have always happened when a heap file exceeds 2GB.
should fix https://github.com/yacy/yacy_search_server/issues/372
1 year ago
okybaca
69db75ce45
added a link to docker build guide
1 year ago
Michael Peter Christen
9c8fb97985
introduced url list and title list caching and enhanced input stream
...
performance in ZIM reader
1 year ago
Michael Peter Christen
b0ae660790
added Zstandard compressed data decompression for ZIM files type 5
...
also: more generalization and performance enhancements
1 year ago
Michael Peter Christen
ad8ee3a0b6
fixed typo in class name
1 year ago
Michael Peter Christen
c4082c4ff2
refactoring of ZIM reader, simplification, removed unnecessary code
1 year ago
Michael Peter Christen
c2b6b6e7b9
Fixed a large number of problems in the ZIM reader.
...
This library was not prepared for large data because it was missing long
data types for pointers. I had to modify the code-base in a fundamental
way:
- Proof-Reading,
- unclustering,
- refactoring,
- naming adoption to https://wiki.openzim.org/wiki/ZIM_file_format ,
- change of Exception handling,
- extension to more attributes as defined in spec (bugfix for mime type
loading)
- bugfix to long parsing (prevented reading of large files)
The code is furthermore very inefficient and requires more attention.
However the format is very useful for YaCy as there are numerous data
sources for ZIM-Files.
1 year ago
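The pointer problem named in commit c2b6b6e7b9 above can be illustrated with a small sketch. It assumes, based on the ZIM file format page linked in the commit, that pointer lists such as the URL pointer list store 8-byte little-endian offsets. The function names are hypothetical and this is not YaCy's reader.

```python
import struct
from typing import List


def read_pointer_list(data: bytes, offset: int, count: int) -> List[int]:
    """Read `count` unsigned 64-bit little-endian offsets starting at `offset`."""
    return list(struct.unpack_from(f"<{count}Q", data, offset))


def read_as_signed_32(data: bytes, offset: int) -> int:
    """Read only the low 4 bytes as a signed 32-bit value, as a too-narrow type would."""
    return struct.unpack_from("<i", data, offset)[0]


if __name__ == "__main__":
    # Two pointers: one small, one beyond the 2GB boundary.
    blob = struct.pack("<2Q", 80, 3 * 1024**3)
    print(read_pointer_list(blob, 0, 2))
    print(read_as_signed_32(blob, 8))
```

An offset past 2GB read through a signed 32-bit type comes out negative, which is the kind of failure on large files that 64-bit (long) pointers avoid.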