Michael Peter Christen
848e9304d9
evil bots may crawl harder
6 years ago
luccioman
8e72863a7f
Merge pull request #250 from theel0ja/patch-1
...
Improved formatting of markdown
6 years ago
luccioman
a997133260
Fixed gzip decompression regression on index transfer APIs
...
Processing of gzip encoded incoming requests (on /yacy/transferRWI.html
and /yacy/transferURL.html) stopped working after the upgrade to
Jetty 9.4.12 (see commit 51f4be1).
To prevent any conflicting behavior with Jetty internals, now use the
GzipHandler provided by Jetty to decompress incoming gzip encoded
requests rather than the previously used custom GZIPRequestWrapper.
Fixes issue #249
6 years ago
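For readers unfamiliar with what decompressing a gzip encoded request body involves (commit a997133260), here is a minimal JDK-only sketch. It does not use Jetty's GzipHandler and is not the code from the commit; the class name `GzipBodySketch` and the helper `readBody` are hypothetical, and the `Content-Encoding` value is passed in as a plain string for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: not YaCy or Jetty code.
class GzipBodySketch {

    /** Compresses bytes the way a sending peer might before posting them. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(plain);
        }
        return buffer.toByteArray();
    }

    /** Copies a stream fully into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the decoded body: inflated when the encoding is gzip, raw otherwise. */
    static byte[] readBody(String contentEncoding, InputStream body) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            try (GZIPInputStream gz = new GZIPInputStream(body)) {
                return readAll(gz);
            }
        }
        return readAll(body);
    }

    public static void main(String[] args) throws IOException {
        byte[] sent = gzip("word=example".getBytes(StandardCharsets.UTF_8));
        byte[] received = readBody("gzip", new ByteArrayInputStream(sent));
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

In a servlet container this step is normally delegated to the server, which is the direction the commit describes.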
luccioman
e85f231bdf
Fixed termination of Host browser and link structure Solr query threads
...
Under some conditions (especially when reaching the timeout), concurrent
Solr query tasks used by /HostBrowser.html and /api/linkstructure.json
never terminated, thus leaking resources, as reported by @Vort in issue
#246
6 years ago
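The kind of leak described in commit e85f231bdf can be avoided by cancelling a task once its timeout expires. The following is a generic sketch using `java.util.concurrent`, not the fix applied in the commit; `QueryTimeoutSketch` and `callWithTimeout` are hypothetical names, and the sleeping task merely stands in for a slow query.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: not YaCy code.
class QueryTimeoutSketch {

    /** Runs the task, returning the fallback value if it does not finish in time. */
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMs, T fallback)
            throws InterruptedException, ExecutionException {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback;
        } finally {
            // No effect if the task already completed; otherwise interrupts the
            // worker thread so the task cannot keep running after the timeout.
            future.cancel(true);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String result = callWithTimeout(pool, () -> {
                Thread.sleep(10_000); // stands in for a slow query
                return "done";
            }, 100, "timed out");
            System.out.println(result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Cancellation by interruption only works if the task responds to interrupts, so long-running loops inside a task would also need to check the interrupted flag.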
Elias Ojala
4a126881bb
Improved formatting of markdown
6 years ago
luccioman
260ac11c65
Limit length of initially visible text in link structure graph nodes
...
To slightly improve the readability of graphs with a large number of nodes.
6 years ago
luccioman
5a8d9abd8a
Upgraded d3js dependency from 3.4.4 to 5.7.0
6 years ago
luccioman
9f8e1994a4
Added missing CSS width units to some HostBrowser.html styling
6 years ago
luccioman
0b1d2cb0dd
Fixed "TypeError: table.tBodies[0] is undefined" host browser JS error
...
Traced in browser console when a host details table is empty.
6 years ago
luccioman
fcf6b16db4
Added new crawler attribute for finer control over Media Type detection
...
New "Media Type detection" section in the advanced crawl start page
allows choosing between:
- not loading URLs with an unknown or unsupported file extension without
  checking the actual Media Type (relying on the Content-Type header for
  now). This was the old default behavior, faster, but not really accurate.
- always cross-checking the URL file extension against the actual Media
  Type. This makes it possible to properly parse URLs ending with an
  apparently odd file extension, but which actually have a supported
  Media Type such as text/html.
Sample URLs with misleading file extensions added as documentation in
the crawl start page.
Fixes issue #244
6 years ago
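The two modes described in commit fcf6b16db4 can be sketched roughly as follows. This is a simplified illustration and not YaCy's implementation: the class `MediaTypeCheckSketch`, the method `shouldParse`, the flag `alwaysCrossCheck` and the sample sets of supported extensions and media types are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not YaCy code.
class MediaTypeCheckSketch {

    // Assumed sample values; a real crawler would derive these from its parsers.
    static final Set<String> SUPPORTED_EXTENSIONS =
            new HashSet<>(Arrays.asList("html", "htm", "txt", "pdf"));
    static final Set<String> SUPPORTED_MEDIA_TYPES =
            new HashSet<>(Arrays.asList("text/html", "text/plain", "application/pdf"));

    /** Returns the lower-cased extension of a URL path, or "" if there is none. */
    static String extensionOf(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** Strips parameters such as "; charset=UTF-8" from a Content-Type header value. */
    static String mediaTypeOf(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0 ? contentTypeHeader.substring(0, semicolon) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Decides whether a document should be parsed under the chosen mode. */
    static boolean shouldParse(String path, String contentTypeHeader, boolean alwaysCrossCheck) {
        if (!alwaysCrossCheck) {
            // Faster mode: trust the file extension alone.
            return SUPPORTED_EXTENSIONS.contains(extensionOf(path));
        }
        // Accurate mode: let the declared Media Type decide.
        return SUPPORTED_MEDIA_TYPES.contains(mediaTypeOf(contentTypeHeader));
    }

    public static void main(String[] args) {
        String path = "/download.php5x"; // made-up path with an odd extension
        String header = "text/html; charset=UTF-8";
        System.out.println(shouldParse(path, header, false));
        System.out.println(shouldParse(path, header, true));
    }
}
```

The trade-off is the one the commit message names: the extension-only mode avoids a request per URL, while the cross-check mode needs the response headers before it can decide.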
luccioman
88d0ed676c
Render http status instead of null responses on snapshot api errors
6 years ago
luccioman
a83a56473e
Added support for PDF snapshots generation when running on MS Windows
6 years ago
luccioman
18d07538ad
Upgraded Apache Ant from 1.10.1 to 1.10.5 in Docker alpine image flavor
6 years ago
luccioman
053df1f312
Added support for snapshots generation to Docker images
6 years ago
luccioman
92e10d7d1c
Added a crawl start hint message about whether wkhtmltopdf is available
...
As this tool is required to produce PDF snapshots.
6 years ago
luccioman
8852c97cee
Added basic styling for cleaner rendering of missing image snapshots
...
For the output of the Solr snapshots writer
6 years ago
luccioman
746e0e788d
Render a relevant HTTP status code on snapshot image rendering error
...
Instead of a null response body which is not very helpful.
6 years ago
luccioman
50b6edfcf5
Updated Solr snapshots writer for a cleaner html head
6 years ago
luccioman
f366f43d6b
Made snapshots size customizable in Solr snapshots response writer
6 years ago
luccioman
7a62fc0e66
Fixed concurrency issue in custom classloader used for template classes
...
As reported in issue #241, the problem is only critical (random but
complete crash of the JVM) when upgrading to JDK11.
6 years ago
luccioman
0eb52f8c72
Added documentation hint about JVM option useful to debug JVM crashes
6 years ago
luccioman
753bda1409
Fixed remaining improper decoding of '+' character in blacklist entries
...
In the blacklist cleaner and import/export administration pages.
6 years ago
luccioman
61c337f29a
Decode blacklist entries for easier editing of non-ASCII chars
...
Not using the JDK URLDecoder.decode() function, as it strips '+'
characters when they occur after '?' (both characters having regular
expression semantics when used in blacklist path patterns)
6 years ago
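The concern behind commit 61c337f29a can be illustrated with a small sketch. `URLDecoder.decode()` follows form-decoding rules and turns '+' into a space, which would alter a regular expression quantifier in a blacklist path pattern. The alternative below decodes only `%XX` sequences; it is an assumption-laden illustration, not the decoder used by YaCy, and `BlacklistDecodeSketch` and `percentDecodeKeepingPlus` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not YaCy code.
class BlacklistDecodeSketch {

    /** Decodes %XX sequences as UTF-8 and leaves every other character, including '+', untouched. */
    static String percentDecodeKeepingPlus(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1] & 0xFF, 16);
                int lo = Character.digit(in[i + 2] & 0xFF, 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // one decoded byte
                    i += 3;
                    continue;
                }
            }
            out.write(in[i]); // copied as-is, malformed escapes included
            i++;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String pattern = "caf%C3%A9/.*a+b"; // made-up blacklist path pattern
        System.out.println(URLDecoder.decode(pattern, "UTF-8"));
        System.out.println(percentDecodeKeepingPlus(pattern));
    }
}
```

Running it shows the JDK decoder replacing the '+' with a space while the sketch keeps the pattern's regular expression semantics intact.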
luccioman
ed93221fa1
Improved normalization of blacklist path patterns having non-ASCII chars
...
Normalize blacklist path patterns using percent-encoding, both when a
pattern is edited in the web interface and when it is loaded from
configuration files.
Fixes issue #237
6 years ago
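As a rough sketch of the normalization described in commit ed93221fa1, the following percent-encodes only the non-ASCII characters of a pattern as UTF-8 bytes and leaves ASCII characters, including regular expression metacharacters, unchanged. Which characters YaCy actually encodes is not stated in the commit message, so this rule is an assumption, and `BlacklistPatternNormalizeSketch` and `percentEncodeNonAscii` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not YaCy code.
class BlacklistPatternNormalizeSketch {

    /** Percent-encodes every non-ASCII character as its UTF-8 bytes; ASCII is copied unchanged. */
    static String percentEncodeNonAscii(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length());
        for (byte b : pattern.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            if (value < 0x80) {
                out.append((char) value);
            } else {
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; the escape keeps the source file ASCII-only.
        System.out.println(percentEncodeNonAscii("caf\u00e9/.*"));
    }
}
```

Applying the same function both at edit time and at load time means a pattern compares equal regardless of which path it entered through.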
luccioman
d42f079c2d
Additional modifications for typo fix in Bookmarks.html from PR #240
6 years ago
luccioman
d23578efc3
Merge pull request #240 from ivanhercaz/fixEnglishBookmarksPage
...
Fix English Bookmarks.html
6 years ago
ivanhercaz
8a8208c7e2
typo fix
6 years ago
ivanhercaz
a651358cce
cleaning the file of entries in German already translated to Spanish
6 years ago
ivanhercaz
dc09f240e7
changing all «» to "" to avoid confusion
6 years ago
ivanhercaz
07dae68ab0
ConfigHeuristics_p.html translated
6 years ago
ivanhercaz
102c1cc4a9
ConfigHTCache_p.html translated
6 years ago
ivanhercaz
41684ba559
adding Spanish to the interface language list
6 years ago
ivanhercaz
1714805092
ConfigAccounts_p.html translated
6 years ago
ivanhercaz
91ac9c652a
Collage.html translated
6 years ago
ivanhercaz
2d393e8f07
Bookmarks.html translated
6 years ago
ivanhercaz
1dafc85d33
typo fix in Bookmarks.html
6 years ago
ivanhercaz
275cff0cb7
removing duplicated entry (the one in German) for Translator_p.html
6 years ago
ivanhercaz
39fb80e84a
BlogComments.html translated
6 years ago
ivanhercaz
7f5121a0ec
Translator_p.html translated
6 years ago
ivanhercaz
59ea245e8b
Blog.html translated
6 years ago
ivanhercaz
d221ddcbc8
Blacklist_p.html translated
6 years ago
ivanhercaz
843f0bb48f
BlacklistTest_p.html translated and forgotten string in BlacklistImpExp_p.html
6 years ago
ivanhercaz
729a09d45d
BlacklistImpExp_p.html translated
6 years ago
ivanhercaz
c45324f086
BlacklistCleaner_p.html translated
6 years ago
ivanhercaz
1be4c84ed7
Autocrawl_p.html translated
6 years ago
ivanhercaz
c0f7aa92e4
AccessTracker_p.html translated
6 years ago
ivanhercaz
7aa7ba689c
AccessGrid_p.html translated
6 years ago
luccioman
3d14fb51c5
Removed now unused Java import in addition to modification from PR #239
6 years ago
luccioman
d5ec706604
Merge pull request #239 from otteresk/master
...
Display correct time in Rejected URLs overview
6 years ago
otter
8820d8d7c7
replace current date by FailDate
6 years ago