259 Commits (480e4a6a5c86d11c0eacba6fb5f19a0772735bfd)

Author SHA1 Message Date
reger 72f6a0b0b2 enhance recrawl job
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
reger cd7c0e0aae detail optimization of RecrawlThread
10 years ago
reger ace71a8877 Initial (experimental) implementation of index update/re-crawl job
10 years ago
reger 141cd80456 correct log msg text
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Ryszard Goń ca1a70aec8 fix for Accept '?' URLs column in Crawl Profile List
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone managament for indexed data:
10 years ago
Michael Peter Christen 3288489fd2 more logging during start-up
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
Michael Peter Christen bee5ee7cce removed some warnings
10 years ago
Michael Peter Christen 783cf6fbc7 the LinkedBlockingQueue is much faster than the ArrayBlockingQueue
10 years ago
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
Michael Peter Christen 3e6c3e2237 documents pushed over the api/push_p.html interface will have their
10 years ago
Michael Peter Christen 8c3e5b7b6d added experimental pdf splitting which enables YaCy to split pdfs during
10 years ago
Michael Peter Christen 28683530cd fixes to usage of no-cache: use and recognize also the no-store
10 years ago
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 2362ad7c34 fix for a count issue in snapshot api
10 years ago
Michael Peter Christen 9971e197e0 Added a transaction interface to the snapshots: all documents in the
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen ab6cc3c88c added concurrent generation of snapshot pdfs
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
Michael Peter Christen 4fe4bf29ad added rss feed output to snapshot servlet which can be used to get a
10 years ago
reger 568c991405 remove the unused Request variable
10 years ago
reger ff18129def ViewFile servlet: update index if newer,
10 years ago
Michael Peter Christen 226aea5914 added a servlet which can create preview images, preview tumbnails and
10 years ago
Michael Peter Christen e586e423aa in case that loading from the cache fails, load from wkhtmltopdf without
10 years ago
Michael Peter Christen 25a64c51b3 moved snapshot generation out of the html handler to prevent that
10 years ago
Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
10 years ago
Michael Peter Christen ad0da5f246 added new web page snapshot infrastructure which will lead to the
10 years ago
Michael Peter Christen 84763126e0 added option to make the YaCy proxy act as the cache is never stale. If
10 years ago
Michael Peter Christen a39419f2ef more stacks shall be considered for on-demand loading, not only
10 years ago
Michael Peter Christen 5bb52f79be reduce number of calls to queue.size() because that may be a bottleneck
10 years ago
Michael Peter Christen a34f837592 better delete all files in path when removing host crawl stack
10 years ago
Michael Peter Christen 10b1db430a if we have many hosts, use on-demand earlier
10 years ago
Michael Peter Christen 6983dff334 explain crawl denial when not switched to intranet mode
10 years ago
Michael Peter Christen d8beafba3a fix for values in CrawlProfileEditor table and xml; now the full profile
10 years ago
Michael Peter Christen ec95dfa2e6 fixed crawl profile xml result which did not show the correct crawl
10 years ago
Michael Peter Christen 9b1958e8ca more ipv6 bugfixes
10 years ago
Michael Peter Christen e1bc768f9d more IPv6 bugfixes
10 years ago
reger fb1fcc2b03 handle noarchive tag, skip writing page to cache
10 years ago
Michael Peter Christen 6491270b3a large IPv6 redesign of peer ping methods!
10 years ago
Michael Peter Christen 67cd4c37bd activated the new apk parser which was already ready but not included in
10 years ago
Michael Peter Christen 025516f682 fix for crawl limit for number of pages fail
11 years ago
orbiter 3ac31614a3 added option to reverse-sort YaCy tables (internal API change only)
11 years ago
Michael Peter Christen bf18a39d0e replaced warning with info
11 years ago
Michael Peter Christen ebd0be2cea fixes and speed updates for search process
11 years ago