
74 Commits (37df2e19fd8075fe9746ac7ca4538f7a0e754472)

Author SHA1 Message Date
Michael Peter Christen 47682bf467 fix for unresolved pattern
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
Michael Peter Christen 82bfd9e00a - crawl profiles shall be deleted from active and passive stacks if they
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen f1c5338210 preparation for greedy crawl profiles and refactoring
12 years ago
Michael Peter Christen 25499eead5 - added a new field for the regular expression in crawl start
12 years ago
Michael Peter Christen 0716a24737 added more / all new crawl profile fields into crawl profile editor
12 years ago
Michael Peter Christen 4a14122ba7 in case that a crawl profile has a collection assigned, use the
12 years ago
Michael Peter Christen c25d7bcb80 - added concurrency for robots.txt loading
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen 6ec02deec6 added new crawl attributes in crawl profile (not active yet)
12 years ago
Michael Peter Christen a13e5153ac - added the possibility to have not one but a list of crawl start urls
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen 16b21f7a5b Added more steering in Crawler_p.html interface
13 years ago
orbiter 3a807e10cf - added a cache for active crawl profiles to the crawl switchboard
13 years ago
orbiter b250e6466d implemented crawl restrictions for IP pattern and country lists
13 years ago
orbiter 5ad7f9612b added crawl settings for three new filters for each crawl:
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
low012 4fe1329de2 *) trying to at least fix symptoms of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3293#p22791
14 years ago
sixcooler 7fea51ecee check filter to be a correct pattern on edit CrawlProfiles
14 years ago
orbiter 3ec94d87c4 show dom counter only for active crawls where the dom counter is enabled within the crawl profile
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 e7552bd719 *) cleaning up the code a little bit
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
orbiter 377f001e0d sorting of crawl profile names in crawl profile editor, see
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 5a994c9796 added a scheduler based on API actions
14 years ago
low012 ad96a14d0a *) jump to Crawl Profile editor if a profile is selected to be edited
15 years ago
orbiter b7556893c6 removed terminate buttons for built-in crawl profiles in crawl profile editor
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 34354cf9b2 added a servlet that has been removed in SVN 4881; this servlet is now split and will be used for a simple crawl start and a remote crawl monitor (not yet integrated into the interface)
15 years ago
orbiter a06f7ddb33 more PMD recommendations
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter 04a548a1e3 - temporarily integrated the transferURL servlet as a static class instead of a class that is called using reflection to investigate the OOM problems in that class
15 years ago
low012 5e4f267a36 *) added subversion properties and edited a few comments
15 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter bd5f4c78d8 - added default profile for surrogate indexing
16 years ago
orbiter 10f5ec1040 reverted last commit (more testing needed)
16 years ago
orbiter dba7ef5144 extended crawling constraints:
16 years ago
orbiter 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
16 years ago