65 Commits (76d218fbefece0cc55c9e3b4052d449dd6a5aee5)

Author SHA1 Message Date
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen 6ec02deec6 added new crawl attributes in crawl profile (not active yet)
12 years ago
Michael Peter Christen a13e5153ac - added the possibility to have not one but a list of crawl start urls
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen 16b21f7a5b Added more steering in Crawler_p.html interface
13 years ago
orbiter 3a807e10cf - added a cache for active crawl profiles to the crawl switchboard
13 years ago
orbiter b250e6466d implemented crawl restrictions for IP pattern and country lists
13 years ago
orbiter 5ad7f9612b added crawl settings for three new filters for each crawl:
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
low012 c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
14 years ago
low012 4fe1329de2 *) trying to at least fix symptoms of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3293#p22791
14 years ago
sixcooler 7fea51ecee check filter to be a correct pattern on edit CrawlProfiles
14 years ago
orbiter 3ec94d87c4 show dom counter only for active crawls where the dom counter is enabled within the crawl profile
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
low012 e7552bd719 *) cleaning up the code a little bit
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
orbiter 377f001e0d sorting of crawl profile names in crawl profile editor, see
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 5a994c9796 added a scheduler based on API actions
14 years ago
low012 ad96a14d0a *) jump to Crawl Profile editor if a profile is selected to be edited
15 years ago
orbiter b7556893c6 removed terminate buttons for built-in crawl profiles in crawl profile editor
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 34354cf9b2 added a servlet that has been removed in SVN 4881; this servlet is now split and will be used for a simple crawl start and a remote crawl monitor (not yet integrated into the interface)
15 years ago
orbiter a06f7ddb33 more PMD recommendations
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter 04a548a1e3 - temporarily integrated the transferURL servlet as a static class instead of as a class that is called using reflection, to investigate the OOM problems in that class
15 years ago
low012 5e4f267a36 *) added subversion properties and edited a few comments
15 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter bd5f4c78d8 - added default profile for surrogate indexing
16 years ago
orbiter 10f5ec1040 reverted last commit (more testing needed)
16 years ago
orbiter dba7ef5144 extended crawling constraints:
16 years ago
orbiter 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
16 years ago
lotus d9d9c522a1 addendum to last commit
16 years ago
orbiter 536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
16 years ago
danielr 3bb870bfcd added final where possible
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
17 years ago
orbiter dd75b3cabc - patch for bad profiles
17 years ago
orbiter 1689030ee8 refactoring: moved all crawler classes into their own package
17 years ago
f1ori b9602e891a * added CrawlProfileEditor_p.xml for monitoring in yacybar
17 years ago
orbiter d03940f2ec - included patch from http://forum.yacy-websuche.de/viewtopic.php?p=7193#p7193
17 years ago