108 Commits (c37d718f16da30a567375936a90aaa939f5f91f6)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Michael Peter Christen | 0b6566a389 | optimizations when starting large crawl requests with many start urls in | 12 years ago |
| Michael Peter Christen | be27567b53 | allow more links when starting a crawl by file | 12 years ago |
| Michael Peter Christen | 0fe7b6fd3b | migrated the index export methods from the old metadata to solr. Now | 12 years ago |
| Michael Peter Christen | 4735bd47f4 | - changed solr commit call and added an optimize option. Since Solr | 12 years ago |
| Michael Peter Christen | fb0fa9a102 | - fixed 'delete from subpath' during crawl start which deleted nothing; | 12 years ago |
| Michael Peter Christen | eca68fa197 | added debug code to crawler monitor | 12 years ago |
| Michael Peter Christen | 5fd3b93661 | added deletion of hosts during crawl start if deleteold option was given | 12 years ago |
| orbiter | b55ea2197f | - redesign of crawl start servlet | 12 years ago |
| orbiter | 1c66de4bd4 | - removed scheduled crawling options in crawl start because it is | 12 years ago |
| Michael Peter Christen | 6244b084cd | fixed wrong order of result count values | 12 years ago |
| Michael Peter Christen | 15d1460b40 | added information about the reason of pausing of crawls | 12 years ago |
| Michael Peter Christen | 2371ef031c | added solr faceted search support to YaCy search results | 12 years ago |
| Michael Peter Christen | 791e1dcfdf | when a new crawl is started, delete all entries about error-urls for | 12 years ago |
| Michael Peter Christen | 5e77801aac | update to web interface structure | 12 years ago |
| orbiter | 354ef8000d | - added 'deleteold' option to crawler which causes that documents are | 12 years ago |
| Michael Peter Christen | f8f05ecba7 | - added a delete button in host browser to delete a complete subpath | 12 years ago |
| Michael Peter Christen | ac9540dfb6 | removed options for stopwords which are not used | 12 years ago |
| Michael Peter Christen | 85ca07b90e | when a new crawl is started, an equal crawl, if still running, is | 12 years ago |
| Michael Peter Christen | ae6feb5610 | showing the web structure graph as animation in the crawl monitor | 12 years ago |
| Michael Peter Christen | 21fe8339b4 | - enhanced generation of url objects | 12 years ago |
| Michael Peter Christen | 5f0ab25382 | removed the option to prevent removal of & parts inside of the | 12 years ago |
| Michael Peter Christen | 53789555b9 | fix for crawl start filter | 12 years ago |
| Michael Peter Christen | abebb3b124 | added a crawl start checker which makes a simple analysis on the list of | 12 years ago |
| orbiter | ae246c30c3 | fixed interpretation of directDocByURL attribute during crawl start | 12 years ago |
| sixcooler | c65b576a6f | added filename for missing crawlname when crawling from file | 12 years ago |
| Michael Peter Christen | 1533bfd63b | refactoring | 12 years ago |
| Michael Peter Christen | 00c1c777fa | refactoring | 12 years ago |
| orbiter | 60b1e23f05 | added new crawl options: | 12 years ago |
| Michael Peter Christen | 6ec02deec6 | added new crawl attributes in crawl profile (not active yet) | 12 years ago |
| Michael Peter Christen | a13e5153ac | - added the possibility to have not one but a list of crawl start urls | 12 years ago |
| Michael Peter Christen | 9644c186a4 | added search functionality to ViewFile.html servlet | 12 years ago |
| Michael Peter Christen | b2b516cc3e | added a collection attribute to crawls and searches: | 12 years ago |
| Michael Peter Christen | 0cab06c47c | refactoring | 12 years ago |
| Michael Peter Christen | 24d9db1613 | snippet retrieval loading processes may use a smaller minimum load time | 12 years ago |
| Michael Peter Christen | 1687737771 | Abstraction of HandleMap and HandleSet | 12 years ago |
| Michael Peter Christen | e3aa05b9dd | added creation of subpath pattern when crawl start is 'from file' | 13 years ago |
| orbiter | 0cbda0b2b8 | - replaced all length() == 0 and size() == 0 with isEmpty() | 13 years ago |
| Michael Peter Christen | 7c1ba99755 | removed more unused method parameters | 13 years ago |
| Michael Peter Christen | 0301aba1e9 | removed unused method parameters | 13 years ago |
| Michael Peter Christen | d3964253ae | - added @SuppressWarnings to unused servlet method parameters | 13 years ago |
| Michael Peter Christen | 276a66a793 | Adding a limit of 1000 links that a parser shall store during indexing. | 13 years ago |
| Michael Peter Christen | 1825f165b8 | better integration of blacklist according to use case | 13 years ago |
| Michael Peter Christen | 03280fb161 | removed segments-concept and the Segments class: | 13 years ago |
| Michael Peter Christen | 9116013c64 | - allow lazy initialization of solr value (if using 'lazy', then no | 13 years ago |
| Michael Peter Christen | 77f795756c | fixing redirects and status codes: storing of status code in | 13 years ago |
| Michael Peter Christen | d7eb18cdf2 | accept also file names beginning with "file://" for crawl start from | 13 years ago |
| Michael Peter Christen | 16b21f7a5b | Added more steering in Crawler_p.html interface | 13 years ago |
| Michael Peter Christen | 19efbf1b0f | - apply directDocByURL to NOLOAD Queue | 13 years ago |
| Michael Peter Christen | ef5192f8c9 | using the generic document parser for crawl starts instead of the html | 13 years ago |
| Michael Peter Christen | 992dbdf4bb | added noload statistic to servlets | 13 years ago |