Commit Graph

63 Commits (0ce17d823a028a8a2ce7360b17bb460e51d81823)

Author SHA1 Message Date
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 88773e4daa changed the default port from 8080 to 8090
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
f1ori 7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
14 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
15 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter 7535fd7447 - refactoring of CrawlEntry and CrawlStacker
16 years ago
orbiter dba7ef5144 extended crawling constraints:
16 years ago
orbiter 536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
16 years ago
danielr 17b7845eb5 * refactoring
17 years ago
danielr 3bb870bfcd added final where possible
17 years ago
orbiter c3d461d191 - removed superfluous copyright statement
17 years ago
orbiter 3ca98fee42 removed superfluous copyright statement
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
17 years ago
orbiter 1689030ee8 refactoring: moved all crawler classes into their own package
17 years ago
orbiter d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
17 years ago
orbiter d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
17 years ago
orbiter 541b817502 refactoring of switchboard queueing
17 years ago
orbiter 6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster that before.
17 years ago
orbiter a31b9097a4 preparations for mass remote crawls:
17 years ago
fuchsi 0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
17 years ago
orbiter 01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
17 years ago
orbiter 842308ea97 - redesigned crawl start menu, integrated monitoring pages
17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago
orbiter 40b0547611 - documentaton changes (removed old forum links)
18 years ago
karlchenofhell 6fbe31425a - some code-cleanup (no more syntax-warnings here)
18 years ago
orbiter 61798f0ae6 added option to distinguish between text crawl and media crawl
18 years ago
orbiter 109ed0a0bb - cleaned up code; removed methods to write the old data structures
18 years ago