Site Crawler:
Download all web pages from a given domain or base URL.
Hints
Crawl Speed Limitation
No more than two pages are loaded from the same host per second (at most 120 documents per minute) to limit the load on the target server.
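The per-host limit can be pictured as a minimum delay of half a second between two requests to the same host. The following is a minimal sketch of that idea, not the crawler's actual code; the class name HostRateLimiter, its methods, and the example URLs are hypothetical and chosen for illustration only.

```python
from urllib.parse import urlsplit

# Assumption: two pages per second per host means a 0.5 s gap (120 per minute).
MIN_DELAY_SECONDS = 0.5


class HostRateLimiter:
    """Hypothetical helper that tracks when each host may be contacted again."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS):
        self.min_delay = min_delay
        self.next_allowed = {}  # host name -> earliest permitted fetch time

    def wait_time(self, url, now):
        """Return the seconds to wait before fetching url at time now."""
        host = urlsplit(url).hostname
        return max(0.0, self.next_allowed.get(host, 0.0) - now)

    def record_fetch(self, url, now):
        """Remember that url's host was contacted at time now."""
        host = urlsplit(url).hostname
        self.next_allowed[host] = now + self.min_delay


limiter = HostRateLimiter()
limiter.record_fetch("http://example.org/a", now=10.0)
same_host_wait = limiter.wait_time("http://example.org/b", now=10.2)
other_host_wait = limiter.wait_time("http://other.example/", now=10.2)
```

In this sketch a second request to example.org has to wait until the half second has passed, while a request to a different host is not delayed at all. Passing the time in as an argument keeps the helper easy to test.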
Target Balancer
Starting a second crawl for a different host raises the maximum throughput to 240 documents per minute, because the crawler balances the load across all hosts.
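The arithmetic behind this can be checked with a small simulation. The sketch below assumes that fetches take no time, that host names are distinct, and that the balancer always serves the host that becomes ready first; the function name simulate_balanced_crawl and the host names are hypothetical, and the real balancer may work differently.

```python
import heapq

# Assumption: per-host delay of 500 ms, i.e. two pages per second per host.
MIN_DELAY_MS = 500


def simulate_balanced_crawl(hosts, duration_ms=60_000, min_delay_ms=MIN_DELAY_MS):
    """Count fetches per host when the host that is ready first is served next."""
    ready = [(0, host) for host in hosts]  # (time the host is ready, host)
    heapq.heapify(ready)
    fetched = {host: 0 for host in hosts}
    while ready:
        ready_at, host = heapq.heappop(ready)
        if ready_at >= duration_ms:
            break  # the earliest ready host is already outside the time window
        fetched[host] += 1
        heapq.heappush(ready, (ready_at + min_delay_ms, host))
    return fetched


one_host = simulate_balanced_crawl(["a.example"])
two_hosts = simulate_balanced_crawl(["a.example", "b.example"])
```

Under these assumptions one host yields 120 documents in a simulated minute and two hosts yield 240, split evenly. Adding more hosts raises the total by the same step, which is why a crawl over many hosts is no longer bound by the per-host limit.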
High Speed Crawling
A 'shallow crawl' that is not limited to a single host (or site) can raise the pages per minute (ppm) rate without a fixed upper limit when the number of target hosts is high.
This can be done using the Expert Crawl Start servlet.
Scheduler Steering
The scheduler for a crawl can be changed or removed using the API Steering.