69 Commits (e1edb236898e2e0a6e9c36556801e6e3ed2b12b2)

Author SHA1 Message Date
orbiter 61798f0ae6 added option to distinguish between text crawl and media crawl
18 years ago
orbiter c500178fd7 redesign of index creation interface
18 years ago
orbiter 109ed0a0bb - cleaned up code; removed methods to write the old data structures
18 years ago
orbiter 30888e7a2f implementation of search constraints
18 years ago
orbiter 497428c8ec refactoring
18 years ago
orbiter 76fceb9997 refactoring
18 years ago
orbiter bb7d4b5d5e refactoring to prepare new RWI entry object
18 years ago
orbiter 918b59dc5e - bugfix for snippet profile (no delete button)
18 years ago
orbiter 3ad0709b53 added a delete button to crawl profile list.
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
theli 5847492537 *) next step of restructuring for new crawlers
18 years ago
theli 34831d2d9f *) Check validity of crawl filter reg.exp. before adding it into the crawler queue
19 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 5f72be2a95 some redesign of EURL storage
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
orbiter 90d569d70f refactoring of index management:
19 years ago
orbiter 00a5d435e2 - fixed some bugs with domain filter
19 years ago
orbiter bd283b8443 fixed bugs:
19 years ago
orbiter e566d1d8d6 some bugfixes regarding new crawling options
19 years ago
orbiter c7f1300300 -fixes for last commit
19 years ago
orbiter 860a7b545b enhanced input options for crawl start
19 years ago
orbiter 7a650d0023 several bugfixes
19 years ago
orbiter 59d52fb4a9 fixed some problems with crawl profiles
19 years ago
orbiter 0c9b61820e enhanced re-crawl settings
19 years ago
orbiter 708cc6c8d9 fixed some bugs for auto-filter and added monitor in profile list
19 years ago
orbiter 63f39ac7b5 added 3 new crawling steering options:
19 years ago
orbiter 1fc3b34be6 some pre-work (without function yet) to implement:
19 years ago
theli 2336f0f013 *) allow pausing/resuming of crawlJob Threads separately
19 years ago
orbiter 37f88b4017 code cleanup
19 years ago
orbiter 548f0c6aff first Try with Eclipse / cleaned sources
19 years ago
theli 444a5a9368 *) Bugfix for Entries with null url in GlobalQueue
19 years ago
orbiter d2731418bf added creation of global ranking files and changed url normal form usage
19 years ago
hydrox cb69047b91 *)cleanup access static methods and fields
19 years ago
hydrox 56b9f34411 *)removed unused imports
19 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
19 years ago
low012 4dbc871524 *) Trying to get rid of possibility of exploits in IndexCreate* through HTML and JavaSkript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it.
19 years ago
theli e6338b4390 *) Bugfix for "Error with request: GET http://localpeer:80/IndexDelete_p.ht"
19 years ago
theli bead8a32aa *) IndexCreate_p.java:
19 years ago
theli 330eae7cf3 *) Normalizing CrawlerStartURL now before crawling is started
20 years ago
orbiter bb3e897baf mor minor changes
20 years ago
orbiter 2d8557cb10 minor changes
20 years ago
orbiter e84a177c49 many bigfixes
20 years ago
orbiter 9ee8a5ba6c fixed big in yacynews
20 years ago
orbiter d34eb23e4e fixed news; added news appearance on Network and IndexCreate page; added intention string to global crawl
20 years ago
orbiter 1022fbeb65 many YaCyNews fixes
20 years ago
orbiter 13abd8b6e7 added news-creation at crawl start
20 years ago
orbiter 81e564edb8 faster crawl profile list cleanup
20 years ago
orbiter 3470a72d48 fixed div by zero, set default delays, fixed release number format and display
20 years ago
orbiter be1f324fca performance setting for remote indexing configuration and latest changes for 0.39
20 years ago
theli 5c3822d5f4 *) adding experimental support for parsing of bookmarksfiles
20 years ago