73 Commits (526f2d6a8ba5a1c36b489461d7a2c59098dfcffc)

Author SHA1 Message Date
orbiter a563b05b60 enhanced crawler:
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index
14 years ago
orbiter 9d080f387e change in handling of the all-visible home path for storage in YaCy:
14 years ago
orbiter e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for this kind of data.
14 years ago
orbiter 933dc1a600 removed old rss parser (will be replaced with parser from cora package)
14 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 5ab5ac80fe fix for NPE in TextParser
15 years ago
orbiter 3247f0e901 fix for deadlocks caused by self-blocking access to TreeMap in concurrent environments. The TreeMap was replaced by a ConcurrentHashMap, with additional care that all strings are compared in lowercase
15 years ago
orbiter 11983bc936 redesigned some parts of the parser entry point:
15 years ago
orbiter 24e5faee75 added exif parsing for jpg images
15 years ago
lotus 85ca96227f fix for re-enabling the parser
15 years ago
orbiter 69c29acb6e no exception thread dump if parser cannot parse because that mime-type/extension is in the deny-set
15 years ago
orbiter 56e0d9bd01 - testings with image parser
15 years ago
orbiter fbd24c2d84 integrated the torrent parser
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
15 years ago
orbiter d2938c44a1 - added bmp parser to the document parsers
15 years ago
orbiter 1a146b0d73 added a patch to ignore bad mime-ignore patterns
15 years ago
orbiter 9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from
15 years ago
orbiter 52470d0de4 - fix for xls parser
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago