612 Commits (b4adbcbd35ce24aaa964393bdd5f3ba8a2b3cf25)

Author SHA1 Message Date
orbiter f204076d25 removed usage of temporary files: causes too much IO
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 9ddb8e4a43 set an option for the java-internal image parser that prevents the image from being cached in a temporary file on the file system. This should speed up image parsing during image indexing dramatically and should also improve performance when showing the yacy banner and OSM tiles.
15 years ago
orbiter 6c093d6aed - enhanced domain navigator computation
15 years ago
orbiter e0da0a84b0 performance fix in http parser
15 years ago
orbiter 11983bc936 redesigned some parts of the parser entry point:
15 years ago
orbiter 89b4fff1c2 adapted ant script for new exif library
15 years ago
orbiter 24e5faee75 added exif parsing for jpg images
15 years ago
orbiter 82f76e1296 removed log line
15 years ago
orbiter 0f8004f9da enhanced html parser to recognize a href tags inside header tags
15 years ago
orbiter 54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
15 years ago
lotus 85ca96227f fix for re-enable parser
15 years ago
lotus 38a3d55afd added more possible php extensions for html
15 years ago
orbiter 69c29acb6e no exception thread dump if parser cannot parse because that mime-type/extension is in the deny-set
15 years ago
orbiter 56e0d9bd01 - testings with image parser
15 years ago
orbiter 7d400b17d0 html parser support for .cfm files
15 years ago
orbiter f6731c6240 more logging etc.
15 years ago
orbiter 007f8297de added php3 as extension type for html
15 years ago
orbiter 5df628a2a4 - added BEncoder class
15 years ago
orbiter 82f57f79e5 more PMD enhancements
15 years ago
orbiter a06f7ddb33 more PMD recommendations
15 years ago
orbiter 66c0a8e849 more PMD recommendations
15 years ago
orbiter 2113fcd7e5 - fixed usage of isEmpty() which is not available in java 1.5
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter 3f771d2a16 fix for rss parser: be lazy when rss is not well-formed
15 years ago
orbiter dff4f95c78 some patches to get the torrent parser working
15 years ago
orbiter fbd24c2d84 integrated the torrent parser
15 years ago
orbiter bd32f8b8cb added a torrent metadata file parser
15 years ago
orbiter a37878b7d5 url parser regex performance hack
15 years ago
orbiter 8281e29963 - more configuration for profiling graph (number of events)
15 years ago
orbiter e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
15 years ago
orbiter 491ba6a1ba - some refactoring in workflow
15 years ago
orbiter 969123385b added json and rss output for image search
15 years ago
orbiter d183f8d980 refactoring (moved code from ContentTransformer to TemplateEngine)
15 years ago
orbiter 4df88a4e7a - fixes for missing or bad hashCode computation
15 years ago
orbiter dbdf2570ba added comparator and more fixes for SortStack/SortStore
15 years ago
orbiter d2938c44a1 - added bmp parser to the document parsers
15 years ago
orbiter 06d0dcde20 more enhancements to image search
15 years ago
orbiter 4c6312d103 enhanced image search
15 years ago
orbiter 2d8f3ee301 some performance hacks
15 years ago
orbiter 1a146b0d73 added a patch to ignore bad mime-ignore patterns
15 years ago
orbiter 29fe436e36 - fixed post-ranking including prefer mask
15 years ago
orbiter a97fdb4566 catch for NPE in image parser
15 years ago
orbiter cd6745b292 accept rss feeds without channel descriptions
15 years ago
orbiter 08f1cbb125 another update to the pdf parser
15 years ago
orbiter 605e896d6c more details for exception catching when parsing pdfs
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 19f31bb043 - moved OAI-PMH source list file from SETTINGS to DICTIONARIES/harvesting
15 years ago
orbiter 11f7da06ed - fixes to csv parser
15 years ago
orbiter 9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from
15 years ago
orbiter 176e334aa4 fixes
15 years ago
orbiter 2fa6bf440b workflow update to OAI-PMH importer
15 years ago
orbiter b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
15 years ago
orbiter 350d13e153 very first working version of oai-pmh importer: if given the right url, the importer can read and index listRecord xml files and calculate the right resumptionURL which is then given as next default start point for the importer url input.
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 30f108f97d added stub of oai-pmh importer (not working yet)
15 years ago
orbiter 52470d0de4 - fix for xls parser
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter a8ce192f63 - shifted main classes to new package net.yacy
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
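
Note on commits e34e63a039 and 4a5100789f: the two performance ideas described there (presizing a HashMap so it does not re-hash while being filled, and preferring isEmpty() over size() == 0) can be illustrated with a minimal sketch. This is not code from the repository; the class name MapSizingSketch and the helpers capacityFor and presized are hypothetical, and the 0.75 divisor assumes the default HashMap load factor.

```java
import java.util.HashMap;
import java.util.Map;

public class MapSizingSketch {

    // Hypothetical helper: pick an initial capacity large enough that
    // `expected` entries fit without a resize, assuming the default
    // HashMap load factor of 0.75.
    public static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    // Hypothetical helper: create a map presized for a known entry count.
    public static Map<String, Integer> presized(int expected) {
        return new HashMap<String, Integer>(capacityFor(expected));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = presized(1000);

        // isEmpty() states the intent directly. For collections whose
        // size() has to count elements, it can also avoid a full traversal.
        if (counts.isEmpty()) {
            counts.put("example", 1);
        }
        System.out.println(counts.size());
    }
}
```

For java.util.HashMap itself, size() is already a constant-time field read, so the isEmpty() change matters mainly for custom or concurrent collections where size() iterates over the elements, which appears to be the case the commit message has in mind.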