72 Commits (2a87b08cea67f8f2ae46e318c1c3945e8520ec53)

Author SHA1 Message Date
luccioman a04feac064 Ensure file input streams are properly closed on both success and failure
8 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
luc f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
9 years ago
reger c6687dd560 fix a system.out to log.fine
9 years ago
reger 78e8c6f3e5 refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES
9 years ago
reger 851e8f6c8a check jpeg file signature in genericImageParser
9 years ago
reger d5330391de remove some unused var allocation in parser
9 years ago
reger 7c82cd4415 add an end condition to svgParser for wrong content
9 years ago
reger c647d899e3 add svgParser to parse metadata from svg images
9 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
reger 5ca0762179 fix: eom on parsing ico file by genericImageParser
10 years ago
Michael Peter Christen 8b5d074715 fix for image parser (there is a class missing!)
10 years ago
reger 9edc7308aa update to metadata-extractor-2.7.0.jar
10 years ago
reger b0c87d8240 fix image search expand box, cut-off of 2nd capture line height
10 years ago
reger eaccce3467 added metadataImageParser for tif and psd (Photoshop) images.
10 years ago
reger a69f5358ff use javax ImageIO getReader to add supported image extension/mime
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
10 years ago
Michael Peter Christen 67beef657f strong redesign of html parser: object recursion is now made using a
11 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 35ab2cef7b added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in
11 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
reger 83763ee4a4 jpeg parser: extract GPS location from meta data
12 years ago
Michael Peter Christen c4538d8d91 added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib
12 years ago
reger 3760e2616b bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
reger 8d1c4c423d make imageparser file extension detection case-insensitive (extensions are often upper case)
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
cominch 9cbfc1a1c0 augmentedProxy, which forwards every proxy request to a
13 years ago
orbiter 804e48888b smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
13 years ago
orbiter 0819e1d397 protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
13 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
low012 9f38c0023d *) Minor changes, mainly cleaning up a little bit, no functional changes.
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter 7bdb13bf7f more fixes to smb crawling: better file names
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index
14 years ago
orbiter fb92f9ae8e added mime type image/jpeg (image/jpg is wrong but it is left here because it does no harm and this error also exists in the configuration of web servers)
14 years ago