28 Commits (c5df34989eef3c81c58016a473d5319d2d4814f1)

Author SHA1 Message Date
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method. 9 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data: 10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a 10 years ago
Michael Peter Christen de3e373913 using precompiled CommonPattern.TAB for split 10 years ago
Michael Peter Christen 1f5047b15f using precompiled pattern CommonPattern.SEMICOLON for splits 10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where 10 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables 12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not 12 years ago
Michael Peter Christen 35ab2cef7b added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in 12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler 12 years ago
Michael Peter Christen 528d6763fa - added new solr fields: 13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty() 13 years ago
Michael Peter Christen b0c408788b made class methods static where possible 13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters 13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one 13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there 13 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content 14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser 14 years ago
low012 2a6499364d *) minor changes 14 years ago
low012 c0274bd123 *) minor changes 14 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds! 15 years ago
orbiter b6fb239e74 redesign of parser interface: 15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs 15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data 15 years ago
orbiter 54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request 15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration. 16 years ago
orbiter 11f7da06ed - fixes to csv parser 16 years ago
orbiter 9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from 16 years ago