Commit Graph

416 Commits (4c242f9af9116a77361201c26d58ea08263c2885)

Author SHA1 Message Date
reger aa1a1f1d2c - small adjustment to make sure genericParser is tried last
12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen e6f361f474 adding the canonical tag to crawl queues
12 years ago
reger 83763ee4a4 jpeg parser: extract GPS location from meta data
12 years ago
Michael Peter Christen c4538d8d91 added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib
12 years ago
reger 3760e2616b bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
reger 8d1c4c423d make imageparser fileextension detection case insensitive (extensions are often upper case)
12 years ago
Michael Peter Christen 3e1e358fdc calling pdf cache flush on class initialization because calling of the
12 years ago
Michael Peter Christen 5344a1c5f7 getting the trash out
12 years ago
Michael Peter Christen 8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
12 years ago
reger 97ab5b90e8 - odt & ooxml (office document) parser correction to add content to fulltext index
12 years ago
orbiter 7de5b9cfa0 fix for http://bugs.yacy.net/view.php?id=233
12 years ago
Michael Peter Christen 25499eead5 - added a new field for the regular expression in crawl start
12 years ago
Michael Peter Christen 50421171c3 added new schema fields:
12 years ago
Michael Peter Christen 7ab5093321 added new solr title_exact_signature_l and
12 years ago
orbiter 17ae51e741 increased number of links limitation from 1000 to 10000 for rss feeds
12 years ago
Michael Peter Christen addba047e2 changes in ranking computation
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
Michael Peter Christen 6a4878940b fix in html parser and bookmark generation
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administraton -> Federated Solr Index)
12 years ago
reger 168b1d130d Adding heuristic to get search results from configured systems which support opensearch specification
12 years ago
Michael Peter Christen 95712fdc8b update to pdf parser
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and it's
12 years ago
Michael Peter Christen 72f165d58b added a Boost class which stores solr query boost values. The class can
12 years ago
Michael Peter Christen b5ee88c6af added more logging to get info which url causes performance problems
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen f5ca5cea44 - added field options to all solr queries. This can be used to restrict
12 years ago
orbiter 5dfd6359cb redesign of the QueryParams class: introduced QueryGoal which holds the
12 years ago
Michael Peter Christen d88eb657fd Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen 6905182d41 - fix for number of words log message
12 years ago
Michael Peter Christen a33e2742cb - removed unnecessary synchronized and deadlock in crawler
12 years ago
reger 722a447b0d - optimize code of augmented parsing to enhence document tags
12 years ago
orbiter 276dd6452b removed warnings
12 years ago
Michael Peter Christen b991685782 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen b7ac1da6a3 gsa results shall have only one title in metadata and that should be the
12 years ago
reger 87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
orbiter 68d0f8de03 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
reger bfb0d4c69b - add language detection from <html lang="xx"> tag
12 years ago
Michael Peter Christen 7e3e45fd04 added Open Graph Metadata default fields, see http://ogp.me/ns#
12 years ago
Michael Peter Christen c3e5f667a7 added schema.org breadcrumb counter to parser and solr schema
12 years ago
Michael Peter Christen 4b5e0c1500 added an url rewriter which can be used to remove session ids from urls
12 years ago
Michael Peter Christen 584663ae8c - redesign of solr query construction
12 years ago
Michael Peter Christen 6ab64746d7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sof 5cb244b79b Merge remote branch 'origin/master'
13 years ago
apfelmaennchen 88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
13 years ago