Commit Graph

195 Commits (bf8a6d984855f3aaca35b866a1e27d988e933f21)

Author SHA1 Message Date
reger 8fe28a83f2 harmonize used lastmodified date for rwi and fulltext in storeDocument
8 years ago
luccioman 7263d17436 Removed mentions of deprecated LURL-db.
8 years ago
reger 4c7a77662a eleminate dependency on file-extension in storeDocument but use supported mime-type
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determided
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger d882991bc5 Implement sharing of ioDispatcher for term & citation index
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen 9d8f426890 adding a try-catch to link graph processing to prevent that a single
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone managament for indexed data:
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago
Michael Peter Christen 3e6c3e2237 documents pushed over the api/push_p.html interface will have their
10 years ago
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 6a1865f507 refactoring date -> lastModified
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
Michael Peter Christen 70f03f7c8e do not cache search requests to Solr if the result is used for
10 years ago
Michael Peter Christen 6a2a669db4 added loading of the synonyms file from addon/synonyms into the
10 years ago
Michael Peter Christen 0a879c98e7 added new 'firstSeen' database table and necessary data structures which
10 years ago
Michael Peter Christen 6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
11 years ago
Michael Peter Christen 81f9b34da7 increaesed ability ot search for all images on a single server within
11 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
11 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
11 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis
11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation:
11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
11 years ago
Michael Peter Christen 7640834b37 removed double concurrency to put Solr documents into the index. The
11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is
11 years ago
orbiter f6e441dd77 refactoring
11 years ago
Michael Peter Christen 69391e5d9e changed strategy to test existence of documents in Solr: using the
11 years ago
Michael Peter Christen bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
11 years ago
Michael Peter Christen 0cabcbbe83 more efficient wordcount
11 years ago
Michael Peter Christen 3d474a843e added memory protection for postprocessing
11 years ago
orbiter c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
11 years ago
reger 9b24dae2b7 add language navigation filter clause to rwi results
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago