75 Commits (caf9e98f09b144933c9f23840cebfc8b5739a931)

Author SHA1 Message Date
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
9 years ago
reger b65e2b527d include use of condenser's content text for language detection.
9 years ago
Michael Peter Christen 7829480b82 refactoring: separated condenser and tokenizer
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
Michael Peter Christen 1f5b5c0111 npe fix for latest scraper feature
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
reger 5790c7242e skip tokenizing punctuation as word in WordTokenizer
10 years ago
Michael Peter Christen 6a2a669db4 added loading of the synonyms file from addon/synonyms into the
10 years ago
orbiter 08409ec680 no idea why the words max was an ordered one. This change increases speed
10 years ago
reger 40133ba2d0 fix NPE in Condenser,
11 years ago
orbiter 61409788eb less word hash computations (removing some overhead because of MD5
11 years ago
Michael Peter Christen 9bb7eab389 hacks to prevent storage of data longer than necessary during search and
11 years ago
Michael Peter Christen 1b4fa2947d - fixed a problem which occurred when a document was not recognized with
11 years ago
orbiter 6e8377b8ad do not check all words with synonym library if the library is empty
11 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 7ab5093321 added new solr title_exact_signature_l and
12 years ago
Michael Peter Christen addba047e2 changes in ranking computation
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administration -> Federated Solr Index)
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and its
12 years ago
Michael Peter Christen 72f165d58b added a Boost class which stores solr query boost values. The class can
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
12 years ago
orbiter 3190347814 added a synonyms_t field to solr and a process to read synonym files.
12 years ago
Michael Peter Christen 8219a445f3 refactoring
12 years ago
orbiter 63762d8f89 removed kelondro dependencies from cora
12 years ago
orbiter d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen 801972fe6f fix for url camel case parser and sentence reader
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen 94d54e2d91 added recognition of multi-word terms in vocabulary matching
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
Michael Peter Christen e0d8643226 - performance hacks
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen a58dc4a91f added autotagging to document condenser:
13 years ago
Michael Peter Christen 254adea51c small fixes
13 years ago
Al Sutton 8993cac4d8 Initial performance improvements
13 years ago
orbiter 0d858d48ec replaced String with StringBuilder in suggestion process
13 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter deda54d684 - relaxed matching of string-search (this is now case-insensitive)
14 years ago
orbiter 15e3a57b4e removed unused functions in condenser
14 years ago
orbiter c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments
14 years ago