448 Commits (b9d36e45e083c6ad235ca4b0f842aaab4407d6a6)

Author SHA1 Message Date
Michael Peter Christen addba047e2 changes in ranking computation
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
Michael Peter Christen 6a4878940b fix in html parser and bookmark generation
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administraton -> Federated Solr Index)
12 years ago
reger 168b1d130d Adding heuristic to get search results from configured systems which support opensearch specification
12 years ago
Michael Peter Christen 95712fdc8b update to pdf parser
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and it's
12 years ago
Michael Peter Christen 72f165d58b added a Boost class which stores solr query boost values. The class can
12 years ago
Michael Peter Christen b5ee88c6af added more logging to get info which url causes performance problems
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen f5ca5cea44 - added field options to all solr queries. This can be used to restrict
12 years ago
orbiter 5dfd6359cb redesign of the QueryParams class: introduced QueryGoal which holds the
12 years ago
Michael Peter Christen d88eb657fd Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen 6905182d41 - fix for number of words log message
12 years ago
Michael Peter Christen a33e2742cb - removed unnecessary synchronized and deadlock in crawler
12 years ago
reger 722a447b0d - optimize code of augmented parsing to enhence document tags
12 years ago
orbiter 276dd6452b removed warnings
12 years ago
Michael Peter Christen b991685782 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen b7ac1da6a3 gsa results shall have only one title in metadata and that should be the
12 years ago
reger 87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
orbiter 68d0f8de03 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
reger bfb0d4c69b - add language detection from <html lang="xx"> tag
12 years ago
Michael Peter Christen 7e3e45fd04 added Open Graph Metadata default fields, see http://ogp.me/ns#
12 years ago
Michael Peter Christen c3e5f667a7 added schema.org breadcrumb counter to parser and solr schema
12 years ago
Michael Peter Christen 4b5e0c1500 added an url rewriter which can be used to remove session ids from urls
12 years ago
Michael Peter Christen 584663ae8c - redesign of solr query construction
12 years ago
Michael Peter Christen 6ab64746d7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sof 5cb244b79b Merge remote branch 'origin/master'
12 years ago
apfelmaennchen 88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
12 years ago
Michael Peter Christen 31485a963d refactoring
12 years ago
Michael Peter Christen 3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
12 years ago
Michael Peter Christen 3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter 3190347814 added a synonyms_t field to solr and a process to read synonym files.
12 years ago
Michael Peter Christen 411d0e839b added an underline text field to solr to record all underlined texts
12 years ago
Michael Peter Christen 24d2ee3c52 - better date ranking
12 years ago
sixcooler 6c50d016ed pdf- and zipParser should not use forced Memory-Limits
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 8219a445f3 refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
orbiter 63762d8f89 removed kelondro dependencies from cora
12 years ago
Michael Peter Christen e54ac38095 - some corrections in usage of getFile() and getFileName()
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
12 years ago
Michael Peter Christen e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
12 years ago
orbiter 67f2866cd0 small fixes
12 years ago
orbiter 67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
orbiter 482afed07c reduced logging overhead (a bit)
13 years ago
orbiter bbfa497a3c replaced more size() > 0 by !isEmpty()
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen 801972fe6f fix for url camel case parser and sentence reader
13 years ago
Michael Peter Christen fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within
13 years ago
Michael Peter Christen 92731e5287 fix for sevenzip parser
13 years ago
Michael Peter Christen 8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch
13 years ago
Michael Peter Christen b1e7c11fba fix for pattern matcher in html parser
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
Michael Peter Christen 7c1ba99755 removed more unused method parameters
13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
orbiter fc0f9543fe More SentenceReader cleanup
13 years ago
orbiter 586bb0eb6a Simplified SentenceReader (no more Reader inside..)
13 years ago
orbiter 7f851d62a7 replaced HashARC with SizeLimited Objects which are less costly
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
orbiter bb8dcb4911 automatically adopt size of word cache to available memory
13 years ago
Michael Peter Christen ad09b786bf clean up parser data
13 years ago
Michael Peter Christen 276a66a793 Adding a limit of 1000 links that a parser shall store during indexing.
13 years ago
Michael Peter Christen de903a53a0 parser refactoring & hacks
13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen 0c345d1559 giving threads name so its easier to see whats happening during
13 years ago
Michael Peter Christen 508a81b86c added solr field 'refresh_s' which stores the refresh url contained in
13 years ago
Michael Peter Christen f3167def64 do not fill the keywords with title content if keywords do not exist.
13 years ago
Michael Peter Christen 77f795756c fixing redirects and status codes: storing of status code in
13 years ago
Michael Peter Christen dbdd697f4d moved RDFaParser.xsl configuration file to defaults
13 years ago
Michael Peter Christen 786be7d175 better integration of RDFaParser
13 years ago
Michael Peter Christen de3ef8ad73 removed unimportant warnings
13 years ago
Michael Peter Christen 24bbe359ca integrate also geonames library files for less cities. these are more
13 years ago
Michael Peter Christen 223a5440ab preventing that an empty pnd is inserted into the vocabularies
13 years ago
Michael Peter Christen 963f92ed9a - merged files
13 years ago
Michael Peter Christen dd88d0ace2 more logging
13 years ago
Michael Peter Christen 94d54e2d91 added recognition of multi-word terms in vocabulary matching
13 years ago
Michael Peter Christen 64c0268b2b show triplestore metadata in yacydoc and viewfile
13 years ago
Michael Peter Christen c2f0d16d2c fixed vocabulary initialization
13 years ago
Michael Peter Christen df3531f8d5 added the generation of virtual vocabularies using the pnd
13 years ago
Michael Peter Christen a0f1decd82 - added loading of the dbpedia pnd triplestore in the dictionary loader
13 years ago
Michael Peter Christen 16d8f33795 added objectlink generation to vocabulary generation and editor
13 years ago
Michael Peter Christen d45718251e refactoring (Localization -> Location)
13 years ago
Michael Peter Christen b8b3c87ba7 - renamed localization to location (that was confusing)
13 years ago
Michael Peter Christen e89747bb67 - added automated generation of vocabularies from url stubs
13 years ago
Michael Peter Christen 79464189a4 The 'Locale' vocabulary, which is generated by geo data, has now the
13 years ago
Michael Peter Christen 61bb52d55c - using http://purl.org/dc/terms/references to refer from an
13 years ago
Michael Peter Christen 50c576599b allow multiple parser options instead of printing an error
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
cominch bbfc53b663 bugfix
13 years ago