228 Commits (4b100f8b485b368672590b592932bf2168a8b54a)

Author SHA1 Message Date
Michael Peter Christen 6f0baaa309 added the clickdepth post-processing: some links may have 'shortcuts' to
12 years ago
Michael Peter Christen 0f5b6f38c1 enhanced root-url detection
12 years ago
Michael Peter Christen 5c0c56cfe1 Preparations to produce a click depth attribute in the search index.
12 years ago
reger 4987caf1c9 - apply fix for localhost handling (from yacy2solr) also to metadata2solr
12 years ago
Michael Peter Christen 2a4c064c89 using the publisher information for the author field if no author is
12 years ago
Michael Peter Christen eac9650b31 added another solr field clickdepth_i which reflects the number of
12 years ago
Michael Peter Christen 1052263af3 - added a new solr field references_i which stores the number of
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and it's
12 years ago
Michael Peter Christen fb0fa9a102 - fixed 'delete from subpath' during crawl start which deleted nothing;
12 years ago
orbiter a4a780b871 - fix for bad url conversion in bookmarks when using smb urls
12 years ago
Michael Peter Christen 72f165d58b added a Boost class which stores solr query boost values. The class can
12 years ago
Michael Peter Christen 8fc3679c66 using more pre-compile pattern for split methods
12 years ago
Michael Peter Christen b7004043ea - added a field cache for solr queries which call only for a single
12 years ago
Michael Peter Christen efd2c4622d added a new fail type attribute for the index to distinguish two
12 years ago
Michael Peter Christen 4eab3aae60 removed overhead by preventing generation of full search results when
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen f5ca5cea44 - added field options to all solr queries. This can be used to restrict
12 years ago
orbiter 5dfd6359cb redesign of the QueryParams class: introduced QueryGoal which holds the
12 years ago
Michael Peter Christen 5fd3b93661 added deletion of hosts during crawl start if deleteold option was given
12 years ago
Michael Peter Christen 842faf96a2 fixed media search
13 years ago
Michael Peter Christen 93001586a0 removed warnings, removed too-fast pausing of crawls
13 years ago
Michael Peter Christen 12c0db20e5 fixed npe for surrogate import
13 years ago
Michael Peter Christen 52df6ee369 more logging
13 years ago
Michael Peter Christen 15d1460b40 added information about the reason of pausing of crawls
13 years ago
Michael Peter Christen 2371ef031c added solr faceted search support to YaCy search results
13 years ago
Michael Peter Christen d481abd087 added the visualization of error-urls to host browser
13 years ago
Michael Peter Christen 97f82994a6 automatically pause the crawler if there is a problem with solr
13 years ago
Michael Peter Christen 8fb370d9f8 renovated the way how search results are count. should be correct now...
13 years ago
orbiter 354ef8000d - added 'deleteold' option to crawler which causes that documents are
13 years ago
Michael Peter Christen 75dd706e1b update to HostBrowser:
13 years ago
Michael Peter Christen e2c4c3c7d3 migration to solr 4.0.0
13 years ago
Michael Peter Christen 9330ad4838 - fixed the delete option in host browser
13 years ago
Michael Peter Christen 6629e37685 tried to clean up the search process mess
13 years ago
Michael Peter Christen f8f05ecba7 - added a delete button in host browser to delete a complete subpath
13 years ago
Michael Peter Christen c326aa8f67 disabled writing new entries to crawl stacks to prevent that a domain
13 years ago
Michael Peter Christen 6905182d41 - fix for number of words log message
13 years ago
Michael Peter Christen 799d71bc67 enhanced solr caching:
13 years ago
Michael Peter Christen 8e1248ffe3 force a commit in advance of a search for the administrator to get most
13 years ago
Michael Peter Christen 3b48c78190 added an option to force a commit to solr.
13 years ago
Michael Peter Christen ce0e5b1e17 - more refactoring / private methods
13 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
13 years ago
Michael Peter Christen e5b3c172ff removed hack which translated Solr documents to virtual RWI entries
13 years ago
Michael Peter Christen 5d16c23a1f specified more URIMetadata as URIMetadataNode
13 years ago
Michael Peter Christen 43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
13 years ago
Michael Peter Christen cc98496ff3 enhanced the HostBrowser:
13 years ago
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
13 years ago
Michael Peter Christen 1b02408936 use less cache
13 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
13 years ago
Michael Peter Christen 7e3e45fd04 added Open Graph Metadata default fields, see http://ogp.me/ns#
13 years ago
Michael Peter Christen c3e5f667a7 added schema.org breadcrumb counter to parser and solr schema
13 years ago
Michael Peter Christen bd769de604 since the solr index is now used for all pages that are indexed locally,
13 years ago
Michael Peter Christen f8a3ab2d82 added the usage of synonyms to the GSA search interface
13 years ago
Michael Peter Christen 3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
13 years ago
Michael Peter Christen 3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter 3190347814 added a synonyms_t field to solr and a process to read synonym files.
13 years ago
Michael Peter Christen 411d0e839b added an underline text field to solr to record all underlined texts
13 years ago
Michael Peter Christen c4a3d8870f fixed computation of links in host browser which are not indexed but
13 years ago
Michael Peter Christen 24d2ee3c52 - better date ranking
13 years ago
Michael Peter Christen ca313e404f - if a "/date" modifier is used, the solr remote query applies an
13 years ago
Michael Peter Christen a4214694df We assert that no other metadata storage than solr is used now.
13 years ago
Michael Peter Christen 562183932b - removed ip_s from default profile since that needs a DNS lookup to
13 years ago
Michael Peter Christen 1533bfd63b refactoring
13 years ago
Michael Peter Christen 872f83ebe0 refactoring
13 years ago
Michael Peter Christen fb9460f0a8 using the search filter to drill down search to file types.
13 years ago
Michael Peter Christen 15ea053c3a - added xml output in IndexControlURLs to get the storage page of index
13 years ago
Michael Peter Christen 1b474139dd used the new zip writer/reader to add a solr dump process: the whole
13 years ago
Michael Peter Christen 8219a445f3 refactoring
13 years ago
Michael Peter Christen 00c1c777fa refactoring
13 years ago
orbiter 563d584420 removed more dependencies in cora from kelondro
13 years ago
Michael Peter Christen 62add1d564 added the protocol and the file name extension to the solr fields since
13 years ago
Michael Peter Christen 9db032664e activate two solr fields which will be used by administration interface
13 years ago
Michael Peter Christen 4634f0e626 fix for images_withalt
13 years ago
Michael Peter Christen 10b911eed4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen be67c70a47 added Solr fields:
13 years ago
orbiter d73fff0e0e added solr field images_withalt_i
13 years ago
sixcooler e78fe3f477 also do a clearcache on the solr-connector-caches
13 years ago
Michael Peter Christen d8425e6809 added collections to crawl monitor
13 years ago
Michael Peter Christen ee23fc7a32 added h1..h6 counter fields
13 years ago
Michael Peter Christen b2b516cc3e added a collection attribute to crawls and searches:
13 years ago
Michael Peter Christen f75b3f8a47 added more patches to work without RWI data structure
13 years ago
Michael Peter Christen 31d4d38804 - extended the solr interface by a references-by-word-count method
13 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
13 years ago
Michael Peter Christen 2ddc33646a added new field for solr:
13 years ago
Michael Peter Christen 316b5fe116 - added a solr type definition verifier
13 years ago
Michael Peter Christen e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
13 years ago
orbiter 29171e2f6c fixed generation of ontologies from index enumerations
13 years ago
orbiter 01a63ef595 redesign of YaCySchema and SolrDoc handling
13 years ago
orbiter 479bfca571 refctoring
13 years ago
Michael Peter Christen 4716546ef5 - reduced memory usage in index transmission using a transformation of
13 years ago
orbiter 716ea0cfe2 sorted the solr schema into mandatory and optional fields; reduced
13 years ago
orbiter 9b8c8c0f47 fix from gaston in
13 years ago
orbiter d7ea45f698 - get nice text_t values from metadata conversions that are stored into
13 years ago
orbiter 780f8974e7 added ramaining iteration methods for solr in fulltext class
13 years ago
orbiter ee01c12e56 fixes for putDocument and putMetadata
13 years ago
orbiter cc47a0876e reverted bf55f69176
13 years ago
Michael Peter Christen 0cab06c47c refactoring
13 years ago
Michael Peter Christen bf55f69176 removed write methods to old metadata file type; all metadata now goes
13 years ago
Michael Peter Christen 40c0856489 refactoring
13 years ago
Michael Peter Christen 06a78eecb7 code simplification
13 years ago
Michael Peter Christen 18f989dfb1 - refactoring (load -> getMetadata)
13 years ago