Commit Graph

1346 Commits (9049a926a511edea53f6b2622c06fcc50b96f2d7)

Author SHA1 Message Date
reger 3c7220bc7b Refactor rwi reference word position and word distance calculation
8 years ago
luccioman f0639d810c Customized name for Threads still using the default "Thread-n" pattern.
8 years ago
luccioman 7263d17436 Removed mentions of deprecated LURL-db.
8 years ago
reger 31d2a5645e remove obsolete query variable
8 years ago
luccioman 6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
8 years ago
reger 685d8e86bf Avoid frequent data type casting (float/long) for rwi score
8 years ago
reger e68b00678e prevent negative score on URIMetadataNode - in the special case where no
8 years ago
luccioman 8d57b5b970 Added some javadocs.
8 years ago
luccioman 60df09fff9 Fixed some HTML validation errors : Illegal character in query
8 years ago
luccioman b3b75b0498 Accessibility : add a customizable alternative text to YaCy log
8 years ago
luccioman 3ee4f56c39 Improved ErrorCache behavior when switching networks
8 years ago
luccioman 7d5ba2afa4 Added some JavaDoc and moved crawlStacker close at the right place.
8 years ago
luccioman 8edbcd8ad4 Log eventual Solr instances close errors.
8 years ago
reger 330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
8 years ago
Michael Peter Christen df51e4ef07 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen e063aaf97f enable fuzzy search, solr style (append a ~ to get a fuzziness on the
9 years ago
reger 7f63fc50f3 prepare a IndexSegment test case for RWI index testing
9 years ago
luccioman 06d4f93d03 Merged master into postprocessing branch
9 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
9 years ago
reger 51c077f493 adjust the getTopics() and getTopicNavigator() to current usage
9 years ago
reger cc2d9dd3f1 reactivate the use of included-in-topwords boost in postRanking
9 years ago
reger 6801673a07 apply postranking media search boost only on media queries
9 years ago
luccioman 8c49a755da Postprocessing refactoring
9 years ago
luccioman 42f45760ed Refactored postprocessing
9 years ago
Michael Peter Christen 079112358c Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen efeb592661 don't do solr optimization, this creates high IO load. We should leave
9 years ago
reger 4c7a77662a eliminate dependency on file-extension in storeDocument but use supported mime-type
9 years ago
reger 2910fe35c1 add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
9 years ago
reger 70d47ae38a keep scheduler selection by repeat entry from 07311020d4
9 years ago
reger 7c3f932e5d revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)
9 years ago
reger 07311020d4 postpone apicall exec date init until actual call
9 years ago
reger fcad2d0744 add uses of config constant INDEX_RECEIVE_ALLOW
9 years ago
reger 35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
9 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 7466d390b2 small refactoring + do not accept too old peers during bootstrap
9 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
9 years ago
reger 8d58a48029 remove wrong log line in CrawlSwitchboard
9 years ago
reger b119ff65be clean out not used Switchboard variables
9 years ago
reger bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger f10ea3c155 clean-out unused SwitchboardConstants
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger 6ecc180299 fix rwi doubledom return best (highest) ranking
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
Michael Peter Christen b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstrapping process - rules had been too tight and
9 years ago
Michael Peter Christen a6bf0b1649 0N - added option to generate index export files for a specific number
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
9 years ago
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
luc 3f338777f7 Also check and index eventual icon url information from metadata.
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 480772c070 Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifier to rwi result stack.
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow too much, but have enough data
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger 11f3666660 increase use of pre.defined CATCHALL_QUERY string
9 years ago
reger a58ee49307 Optimize internal imagequery focus on using content_type to select images
9 years ago
Michael Peter Christen 151ccd50a9 fix for image size field values (must be multi-valued)
9 years ago
reger 43c27aa550 upd to solr/lucene 5.3.1
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
Michael Peter Christen c737ff235d in case that the include_string contains several entries including
9 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
10 years ago
reger 3428b6f13b improve filtering by filetype navigator.
10 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
10 years ago
reger 802ccaead6 fix init of error cache, use latest faildates => load_date_dt
10 years ago
reger dba7f15073 apply same size constrain on result image from doc
10 years ago
sixcooler 87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
10 years ago
reger eaf0e8ff2c start recording/indexing pixel size for image document
10 years ago
reger c33229fc0c check mime prior to ext for metadata modification for images
10 years ago
reger 19f1308bf0 enforce the result images limit to > 16x16px
10 years ago
Michael Peter Christen 8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen df3314ac1a added a new facet type based on a probabilistic classifier using
10 years ago
reger 1409cabe8b exclude more default search fields from text copy to text_t
10 years ago
Michael Peter Christen dbbad23e12 removed warnings
10 years ago
Michael Peter Christen c14bc8d9b7 revert of fq transformation (recent fix)
10 years ago
Michael Peter Christen 11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen b94bd7f20a a collection of search query enhancements:
10 years ago
reger cb67eb7baf use more absolute path for config file opening
10 years ago
Michael Peter Christen de8cfbe1d7 added export option to export the fulltext of the search index text only
10 years ago
Michael Peter Christen 0aa6fcf259 remove old vocabularies and synonyms before adding new
10 years ago
reger f91298d3b6 fix one implicit Integer/Long type conversion
10 years ago
reger 821262a179 add CommonPattern for multiple spaces
10 years ago
Michael Peter Christen 90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
10 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen 694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
10 years ago
reger 0fab445b19 Resourceobserver log warning - deleting releases files - only on actual deletes
10 years ago
reger c973f94936 add log entry on release file delete by ResourceObserver
10 years ago
reger 121972752c implement deleteOldDownloads in ResourceObserver on low diskspace
10 years ago
reger 49b79987c9 remove obsolete searchfl work table
10 years ago
Michael Peter Christen d0aff91f23 fix for index import
10 years ago
Michael Peter Christen 34de1e8cbc gzip compression will perform more efficiently and with better compression
10 years ago
Michael Peter Christen 98be59ce9c full solr xml exports will now be automatically compressed during
10 years ago
Michael Peter Christen b43811d38c added surrogate import process for exported solr dumps.
10 years ago
Michael Peter Christen c7576d6028 added a full solr export to the IndexControlURLs_p.html servlet. The
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
reger 1d8e1e4bac - Image search expand box, adjust javascript hs padtominsize parameter, to make sure expand box doesn't shrink on small images
10 years ago
reger af57fbefad use available mime (instead null) on imageresult from metadatanode
10 years ago
reger 000dde9511 Eliminate duplication of values for search ResultEntry
10 years ago
reger 29c4aa3991 fix compiler notification of missing serialID
10 years ago
reger 3d53da8236 refactor ResultEntry to be based on MetadataNode/SolrDocument
10 years ago
reger d882991bc5 Implement sharing of ioDispatcher for term & citation index
10 years ago
reger 370ba9da71 On imageSearch prefer mime to sort out non-image documents
10 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
reger f3ce99bfb8 fix extract of inboundlinks_protocol_sxt
10 years ago
reger 2bc9cb5828 fix early return in addToCrawler
10 years ago
Michael Peter Christen 0710648c31 enable api calls with very long urls
10 years ago
reger 1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
10 years ago
reger 752eec6697 fix NPE in addToIndex when used outside searchEvent
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen 6f4fe4b175 revert of 8a7c68e4c7
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen 9d8f426890 adding a try-catch to link graph processing to prevent that a single
10 years ago
reger 8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
10 years ago
reger 7224209486 break out of NormalizeDistributor loop on timeout
10 years ago
reger 47e61f8325 fix typo in image filter query
10 years ago
reger 4b4ab6799f fix String out of range in Collection Nav
10 years ago
reger 5408448a56 skip redundant add. of keywords to text
10 years ago
reger 296e97c78e put https port in peers dna
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen b060ba900d added parsing of contentprop attribute in html tags for
10 years ago
Michael Peter Christen 4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
10 years ago
reger 1395f10e95 fix typecast for css links
10 years ago