Commit Graph

1384 Commits (8d7099a081eb381a39fc5feb301583b827a97f37)

Author SHA1 Message Date
luccioman 8edbcd8ad4 Log eventual Solr instances close errors.
8 years ago
reger 330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
8 years ago
Michael Peter Christen df51e4ef07 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen e063aaf97f enable fuzzy search, solr style (append a ~ to get a fuzzyness on the
9 years ago
reger 7f63fc50f3 prepare a IndexSegment test case for RWI index testing
9 years ago
luccioman 06d4f93d03 Merged master into postprocessing branch
9 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
9 years ago
reger 51c077f493 adjust the getTopics() and getTopicNavigator() to current useage
9 years ago
reger cc2d9dd3f1 reactivate the use of included-in-topwords boost in postRanking
9 years ago
reger 6801673a07 apply postranking media search boost only on media queries
9 years ago
luccioman 8c49a755da Postprocessing refactoring
9 years ago
luccioman 42f45760ed Refactored postprocessing
9 years ago
Michael Peter Christen 079112358c Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen efeb592661 don't do solr optimization, this create high IO load. We should leave
9 years ago
reger 4c7a77662a eleminate dependency on file-extension in storeDocument but use supported mime-type
9 years ago
reger 2910fe35c1 add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
9 years ago
reger 70d47ae38a keep scheduler selection by repeat entry from 07311020d4
9 years ago
reger 7c3f932e5d revert due to conflict with double count recording by schedulter / servlet by the commit under normal operation (no shutdown)
9 years ago
reger 07311020d4 postpone apicall exec date init until actual call
9 years ago
reger fcad2d0744 add uses of config constant INDEX_RECEIVE_ALLOW
9 years ago
reger 35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
9 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
Michael Peter Christen 7466d390b2 small refactoring + do not accept too old peers during bootstrap
9 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
9 years ago
reger 8d58a48029 remove wrong log line in CrawlSwitchboard
9 years ago
reger b119ff65be clean out not used Switchboard variables
9 years ago
reger bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger f10ea3c155 clean-out unused SwitchboardConstants
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger 6ecc180299 fix rwi doubledom return best (highest) ranking
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
Michael Peter Christen b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstraping process - rules had been too tight and
9 years ago
Michael Peter Christen a6bf0b1649 0N - added option to generate index export files for a specific number
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode.
9 years ago
reger caf9e98f09 put metadata dc_publisher in corresponding schema field
9 years ago
luc 3f338777f7 Also check and index eventual icon url information from metadata.
9 years ago
reger 6f0b073bf3 override detected language (statistic langdetect) only with TLD determided
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 480772c070 Fixed json search results from commit "Improved URLLicence reliability"
9 years ago
reger 535d4bf75f respect hidden attribute for file and smb directory listing
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
reger ed3e16e092 apply remote result count config value to Bookmark Autosearch
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
Ryszard Goń 1728cd30c6 Create autocrawl profiles
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
reger a58d34a4e8 check error URL cache before adding errorDoc to index
9 years ago
reger cd26717ba2 fix low memory status hint (dht-in disabled)
9 years ago
sixcooler dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
reger 45b9bd8403 adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
9 years ago
reger dec3e6ad96 fix: adjust urlstub for mailto links
9 years ago
luc 8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
9 years ago
reger 28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
9 years ago
reger 020630efd8 remove unused network scanner parameter from queryparameter
9 years ago
luc ad5586f8f6 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
reger cdb8f3b10d make current ranking score value avail. to search interface / api
9 years ago
Michael Peter Christen ef8cd80593 fix for npe
9 years ago
reger 0347bfa71f Apply collection query constraint/modifiert to rwi result stack.
9 years ago
reger ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
sixcooler 646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler 194df613de not using 'location' as defaultfacetfield - since we removed it being
9 years ago
sixcooler 4a905ec134 fix to not let the AccessTracker-Log grow to much, but have enough data
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger 11f3666660 increase use of pre.defined CATCHALL_QUERY string
9 years ago
reger a58ee49307 Optimize internal imagequery focus on using content_type to select images
9 years ago
Michael Peter Christen 151ccd50a9 fix for image size field values (must be multi-valued)
9 years ago
reger 43c27aa550 upd to solr/lucene 5.3.1
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
Michael Peter Christen c737ff235d in case that the include_string contains several entries including
9 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
10 years ago
reger 3428b6f13b improve filtering by filetype navigator.
10 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
10 years ago
reger 802ccaead6 fix init of error cache, use latest faildates => load_date_dt
10 years ago
reger dba7f15073 apply same size constrain on result image from doc
10 years ago
sixcooler 87e4abe393 fight the fieldcache by usind DocValues: in Solr-5.x the fieldcache has
10 years ago
reger eaf0e8ff2c start recording/indexing pixel size for image document
10 years ago
reger c33229fc0c check mime prior to ext for metadata modification for images
10 years ago
reger 19f1308bf0 enforce th result images limit to > 16x16px
10 years ago
Michael Peter Christen 8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen df3314ac1a added a new facet type based on a probabilistic classifier using
10 years ago
reger 1409cabe8b exclude more default search fields from text copy to text_t
10 years ago
Michael Peter Christen dbbad23e12 removed warnings
10 years ago
Michael Peter Christen c14bc8d9b7 revert of fq transformation (recent fix)
10 years ago
Michael Peter Christen 11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen b94bd7f20a a collection of search query enhancements:
10 years ago
reger cb67eb7baf use more absolute path for config file opening
10 years ago
Michael Peter Christen de8cfbe1d7 added export option to export the fulltext of the search index text only
10 years ago
Michael Peter Christen 0aa6fcf259 remove old vocabularies and synonyms before adding new
10 years ago
reger f91298d3b6 fix one implicit Integer/Long type conversion
10 years ago
reger 821262a179 add CommonPattern for multiple spaces
10 years ago
Michael Peter Christen 90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
10 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen 694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
10 years ago
reger 0fab445b19 Resourceobserver log warning - deleting releases files - only on actual deletes
10 years ago
reger c973f94936 add log entry on release file delete by ResourceObserver
10 years ago
reger 121972752c implement deleteOldDownloads in RexourceObserver on low diskspace
10 years ago
reger 49b79987c9 remove obsolete searchfl work table
10 years ago
Michael Peter Christen d0aff91f23 fix for index import
10 years ago
Michael Peter Christen 34de1e8cbc gzip compression will perform more efficient and with better compression
10 years ago
Michael Peter Christen 98be59ce9c full solr xml exports will now be automatically compressed during
10 years ago
Michael Peter Christen b43811d38c added surrogate import process for exported solr dumps.
10 years ago
Michael Peter Christen c7576d6028 added a full solr export to the IndexControlURLs_p.html servlet. The
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
reger 1d8e1e4bac - Image search expand box, adjust javascript hs padtominsize parameter, to make sure expand box doesn't shrink on small images
10 years ago
reger af57fbefad use available mime (instead null) on imageresult from metadatanode
10 years ago
reger 000dde9511 Eleminate duplication of values for search ResultEntry
10 years ago
reger 29c4aa3991 fix compiler notification of missing serialID
10 years ago
reger 3d53da8236 refactor ResultEntry to be based on MetadataNode/SolrDocument
10 years ago
reger d882991bc5 Implement sharing of ioDispatcher for term & citation index
10 years ago
reger 370ba9da71 On imageSearch prefere mime to sort out none-image documents
10 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
reger f3ce99bfb8 fix extract of inboundlinks_protocol_sxt
10 years ago
reger 2bc9cb5828 fix early return in addToCrawler
10 years ago
Michael Peter Christen 0710648c31 enable api calls with very long urls
10 years ago
reger 1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
10 years ago
reger 752eec6697 fix NPE in addToIndex when used outside searchEvent
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen 6f4fe4b175 revert of 8a7c68e4c7
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen 9d8f426890 adding a try-catch to link graph processing to prevent that a single
10 years ago
reger 8a5b8f8789 on bookmaring of search result, remember orig. query in separate bookmark property
10 years ago
reger 7224209486 break out of NormalizeDistributor loop on timeout
10 years ago
reger 47e61f8325 fix typo in image filter query
10 years ago
reger 4b4ab6799f fix String out of range in Collection Nav
10 years ago
reger 5408448a56 skip redundant add. of keywords to text
10 years ago
reger 296e97c78e put https port in peers dna
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone managament for indexed data:
10 years ago
Michael Peter Christen b060ba900d added parsing of contentprop attribute in html tags for
10 years ago
Michael Peter Christen 4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
10 years ago
reger 1395f10e95 fix typecast for css links
10 years ago
Michael Peter Christen abaaaef5f1 fix for filter queries
10 years ago
Michael Peter Christen f5a032f293 split query into filter query and text query to get better ranking
10 years ago
Michael Peter Christen 2e88028c1a when selecting collections in navigation, do show the un-selected
10 years ago
Michael Peter Christen fa7edc9f7a refactoring of filter queries (several queries instead only one)
10 years ago
Michael Peter Christen 40389987ec Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen f9ba50379d added an expansion option to search facets on result page:
10 years ago
reger 1f0f77bb77 make location facet return results
10 years ago
Michael Peter Christen 9bf0d7ecb9 added a new collection type 'dht' to all documents from the peer-to-peer
10 years ago
reger f63fff9008 fix snippet containig number with comma as desmo point http://mantis.tokeek.de/view.php?id=344
10 years ago
reger b241264632 fix error on *abc query input
10 years ago
reger 7e09bff4a1 exclude default search fields from text copy to text_t
10 years ago
reger 8af70950d9 harmonize snippet computation
10 years ago
Michael Peter Christen fd4e2c809a Show dates in the content of a document in the search result:
10 years ago
Michael Peter Christen d9d3111d10 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
reger d7259419f3 postpone raw snippet html encoding upon use
10 years ago
reger 9b0de2de64 introduce getQueryFields to return default query fields (queryparamter QF)
10 years ago
reger 4b97ddb9ec stop sending crawl receipts if receiver got offline
10 years ago
reger fba34e12ef fix formatting issue if snippet contains html code
10 years ago
reger e48720a58c fix NPE in snippet computation
10 years ago
Michael Peter Christen 97ba5ddbb7 configuration option for maxload limit for remote search
10 years ago
reger 9e1ec5fec4 refactor: just some more useages of constant for term ":[* TO *]"
10 years ago
reger 8c491f51a5 remove hardcoded initialization of language nav if not used
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 68c605d637 replace with CommonPattern.SPACE for split
10 years ago
Michael Peter Christen a8a2b7a803 persistency for vocabulary facet switch
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
Michael Peter Christen 5a060c9f26 refactoring of reindexSolr (just replaced constant string)
10 years ago
Michael Peter Christen 3d717b749a fix for urlmaskfilter
10 years ago
Michael Peter Christen bee5ee7cce removed some warnings
10 years ago
reger 42b0672be3 Let auto-disabled crawls recover if low resource condition vanished.
10 years ago
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
reger 24f68a4eb7 refactor opensearch heuristic
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago
Michael Peter Christen 8cafdb989a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 66839f73fa remove debug limit from commit before
10 years ago
reger 4214f250d0 Add option for extended search (Autosearch) to Bookmark.html asking all connected peers for the searchterm added as description to the bookmark created by the bookmark icon.
10 years ago
Michael Peter Christen 3e6c3e2237 documents pushed over the api/push_p.html interface will have their
10 years ago
reger 4eb89d7f15 revert clickservlet
10 years ago
reger d44d8996d0 Added a “don't store remote search results” option
10 years ago
Michael Peter Christen d2792a43fd do not write iframe and embed links into webgraph, but use them anyway
10 years ago
Michael Peter Christen ecb6a59e9e do not translate gif images into png images for thumbnails. Instead,
10 years ago
reger 73ba5d8ef7 adjust fieldtype and description of field httpstatus_redirect_s in CollectionSchema
10 years ago
Michael Peter Christen eb78388a98 changed prefer strategy for http unique in such a way that http is
10 years ago
Michael Peter Christen aaf7d4775a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 8c3e5b7b6d added experimental pdf splitting which enables YaCy to split pdfs during
10 years ago
reger 198102304b refactor size() -> filesize() of URIMetadataNode
10 years ago
Michael Peter Christen d3e71ed070 fixes for searches when initialization of large autotagging libraries
10 years ago
Michael Peter Christen 28683530cd fixes to usage of no-cache: use and recognize also the no-store
10 years ago
reger 13cca2b114 fix missing AppPath
10 years ago
Michael Peter Christen 65125439fe added query modifier 'on'. This makes it possible to search for date
10 years ago
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 6a1865f507 refactoring date -> lastModified
10 years ago
Michael Peter Christen 7bfc5b80cb added new options to vocabulary editor:
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
reger 5f0bb1214f modified FieldReIndex to reindex queries with low number of documents first
10 years ago
reger e52370728a fix startup stop on missing HTCACHE/SNAPSHOT directory
10 years ago
reger 70cf7060a4 coding fixes suggested in
10 years ago
reger ff18129def ViewFile servlet: update index if newer,
10 years ago
Michael Peter Christen 60f27bdf49 added the property timeoutrequests to configuration to disable
10 years ago
Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
10 years ago