540 Commits (43d5cd101ef7790a993f0810e20db38a559019f8)

Author SHA1 Message Date
luccioman d92b191942 Ensure no remote Solr is attached before "Shut Down and Re-Start Solr"
7 years ago
luccioman 69690c13a0 Optionally allow external Solr server with self-signed certificate
7 years ago
luccioman ba9cd14516 Removed hard-coded patch for Solr 5.0 on ranking boost function
7 years ago
luccioman fb3032c530 Added a crawl filtering possibility on documents Media Type (MIME)
7 years ago
luccioman c3ff50c17a Updated the list of audio file formats supported by the audioTagParser
7 years ago
luccioman 9412881230 Added basic support for autotagging microdata annotated item types.
7 years ago
luccioman 9ddf92d143 Removed unnecessary reflection usage for workflow tasks.
7 years ago
luccioman 9624516bf8 Refresh recrawl job profile threshold date like other default profiles
7 years ago
luccioman d47afe6fab Use a constant for crawler reject reason prefix with specific processing
7 years ago
luccioman 09c4ee56a7 Added optional https support for remote crawl and profile operations
7 years ago
luccioman 1c4803e40a Enable optional https support for /yacy/transferURL API calls.
7 years ago
Michael Peter Christen 25573bd5ab added a crawl filter based on <div> tag class names
7 years ago
luccioman 46f37e38dc Customized Threads with generic name for easier monitoring.
7 years ago
luccioman 8e732d437c Enable HTTP Digest authentication for non admin users.
7 years ago
luccioman af198b990b Added an optional login link/status to the search public top nav bar.
7 years ago
luccioman ef8aea7f8d Made the dates navigator max elements number user configurable.
7 years ago
luccioman 4eba88f2ff Removed some unnecessary uses of java.lang.reflect api.
7 years ago
luccioman 28b451a0b3 Made Cache compression level and lock timeout user configurable
8 years ago
luccioman a7394b479b Limit the synchronization blocking time on some Cache operations.
8 years ago
luccioman 8399275142 Properly close file output streams even on exceptions scenarios.
8 years ago
luccioman a04feac064 Ensure file input streams proper closing in both success and failures
8 years ago
Michael Peter Christen 200b100fb8 added patch to rewrite altered yacy grid schema into yacy schema
8 years ago
Michael Peter Christen 973d74712f added yacy grid flatjson surrogate parser
8 years ago
luccioman b1da92648e Fixed surrogates import monitoring page (/CrawlResults.html?process=7)
8 years ago
Michael Peter Christen f5ad29edb1 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
8 years ago
Michael Peter Christen 76e9135526 added flatjson parser (stub, unfinished)
8 years ago
reger ba339a2a45 Add servlet to import warc file from filesystem IndexImportWarc_p.html.
8 years ago
reger 510f11d374 Implement surrogate import from Warc archives (as first option handle
8 years ago
reger 3dd23c178b Introduce the option to configure a shutdown port.
8 years ago
reger a2afb4bae0 add switchboardconstants for server ports config keys
8 years ago
luccioman c68a8be2d9 Refactored and enforced Solr mandatory fields for proper operation
8 years ago
luccioman 0da1e6ba16 Factored code re-implementing DigestURL.hosthash() method.
8 years ago
luccioman 6a4d51d8f9 Cleaned up some Javadoc warnings.
8 years ago
reger 68d4dc5cc5 Complete harmonization RequestHeader getCookie with std ServletRequest
8 years ago
luccioman 1df558a6c6 Fixed YaCy proper shutdown triggered by SIGTERM signal.
8 years ago
luccioman 3ca695390c FTP crawl start URLs : applied crawl profile depth control
8 years ago
luccioman 467650c042 Hardened system update checks.
8 years ago
luccioman d27adc2b92 Fixed language detector initialization and NullPointerException cases.
8 years ago
reger f7e9f9be5f move Digest auth checks from DefaultServlet to adminAuthenticated,
8 years ago
reger 44a6a4e795 fix authentication by hit in userdb (wrong parameter)
8 years ago
luccioman aa9ddf3c23 Added control over Robots.txt active threads maximum number.
8 years ago
reger bad8f87998 remove old/obsolete clear text "adminAccount" credential entry from init
8 years ago
reger 59448461d3 make use of userInRole for quick login verification
8 years ago
reger 2a4d826d9e adjust servlet RequestHeader.getLocale
8 years ago
reger af39a76bf6 Reduce number of default max. search navigator lines (from 10000)
8 years ago
luccioman f0639d810c Customized name for Threads still using the default "Thread-n" pattern.
8 years ago
luccioman 7d5ba2afa4 Added some JavaDoc and moved crawlStacker close at the right place.
8 years ago
Michael Peter Christen efeb592661 don't do solr optimization, this create high IO load. We should leave
8 years ago
reger 2910fe35c1 add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
8 years ago
reger 70d47ae38a keep scheduler selection by repeat entry from 07311020d4
8 years ago
reger 7c3f932e5d revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)
8 years ago
reger 07311020d4 postpone apicall exec date init until actual call
8 years ago
reger fcad2d0744 add uses of config constant INDEX_RECEIVE_ALLOW
8 years ago
Michael Peter Christen 7466d390b2 small refactoring + do not accept too old peers during bootstrap
8 years ago
reger 8d58a48029 remove wrong log line in CrawlSwitchboard
8 years ago
reger b119ff65be clean out not used Switchboard variables
9 years ago
reger bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
9 years ago
JeremyRand 433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
reger d0a571bed2 del cytag trail for own index.html (save resource not used by default)
9 years ago
reger 7097dcbdbd cleanup hack for partial Solr update on multivalued datefields
9 years ago
reger ef24593347 delete obsolete SEARCHRESULT busythread constants
9 years ago
reger d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
9 years ago
Michael Peter Christen 849ab671a9 0n: modified the p2p bootstrapping process - rules had been too tight and
9 years ago
reger 06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
9 years ago
reger a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
9 years ago
Ryszard Goń a98c395023 Add the Autocrawl thread
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
reger 6d54eb3d36 skip loading document on crawl start for YMark bookmarks
9 years ago
luc 8ebefa4233 Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
9 years ago
reger 52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
9 years ago
reger a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt()
9 years ago
Michael Peter Christen 3d7dd9d3aa follow-up to latest commit: also flush the search cache if all crawls
9 years ago
reger 7889fc2389 Hack to prevent Solr issue on partial update on a document containing multivalued date field
9 years ago
reger e37a4f0b3d prevent metadata records in index w/o valid url
9 years ago
Michael Peter Christen df3314ac1a added a new facet type based on a probabilistic classifier using
9 years ago
reger cb67eb7baf use more absolute path for config file opening
9 years ago
Michael Peter Christen 90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
10 years ago
Michael Peter Christen 593de05922 enhanced surrogate import process speed (dramatically!)
10 years ago
Michael Peter Christen 694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
10 years ago
reger 121972752c implement deleteOldDownloads in ResourceObserver on low diskspace
10 years ago
reger 49b79987c9 remove obsolete searchfl work table
10 years ago
Michael Peter Christen d0aff91f23 fix for index import
10 years ago
Michael Peter Christen 34de1e8cbc gzip compression will perform more efficient and with better compression
10 years ago
Michael Peter Christen b43811d38c added surrogate import process for exported solr dumps.
10 years ago
Michael Peter Christen 197f7449e5 All entities of crawl profiles are now editable in the crawl profile
10 years ago
reger 3e742d1e34 Init remote crawler on demand
10 years ago
reger 2bc9cb5828 fix early return in addToCrawler
10 years ago
reger 752eec6697 fix NPE in addToIndex when used outside searchEvent
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen 6f4fe4b175 revert of 8a7c68e4c7
10 years ago
reger 8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
10 years ago
reger 296e97c78e put https port in peers dna
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
reger 4b97ddb9ec stop sending crawl receipts if receiver got offline
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen a8a2b7a803 persistency for vocabulary facet switch
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
reger 24f68a4eb7 refactor opensearch heuristic
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago