diff --git a/locales/zh.lng b/locales/zh.lng
index 6484dbeae..b1e87702c 100644
--- a/locales/zh.lng
+++ b/locales/zh.lng
@@ -159,7 +159,7 @@ Edit list==编辑列表
 The right '*', after the '/', can be replaced by a==在'/'之后的右边'*'可以被替换为
 >regular expression<==>正则表达式<
 #(slow)==(慢)
-"set"=="集合"
+"set"=="收集"
 The right '*'==右边的'*'
 Used Blacklist engine:==使用的黑名单引擎:
 Active list:==激活列表:
@@ -401,14 +401,13 @@ Save User==保存用户
 #File: ConfigAppearance_p.html
 #---------------------------
-Appearance and Integration==外观界面
+Appearance and Integration==外观与整合
 You can change the appearance of the YaCy interface with skins.==你可以在这里修改YaCy的外观界面.
-#You can change the appearance of YaCy with skins==Sie können hier das Erscheinungsbild von YaCy mit Skins ändern
 The selected skin and language also affects the appearance of the search page.==选择的皮肤和语言也会影响到搜索页面的外观.
 If you create a search portal with YaCy then you can==如果你创建YaCy门户,
 change the appearance of the search page here.==那么你能在这里 改变搜索页面的外观.
-#and the default icons and links on the search page can be replaced with you own.==und die standard Grafiken und Links auf der Suchseite durch Ihre eigenen ersetzen.
 Skin Selection==选择皮肤
+Select one of the default skins. After selection it might be required to reload the web page while holding the shift key to refresh cached style files.==选择一个默认皮肤。选择后,可能需要按住shift键重新加载网页,以刷新缓存的样式文件。
 Select one of the default skins, download new skins, or create your own skin.==选择一个默认皮肤, 下载新皮肤或者创建属于你自己的皮肤.
 Current skin==当前皮肤
 Available Skins==可用皮肤
@@ -705,11 +704,11 @@ Property Name==属性名
 #File: ConfigPortal_p.html
 #---------------------------
 Integration of a Search Portal==搜索门户设置
-If you like to integrate YaCy as portal for your web pages, you may want to change icons and messages on the search page.==如果你想将YaCy作为你的网站搜索门户, 你可能需要在这改变搜索页面的图标和信息.
-The search page may be customized.==搜索页面可以自由定制.
+If you like to integrate YaCy as portal for your web pages, you may want to change icons and messages on the search page.==如果你想将YaCy作为你的网站搜索门户, 你可能需要在这里改变搜索页面的图标和信息。
+The search page may be customized.==搜索页面可以自由定制。
 You can change the 'corporate identity'-images, the greeting line==你可以改变'企业标志'图片,问候语
-and a link to a home page that is reached when the 'corporate identity'-images are clicked.==和一个指向首页的'企业标志'图像链接.
-To change also colours and styles use the Appearance Servlet for different skins and languages.==若要改变颜色和风格,请到外观选项选择你喜欢的皮肤和语言.
+and a link to a home page that is reached when the 'corporate identity'-images are clicked.==和一个点击'企业标志'图像后转到主页的超链接。
+To change also colours and styles use the Appearance Servlet for different skins and languages.==若要改变颜色和风格,请到外观选项选择你喜欢的皮肤和语言。
 Greeting Line<==问候语<
 URL of Home Page<==主页链接<
 URL of a Small Corporate Image<==企业形象小图地址<
@@ -729,21 +728,25 @@ Media Search==媒体搜索
 >Strict==>严格
 Control whether media search results are as default strictly limited to indexed documents matching exactly the desired content domain==控制媒体搜索结果是否默认严格限制为与所需内容域完全匹配的索引文档
 (images, videos or applications specific)==(图片,视频或具体应用)
-or extended to pages including such medias (provide generally more results, but eventually less relevant).==或扩展到包括此类媒体的网页(通常提供更多结果,但相关性更弱)
+or extended to pages including such medias (provide generally more results, but eventually less relevant).==或扩展到包括此类媒体的网页(通常提供更多结果,但相关性更弱)。
 Remote results resorting==远端搜索结果排序
 >On demand, server-side==>根据需要, 服务器侧
 Automated, with JavaScript in the browser==自动化, 基于嵌入浏览器的JavaScript代码
+Automated results resorting with JavaScript makes the browser load the full result set of each search request.==基于JavaScript的自动结果重新排序,使浏览器加载每个搜索请求的完整结果集。
+This may lead to high system loads on the server.==这可能会导致服务器上的系统负载过高。
+Please check the 'Peer-to-peer search with JavaScript results resorting' section in the Local Search access rate configuration page to set up proper limitations on this mode by unauthenticated users.==请查看本地搜索访问率 配置页面中的“使用JavaScript对P2P搜索结果重排”部分,对未经身份验证的用户使用该模式加以适当限制。
 Remote search encryption==远端搜索加密
-Prefer https for search queries on remote peers.==首选https用于远端节点上的搜索查询.
-When SSL/TLS is enabled on remote peers, https should be used to encrypt data exchanged with them when performing peer-to-peer searches.==在远端节点上启用SSL/TLS时,应使用https来加密在执行P2P搜索时与它们交换的数据.
-Please note that contrary to strict TLS, certificates are not validated against trusted certificate authorities (CA), thus allowing YaCy peers to use self-signed certificates.==请注意,与严格TLS相反,证书不会针对受信任的证书颁发机构(CA)进行验证,因此允许YaCy节点使用自签名证书.
+Prefer https for search queries on remote peers.==首选https用于远端节点上的搜索查询。
+When SSL/TLS is enabled on remote peers, https should be used to encrypt data exchanged with them when performing peer-to-peer searches.==在远端节点上启用SSL/TLS时,应使用https来加密在执行P2P搜索时与它们交换的数据。
+Please note that contrary to strict TLS, certificates are not validated against trusted certificate authorities (CA), thus allowing YaCy peers to use self-signed certificates.==请注意,与严格TLS相反,证书不会针对受信任的证书颁发机构(CA)进行验证,因此允许YaCy节点使用自签名证书。
 >Snippet Fetch Strategy==>摘要提取策略
 Speed up search results with this option! (use CACHEONLY or FALSE to switch off verification)==使用此选项加速搜索结果!(使用CACHEONLY或FALSE来关闭验证)
+Statistics on text snippets generation can be enabled in the Debug/Analysis Settings page.==可以在调试/分析设置页面中启用文本片段生成的统计信息。
 NOCACHE: no use of web cache, load all snippets online==NOCACHE:不使用网络缓存,在线加载所有网页摘要
 IFFRESH: use the cache if the cache exists and is fresh otherwise load online==IFFRESH:如果缓存存在则使用最新的缓存,否则在线加载
 IFEXIST: use the cache if the cache exist or load online==IFEXIST:如果缓存存在则使用缓存,或在线加载
 If verification fails, delete index reference==如果验证失败,删除索引参考
-CACHEONLY: never go online, use all content from cache.==CACHEONLY:永远不上网,内容只来自缓存.
+CACHEONLY: never go online, use all content from cache.==CACHEONLY:永远不上网,内容只来自缓存。
 If no cache entry exist, consider content nevertheless as available and show result without snippet==如果不存在缓存条目,将内容视为可用,并显示没有摘要的结果
 FALSE: no link verification and not snippet generation: all search results are valid without verification==FALSE:没有链接验证且没有摘要生成:所有搜索结果在没有验证情况下有效
 Link Verification<==链接验证<
@@ -760,6 +763,10 @@ Limit size of indexed remote results==现在远端索引结果容量
 maximum allowed size in kbytes for each remote search result to be added to the local index==每个远端搜索结果的最大允许大小(以KB为单位)添加到本地索引
 for example, a 1000kbytes limit might be useful if you are running YaCy with a low memory setup==例如,如果运行具有低内存设置的YaCy,则1000KB限制可能很有用
 Default Pop-Up Page<==默认弹出页面<
+>Status Page ==>状态页面 
+>Search Front Page==>搜索首页
+>Search Page (small header)==>搜索页面(小页眉)
+>Interactive Search Page==>交互搜索页面
 Default maximum number of results per page==默认每页最大结果数
 Default index.html Page (by forwarder)==默认index.html页面(通过转发器)
 Target for Click on Search Results==点击搜索结果时
@@ -768,10 +775,9 @@ Target for Click on Search Results==点击搜索结果时
 "_parent" (the parent frame of a frameset)=="_parent" (父级窗口)
 "_top" (top of all frames)=="_top" (置顶)
 Special Target as Exception for an URL-Pattern==作为URL模式的异常的特殊目标
-Pattern:<=模式:<
+ Pattern:<== 模式:<
 Exclude Hosts==排除的主机
-List of hosts that shall be excluded from search results by default==默认情况下将被排除在搜索结果之外的主机列表
-but can be included using the site: operator=但可以使用site:操作符包括进来
+List of hosts that shall be excluded from search results by default but can be included using the site:<host> operator:==默认情况下将被排除在搜索结果之外的主机列表,但可以使用site:<host>操作符包括进来:
 'About' Column<=='关于'栏<
 shown in a column alongside==显示在
 with the search result page==搜索结果页侧栏
@@ -779,17 +785,8 @@ with the search result page==搜索结果页侧栏
 (Content)==(内容)
 >You have to==>你必须
 >set a remote user/password<==>设置一个远端用户/密码<
-to change this options.<==来改变设置.<
+to change this options.<==来改变设置。<
 Show Information Links for each Search Result Entry==显示搜索结果的链接信息
->Date&==>日期&
->Size&==>大小&
->Metadata&==>元数据&
->Parser&==>解析器&
->Pictures==>图片
->Status Page==>状态页面
->Search Front Page==>搜索首页
->Search Page (small header)==>搜索页面(二级标题)
->Interactive Search Page==>交互搜索页面
 "searchresult" (a default custom page name for search results)=="搜索结果" (搜索结果页面名称)
 "Change Search Page"=="改变搜索页"
 "Set to Default Values"=="设为默认值"
@@ -865,57 +862,64 @@ Replace the word "MySearch" with your own message==用你想显示的信息替
 Search Page<==搜索页<
 >Search Result Page Layout Configuration<==>搜索结果页面布局配置<
 Below is a generic template of the search result page. Mark the check boxes for features you would like to be displayed.==以下是搜索结果页面的通用模板.选中你希望显示的功能复选框.
-To change colors and styles use the ==要改变颜色和样式使用
->Appearance<==>外观<
-menu for different skins==不同皮肤的菜单
+To change colors and styles use the Appearance menu for different skins.==要改变颜色和样式,请使用外观菜单选择不同的皮肤。
 Other portal settings can be adjusted in Generic Search Portal menu.==其他门户网站设置可以在通用搜索门户菜单中调整.
 >Page Template<==>页面模板<
+>Toggle navigation<==>切换导航<
+>Log in<==>登录<
+>userName<==>用户名<
+>Search Interfaces<==>搜索界面<
+> Administration »<==> 管理 »<
+>Tag<==>标签<
+>Topics<==>主题<
+>Cloud<==>云<
+>Location<==>位置<
+show search results on map==在地图上显示搜索结果
+Sorted by descending counts==按计数递减排序
+Sorted by ascending counts==按计数递增排序
+Sorted by descending labels==按降序标签排序
+Sorted by ascending labels==按升序标签排序
+>Sort by==>排序方式
+>Descending counts<==>降序计数<
+>Ascending counts<==>升序计数<
+>Descending labels<==>降序标签<
+>Ascending labels<==>升序标签<
+>Vocabulary <==>词汇<
+>search<==>搜索<
 >Text<==>文本<
 >Images<==>图片<
 >Audio<==>音频<
 >Video<==>视频<
 >Applications<==>应用<
 >more options<==>更多选项<
->Tag<==>标签<
->Topics<==>主题<
->Cloud<==>云<
->Protocol<==>协议<
->Filetype<==>文件类型<
->Wiki Name Space<==>百科名称空间<
->Language<==>语言<
->Author<==>作者<
->Vocabulary<==>词汇<
->Provider<==>提供商<
->Collection<==>集合<
+> Date Navigation<==> 日期导航<
+Maximum range (in days)==最大范围(按天计)
+Maximum days number in the histogram. Beware that a large value may trigger high CPU loads both on the server and on the browser with large result sets.==直方图中的最大天数。请注意,较大的值可能会在服务器和处理大结果集的浏览器上引起高CPU负载。
+Show websites favicon==显示网站图标
+Not showing websites favicon can help you save some CPU time and network bandwidth.==不显示网站图标可以帮助你节省一些CPU时间和网络带宽。
 >Title of Result<==>结果标题<
 Description and text snippet of the search result==搜索结果的描述和文本片段
+>Tags<==>标签<
+>keyword<==>关键词<
+>subject<==>主题<
+>keyword2<==>关键词2<
+>keyword3<==>关键词3<
+Max. tags initially displayed==初始显示的最大标签数
+(remaining can then be expanded)==(其余的可再展开)
 42 kbyte<==42kb<
 >Metadata<==>元数据<
 >Parser<==>解析器<
+>Citation<==>引用<
+>Pictures<==>图片<
 >Cache<==>缓存<
-==
-"Date"=="日期"
-"Size"=="大小"
-"Browse index"=="浏览索引"
-For this option URL proxy must be enabled==对于这个选项,必须启用URL代理
-max. items==最大条目数
-"Save Settings"=="保存设置"
-"Set Default Values"=="设置为默认值"
-"Top navigation bar"=="顶部导航栏"
->Location<==>位置<
-show search results on map==在地图上显示搜索结果
-Date Navigation==日期导航
-Maximum range (in days)==最大范围 (按照天算)
-Maximum days number in the histogram. Beware that a large value may trigger high CPU loads both on the server and on the browser with large result sets.==直方图中的最大天数. 请注意, 较大的值可能会在服务器和具有大结果集的浏览器上触发高CPU负载.
-keyword subject keyword2 keyword3==关键字 主题 关键字2 关键字3
-View via Proxy==通过代理查看
+>View via Proxy<==>通过代理查看<
 >JPG Snapshot<==>JPG快照<
-"Raw ranking score value"=="原始排名得分值"
-Ranking: 1.12195955E9==排名: 1.12195955E9
-"Delete navigator"=="删除导航器"
-Add Navigators==添加导航器
-"Add navigator"=="添加导航器"
->append==>附加
+For this option URL proxy must be enabled.==对于这个选项,必须启用URL代理。
+menu: System Administration > Advanced Settings==菜单:系统管理>高级设置
+Ranking score value, mainly for debug/analysis purpose, configured in Debug/Analysis Settings==排名分数值,主要用于调试/分析目的,在调试/分析设置中配置
+>Add Navigators<==>添加导航器<
+Save Settings==保存设置
+Set Default Values==设为默认值
 #-----------------------------
 #File: ConfigUpdate_p.html
 #---------------------------
@@ -1020,10 +1024,10 @@ Duration==持续时间
 #---------------------------
 Content Analysis==内容分析
 These are document analysis attributes==这些是文档分析属性
-Double Content Detection==双重内容检测
+Double Content Detection==重复内容检测
 Double-Content detection is done using a ranking on a 'unique'-Field, named 'fuzzy_signature_unique_b'.==双内容检测是使用名为'fuzzy_signature_unique_b'的'unique'字段上的排名完成的。
 This is the minimum length of a word which shall be considered as element of the signature. Should be either 2 or 3.==这是一个应被视为签名的元素单词的最小长度。 应该是2或3。
-The quantRate is a measurement for the number of words that take part in a signature computation. The higher the number, the less==quantRate是参与签名计算的单词数量的度量。 数字越高,越少
+The quantRate is a measurement for the number of words that take part in a signature computation. The higher the number, the less==quantRate是参与签名计算的单词数量的度量。数字越高,越少
 words are used for the signature==单词用于签名
 For minTokenLen = 2 the quantRate value should not be below 0.24; for minTokenLen = 3 the quantRate value must be not below 0.5.==对于minTokenLen = 2,quantRate值不应低于0.24; 对于minTokenLen = 3,quantRate值必须不低于0.5。
 "Re-Set to default"=="重置为默认"
@@ -1253,7 +1257,7 @@ Remote crawl start points, finished:==远端爬虫开启点,已完成:
 #---------------------------
 Crawl Profile Editor==爬取配置文件编辑器
 >Crawl Profile Editor<==>爬取文件编辑<
->Crawler Steering<==>爬虫向导<
+>Crawler Steering<==>爬虫控制<
 >Crawl Scheduler<==>爬取调度器<
 >Scheduled Crawls can be modified in this table<==>请在下表中修改已安排的爬取<
 Crawl profiles hold information about a crawl process that is currently ongoing.==爬取文件里保存有正在运行的爬取进程信息.
@@ -1351,7 +1355,7 @@ Showing latest #[count]# lines from a stack of #[all]# entries.==显示栈中 #[
 >Words==>单词
 >Title==>标题
 "delete"=="删除"
->Collection==>集合
+>Collection==>收集
 Blacklist to use==使用的黑名单
 "del & blacklist"=="删除并拉黑"
 on the 'Settings'-page in the 'Proxy and Administration Port' field.==在'设置'-页面的'代理和管理端口'字段的上。
@@ -1359,198 +1363,178 @@
 #File: CrawlStartExpert.html
 #---------------------------
-==
-Expert Crawl Start==高级爬取设置
-Start Crawling Job:==开始爬取任务:
-You can define URLs as start points for Web page crawling and start crawling here==你可以将指定地址作为爬取网页的起始点
-"Crawling" means that YaCy will download the given website, extract all links in it and then download the content behind these links== "爬取中"意即YaCy会下载指定的网站, 并解析出网站中链接的所有内容
-This is repeated as long as specified under "Crawling Depth"==它将一直重复至到满足指定的"爬取深度"
-A crawl can also be started using wget and the==爬取也可以将wget和
-for this web page==用于此网页
-#Crawl Job
->Crawl Job<==>爬取工作<
-A Crawl Job consist of one or more start point, crawl limitations and document freshness rules==爬取作业由一个或多个起始点、爬取限制和文档新鲜度规则组成
-#Start Point
+YaCy '#[clientname]#': Crawl Start==YaCy '#[clientname]#': 爬取启动
+Click on this API button to see a documentation of the POST request parameter for crawl starts.==单击此API按钮查看爬取启动的POST请求参数的文档。
+Expert Crawl Start==高级爬取启动
+Start Crawling Job:==开启爬取任务:
+You can define URLs as start points for Web page crawling and start crawling here.==你可以在此将网址指定为网页爬取的起始点,并开启爬取。
+"Crawling" means that YaCy will download the given website, extract all links in it and then download the content behind these links.== "爬取"意即YaCy会下载指定的网站, 并提取出其中的链接,接着下载链接中的全部内容。
+This is repeated as long as specified under "Crawling Depth".==它将一直重复上述步骤,直到满足指定的"爬取深度"。
+A crawl can also be started using wget and the post arguments for this web page.==也可以使用wget和此网页的post参数开启爬取。
+>Crawl Job<==>爬取任务<
+A Crawl Job consist of one or more start point, crawl limitations and document freshness rules.==爬取任务由一个或多个起始点、爬取限制和文档更新规则构成。
 >Start Point==>起始点
-Define the start-url(s) here.==在这儿确定起始地址.
-You can submit more than one URL, each line one URL please.==你可以提交多个地址,请一行一个地址.
-Each of these URLs are the root for a crawl start, existing start URLs are always re-loaded.==每个地址中都是爬取开始的根,已有的起始地址会被重新加载.
-Other already visited URLs are sorted out as "double", if they are not allowed using the re-crawl option.==对已经访问过的地址,如果它们不允许被重新爬取,则被标记为'重复'.
-One Start URL or a list of URLs:==一个起始地址或地址列表:
-(must start with==(头部必须有
->From Link-List of URL<==>来自地址的链接列表<
-From Sitemap==来自站点地图
-From File (enter a path==来自文件(输入
-within your local file system)<==你本地文件系统的地址)<
-#Crawler Filter
+One Start URL or a list of URLs:<br/>(must start with http:// https:// ftp:// smb:// file://)==起始网址或网址列表:<br/>(必须以http:// https:// ftp:// smb:// file://开头)
+Define the start-url(s) here. You can submit more than one URL, each line one URL please.==在此给定起始网址。你可以提交多个网址,请一个网址一行。
+Each of these URLs are the root for a crawl start, existing start URLs are always re-loaded.==这些网址中的每一个都是爬取的起点,已存在的起始网址总是会被重新加载。
+Other already visited URLs are sorted out as "double", if they are not allowed using the re-crawl option.==其他已访问过的网址,如果重爬选项不允许重新加载,则会被当作'重复'排除。
+>From Link-List of URL<==>来自网址的链接列表<
+From Sitemap==来自网站地图
+From File (enter a path<br/>within your local file system)==来自文件<br/>(输入一个本地文件系统路径)
 >Crawler Filter==>爬虫过滤器
-These are limitations on the crawl stacker. The filters will be applied before a web page is loaded==这些是爬取堆栈器的限制.将在加载网页之前应用过滤器
-This defines how often the Crawler will follow links (of links..) embedded in websites.==此选项为爬虫跟踪网站嵌入链接的深度.
-0 means that only the page you enter under "Starting Point" will be added==设置为0代表仅将"起始点"
-to the index. 2-4 is good for normal indexing. Values over 8 are not useful, since a depth-8 crawl will==添加到索引.建议设置为2-4.由于设置为8会索引将近256亿个页面,所以不建议设置大于8的值,
-index approximately 25.600.000.000 pages, maybe this is the whole WWW.==这可能是整个互联网的内容.
+These are limitations on the crawl stacker. The filters will be applied before a web page is loaded.==这些是爬取堆栈器的限制。这些过滤器将在网页加载前被应用。
 >Crawling Depth<==>爬取深度<
-also all linked non-parsable documents==还包括所有链接的不可解析文档
->Unlimited crawl depth for URLs matching with<==>不限爬取深度,对这些匹配的网址<
->Maximum Pages per Domain<==>每个域名最大页面数<
-Use:==使用:
-Page-Count==页面数
-You can limit the maximum number of pages that are fetched and indexed from a single domain with this option.==使用此选项,你可以限制将从单个域名中爬取和索引的页面数.
-You can combine this limitation with the 'Auto-Dom-Filter', so that the limit is applied to all the domains within==你可以将此设置与'Auto-Dom-Filter'结合起来, 以限制给定深度中所有域名.
-the given depth. Domains outside the given depth are then sorted-out anyway.==超出深度范围的域名会被自动忽略.
->misc. Constraints<==>其余约束<
-A questionmark is usually a hint for a dynamic page.==动态页面常用问号标记.
-URLs pointing to dynamic content should usually not be crawled.==通常不会爬取指向动态页面的地址.
-However, there are sometimes web pages with static content that==然而,也有些含有静态内容的页面用问号标记.
-is accessed with URLs containing question marks. If you are unsure, do not check this to avoid crawl loops.==如果你不确定,不要选中此项以防爬取时陷入死循环.
-Accept URLs with query-part ('?')==接受具有查询格式('?')的地址
-Obey html-robots-noindex:==遵守html-robots-noindex:
-Obey html-robots-nofollow:==遵守html-robots-nofollow:
+This defines how often the Crawler will follow links (of links..) embedded in websites.==此选项决定了爬虫跟随网站中嵌入链接(以及链接的链接……)的深度。
+0 means that only the page you enter under "Starting Point" will be added==0代表仅将"起始点"网址添加到索引。
+to the index. 2-4 is good for normal indexing. Values over 8 are not useful, since a depth-8 crawl will==2-4是常规索引用的值。超过8的值没有用,因为深度为8的爬取将
+index approximately 25.600.000.000 pages, maybe this is the whole WWW.==索引接近256亿个网页,这可能是整个互联网的内容。
+also all linked non-parsable documents==包括全部链接中不可解析的文档
+>Unlimited crawl depth for URLs matching with<==>对这些匹配的网址不限制爬取深度<
+>Maximum Pages per Domain<==>每个域名下最大网页数<
+You can limit the maximum number of pages that are fetched and indexed from a single domain with this option.==使用此选项,你可以限制单个域名下爬取和索引的页面数。
+You can combine this limitation with the 'Auto-Dom-Filter', so that the limit is applied to all the domains within==你可以将此设置与'Auto-Dom-Filter'结合起来, 以限制给定深度中所有域名。
+the given depth. Domains outside the given depth are then sorted-out anyway.==超出深度范围的域名会被自动忽略。
+>Use<==>使用<
+Page-Count<==页面数<
+>misc. Constraints<==>其它限制<
+A questionmark is usually a hint for a dynamic page. URLs pointing to dynamic content should usually not be crawled.==问号标记常用作动态网页的提示。指向动态内容的地址通常不应该被爬取。
+However, there are sometimes web pages with static content that==然而,有时含有静态内容的网页,
+is accessed with URLs containing question marks. If you are unsure, do not check this to avoid crawl loops.==其地址也包含问号标记。如果你不确定,不要勾选此项以防爬取陷入循环。
+Following frames is NOT done by Gxxg1e, but we do by default to have a richer content. 'nofollow' in robots metadata can be overridden; this does not affect obeying of the robots.txt which is never ignored.==跟随框架(frames)并非Gxxg1e的做法,但我们默认这样做以获得更丰富的内容。robots元数据中的'nofollow'可以被覆盖;这不影响对robots.txt的遵守,robots.txt永远不会被忽略。
+Accept URLs with query-part ('?'): ==接受包含问号标记('?')的地址: 
+Obey html-robots-noindex:==遵守html-robots-noindex:
+Obey html-robots-nofollow:==遵守html-robots-nofollow:
 Media Type detection==媒体类型探测
+Not loading URLs with unsupported file extension is faster but less accurate.==不加载包含不受支持文件扩展名的网址速度更快,但准确性更低。
+Indeed, for some web resources the actual Media Type is not consistent with the URL file extension. Here are some examples:==实际上,对于某些网络资源,实际的媒体类型与网址中文件扩展名不一致。以下是一些例子:
+: the .de extension is unknown, but the actual Media Type of this page is text/html==: 这个.de扩展名未知,但此页面的实际媒体类型为text/html
+: the .com extension is not supported (executable file format), but the actual Media Type of this page is text/html==: 这个.com扩展名不受支持(可执行文件格式),但此页面的实际媒体类型为text/html
+: the .png extension is a supported image format, but the actual Media Type of this page is text/html==: 这个.png扩展名是一种受支持的图像格式,但该页面的实际媒体类型是text/html
 Do not load URLs with an unsupported file extension==不加载具有不支持文件拓展名的地址
 Always cross check file extension against Content-Type header==始终针对Content-Type标头交叉检查文件扩展名
 >Load Filter on URLs<==>对地址加载过滤器<
+The filter is a regular expression.==这个过滤器是一个正则表达式。
+Example: to allow only urls that contain the word 'science', set the must-match filter to '.*science.*'. ==示例:要仅允许包含单词'science'的网址,请将'必须匹配'过滤器设置为'.*science.*'。
+You can also use an automatic domain-restriction to fully crawl a single domain.==你还可以使用自动域名限制来完全爬取单个域名。
+Attention: you can test the functionality of your regular expressions using the Regular Expression Tester within YaCy.==注意:你可以使用YaCy中的正则表达式测试器测试正则表达式的功能。
 > must-match<==>必须匹配<
-The filter is a <==这个过滤器是一个<
->regular expression<==>正则表达式<
-Example: to allow only urls that contain the word 'science', set the must-match filter to '.*science.*'.==列如:只允许包含'science'的地址,就在'必须匹配过滤器'中输入'.*science.*'.
-You can also use an automatic domain-restriction to fully crawl a single domain.==你也可以使用主动域名限制来完全爬取单个域名.
-Attention: you can test the functionality of your regular expressions using the==注意:你可测试你的正则表达式功能使用
->Regular Expression Tester<==>正则表达式测试器<
-within YaCy.==在YaCy中.
 Restrict to start domain==限制起始域
 Restrict to sub-path==限制子路经
 Use filter==使用过滤器
 (must not be empty)==(不能为空)
 > must-not-match<==>必须排除<
+>Load Filter on URL origin of links<==>对链接来源网址加载过滤器<
+The filter is a regular expression==这个过滤器是一个正则表达式
+Example: to allow loading only links from pages on example.org domain, set the must-match filter to '.*example.org.*'.==示例:要只允许加载example.org域名下网页中的链接,请将'必须匹配'过滤器设置为'.*example.org.*'。
 >Load Filter on IPs<==>对IP加载过滤器<
 >Must-Match List for Country Codes<==>国家代码必须匹配列表<
-Crawls can be restricted to specific countries.==可以限制只在某个具体国家爬取.
-This uses the country code that can be computed from==这会使用国家代码, 它来自
-the IP of the server that hosts the page.==该页面所在主机的IP.
-The filter is not a regular expressions but a list of country codes,==这个过滤器不是正则表达式,而是
-separated by comma.==由逗号隔开的国家代码列表.
+Crawls can be restricted to specific countries. This uses the country code that can be computed from==爬取可以限制在特定的国家。它使用的国家代码可以从存放网页的服务器的IP计算得出。
+the IP of the server that hosts the page. The filter is not a regular expressions but a list of country codes, separated by comma.==过滤器不是正则表达式,而是国家代码列表,用逗号分隔。
 >no country code restriction<==>没有国家代码限制<
+>Use filter  ==>使用过滤器  
 >Document Filter==>文档过滤器
-These are limitations on index feeder.==这些是索引进料器的限制.
-The filters will be applied after a web page was loaded.==加载网页后将应用过滤器.
-that must not match with the URLs to allow that the content of the url is indexed.==它必须排除这些地址,从而允许地址中的内容被索引.
+These are limitations on index feeder. The filters will be applied after a web page was loaded.==这些是对索引供给器的限制。加载网页后过滤器才会被应用。
 >Filter on URLs<==>地址过滤器<
->Filter on Content of Document<==>文档内容过滤器<
->(all visible text, including camel-case-tokenized url and title)<==>(所有可见文本,包括camel-case-tokenized的网址和标题)<
->Filter on Document Media Type (aka MIME type)<==>文档媒体类型过滤器(又称MIME类型)<
->Solr query filter on any active <==>Solr查询过滤器对任何有效的<
->indexed<==>索引的<
-> field(s)<==>域<
+The filter is a regular expression==这个过滤器是一个正则表达式
+that must not match with the URLs to allow that the content of the url is indexed.==它必须与网址不匹配,该网址的内容才会被索引。
+Filter on Content of Document<br/>(all visible text, including camel-case-tokenized url and title)==文档内容过滤器<br/>(所有可见文本,包括驼峰大小写标记的网址和标题)
+Filter on Document Media Type (aka MIME type)==文档媒体类型过滤器(又名MIME类型)
+that must match with the document Media Type (also known as MIME Type) to allow the URL to be indexed. ==它必须与文档媒体类型(也称为MIME类型)匹配,该网址才会被索引。
+Standard Media Types are described at the IANA registry.==IANA注册表中描述了标准媒体类型。
+Solr query filter on any active indexed field(s)==任何激活索引字段上的Solr查询过滤器
+Each parsed document is checked against the given Solr query before being added to the index.==在添加到索引之前,将根据给定的Solr查询检查每个已解析的文档。
+The query must be written in respect to the standard Solr query syntax.==必须按照标准Solr查询语法编写查询。
+The embedded local Solr index must be connected to use this kind of filter.==要使用这种过滤器,必须连接嵌入式本地Solr索引。
+You can configure this with the Index Sources & targets page.==你可以在索引源和目标页面对此进行配置。
 >Content Filter==>内容过滤器
-These are limitations on parts of a document.==这些是文档部分的限制.
-The filter will be applied after a web page was loaded.==加载网页后将应用过滤器.
+These are limitations on parts of a document. The filter will be applied after a web page was loaded.==这些是对文档各部分的限制。加载网页后将应用过滤器。
 >Filter div or nav class names<==>div或nav类名过滤器<
->set of CSS class names<==>CSS类名集合<
-#comma-separated list of
or
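The bulk of this patch is one `source==translation` pair per physical line of locales/zh.lng, and two things fixed above are easy to get wrong by hand: a single '=' where the '==' separator belongs (the ' Pattern:<== 模式:<' line) and '#[name]#' placeholders such as '#[clientname]#' or '#[count]#' that must survive translation. The following is a minimal lint sketch for such files; lint_lng and its rules are hypothetical and not part of YaCy. It assumes one pair per line, skips '#' comment lines (which in this file can also hold commented-out entries), and does not handle the few multi-line <br/> entries.

import re
import sys

# Hypothetical helper, not part of YaCy: a minimal lint pass for *.lng files.
# Simplifying assumptions: one 'source==translation' pair per physical line;
# lines starting with '#' are comments or section markers and are skipped;
# '#[name]#' placeholders found in the source should survive translation.
PLACEHOLDER = re.compile(r"#\[[^\]]+\]#")

def lint_lng(path: str) -> int:
    problems = 0
    with open(path, encoding="utf-8") as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue  # comments like '#File: ...' and blank separators
            if "==" not in line:
                # catches single '=' typos such as 'Pattern:<=模式:<'
                print(f"{path}:{lineno}: no '==' separator: {line[:50]!r}")
                problems += 1
                continue
            source, translation = line.split("==", 1)
            for ph in set(PLACEHOLDER.findall(source)) - set(PLACEHOLDER.findall(translation)):
                print(f"{path}:{lineno}: placeholder {ph} missing from translation")
                problems += 1
    return problems

if __name__ == "__main__":
    sys.exit(1 if lint_lng(sys.argv[1]) else 0)

Run as, for example, python lint_lng.py locales/zh.lng; a nonzero exit status signals findings, so it can sit in front of a commit as a cheap pre-check.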