YaCy Network Access
Server Access Grid
This image shows incoming connections to your YaCy peer and outgoing connections from your peer to other peers and web servers.
Access Tracker
Server Access Overview
This is a list of requests to the local http server within the last hour.
Access Times
Server Access Details
Local Search Log
Top Search Words (last 7 Days)
Local Search Host Tracker
Remote Search Log
Remote Search Host Tracker
This is a list of searches that were requested from this peer's search interface
This is a list of searches that were requested from remote peers' search interfaces
This is a list of requests (max. 1000) to the local http server within the last hour.
URL Proxy Settings
With these settings you can activate or deactivate the URL proxy.
Service call: , where parameter is the url of an external web page.
Show search results via URL proxy:
Enables or disables URL proxy for all search results. If enabled, all search results will be tunneled through URL proxy.
Restrict URL proxy use:
Define client filter. Default: 
URL substitution:
Define URL substitution rules which allow navigating in proxy environment. Possible values: all, domainlist. Default: domainlist. Alternatively you may add this javascript to your browser favorites/short-cuts, which will reload the current browser address via the YaCy proxy servlet.
or right-click this link and add to favorites:
Autocrawler
Autocrawler automatically selects and adds tasks to the local crawl queue.
This will work best when there are already quite a few domains in the index.
Autocrawler Configuration
You need to restart for some settings to be applied
Enable Autocrawler:
Deep crawl every:
Warning: if this is bigger than "Rows to fetch" only shallow crawls will run.
Rows to fetch at once:
Recrawl only older than # days:
Get hosts by query:
Can be any valid Solr query.
Shallow crawl depth (0 to 2):
Deep crawl depth (1 to 5):
Index text:
Index media:
Blacklist Cleaner
Here you can remove or edit illegal or duplicate blacklist entries.
Allow regular expressions in host part of blacklist entries.
The blacklist-cleaner only works for the following blacklist-engines up to now:
Illegal Entries in for
Deleted entries
Altered entries!
Two wildcards in host-part
Either subdomain or wildcard
Path is invalid Regex
Wildcard not on begin or end
Host contains illegal chars
Double
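The error categories listed above can be illustrated with a small validator. This is only a sketch, not YaCy's actual checker: the function name `check_entry`, the set of characters allowed in a host, and the exact meaning of each rule are assumptions made for illustration. The 'Either subdomain or wildcard' and 'Double' cases are left out.

```python
import re

# Assumed set of characters allowed in the host part (illustrative only).
_HOST_CHARS = re.compile(r"^[A-Za-z0-9.\-*]+$")


def check_entry(entry: str) -> list[str]:
    """Return the problems found in a 'host/path' blacklist entry."""
    host, _, path = entry.partition("/")
    problems = []
    if host.count("*") > 1:
        problems.append("Two wildcards in host-part")
    elif "*" in host and not (host.startswith("*") or host.endswith("*")):
        problems.append("Wildcard not on begin or end")
    if not _HOST_CHARS.match(host):
        problems.append("Host contains illegal chars")
    # A lone '*' path is treated as a wildcard; anything else must compile as a regex.
    if path != "*":
        try:
            re.compile(path)
        except re.error:
            problems.append("Path is invalid Regex")
    return problems
```

An entry such as `*.domain.net/*` produces an empty list, while `do*main.net/*` is reported with 'Wildcard not on begin or end'.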
No Blacklist selected
Blacklist Import
Used Blacklist engine:
Import blacklist items from...
other YaCy peers:
URL:
plain text file:
XML file:
Upload a regular text file which contains one blacklist entry per line.
Upload an XML file which contains one or more blacklists.
Export blacklist items to...
Here you can export a blacklist as an XML file. This file will contain additional information about which cases a blacklist is activated for.
Here you can export a blacklist as a regular text file with one blacklist entry per line.
This file will not contain any additional information
Blacklist Test
Used Blacklist engine:
The tested URL was
It is blocked for the following cases:
Search
Surftips
Blacklist Administration
This function provides a URL filter to the proxy; any blacklisted URL is blocked from being loaded. You can define several blacklists and activate them separately.
You may also provide your blacklists to other peers by sharing them; in return you may collect blacklist entries from other peers.
Active list:
No blacklist selected
Select list to edit:
not shared:
shared
Settings for this list
Share/don't share this list
Delete this list
Edit list
These are the domain name/path patterns in
Blacklist Pattern
Add new pattern:
Add URL pattern
The right '*', after the '/', can be replaced by a regular expression
domain.net/fullpath
domain.net/*
*.domain.net/*
*.sub.domain.net/*
(slow)
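To make the pattern forms above concrete, here is a rough sketch of how a URL could be tested against one `host/path` pattern. It is not YaCy's matching code. The function name `matches` is hypothetical, and the sketch assumes that a `*` in the host part matches any run of characters, that `*.domain.net` does not match the bare `domain.net`, and that the query string is ignored.

```python
import re
from urllib.parse import urlsplit


def matches(pattern: str, url: str) -> bool:
    """Check a URL against a 'host/path' blacklist pattern (illustrative)."""
    host_pat, _, path_pat = pattern.partition("/")
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lstrip("/")
    # Host part: literal text, with each '*' standing for any run of characters.
    host_re = ".*".join(re.escape(piece) for piece in host_pat.lower().split("*"))
    # Path part: a lone '*' means "anything"; otherwise it is used as a regex.
    path_re = ".*" if path_pat == "*" else path_pat
    return (re.fullmatch(host_re, host) is not None
            and re.fullmatch(path_re, path) is not None)
```

For example, `matches("*.domain.net/*", "http://www.domain.net/x")` is true under these assumptions, while `matches("domain.net/*", "http://www.domain.net/")` is false because the host has no wildcard.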
Activate this list for
Show entries:
Entries per page:
Edit existing pattern(s):
by
Comments
Blog-Home
Author:
Subject:
You can use Yacy-Wiki Code here.
Comments:
deactivated
activated
moderated
Preview
No changes have been submitted so far!
Access denied
To edit or create blog-entries you need to be logged in as Admin or User who has Blog rights.
Are you sure that you want to delete
Confirm deletion
Yes, delete it.
No, leave it.
Import was successful!
Import failed. Maybe the supplied file was not a valid blog backup?
Please select the XML-file you want to import:
Text:
by
Comments
Login
Blog-Home
delete
allow
Author:
Subject:
You can use Yacy-Wiki Code here.
YaCy: Bookmarks
The bookmarks list can also be retrieved as RSS feed. This can also be done when you select a specific tag.
Click the API icon to load the RSS from the current selection.
To see a list of all APIs, please visit the API wiki page
Bookmarks
Bookmarks (List 
Bookmarks
Add Bookmark
Import Bookmarks
Import XML Bookmarks
Import HTML Bookmarks
Default Tags:
imported
Title:
Description:
Folder (/folder/subfolder):
Tags (comma separated):
Public:
yes
no
Bookmark is a newsfeed
File:
import as Public
private bookmark
public bookmark
Tagged with
'Confirm deletion'
Edit
Delete
Folders
Bookmark Folder
Tags
Bookmark List
previous page
next page
All
Show
Bookmarks per page.
start autosearch of new bookmarks
This starts a search of new or modified bookmarks since startup
in folder "search" with "query=<original_search_term>"
Every peer online will be asked for results.
Image Collage
Private Queue
Public Queue
User Accounts
User Administration
User created:
User changed:
Generic error.
Passwords do not match.
Username too short. Username must be >= 4 Characters.
Username already used (not allowed).
No password is set for the administration account.
Please define a password for the admin account.
Admin Account
Access from localhost without account
Access to your peer from your own computer (localhost access) is granted with administrator rights. No need to configure an administration account.
Access only with qualified account
This is required if you want remote access to your peer, but it also hardens access controls on administration operations of your peer.
Peer User:
New Peer Password:
Repeat Peer Password:
Access Rules
Protection of all pages: if set to on, access to all pages needs authorization; if off, only pages with "_p" extension are protected.
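The access rule can be sketched in a few lines. This is an illustration only: the function name `needs_auth` is hypothetical, and it assumes that the "_p" extension means the page name ends in `_p` before the file suffix, as in `ConfigAccounts_p.html`.

```python
from pathlib import PurePosixPath


def needs_auth(path: str, protect_all: bool) -> bool:
    """Decide whether a requested page requires authorization (illustrative)."""
    if protect_all:
        return True
    # "/ConfigAccounts_p.html" has the stem "ConfigAccounts_p".
    return PurePosixPath(path).stem.endswith("_p")
```

With protection of all pages switched off, `/Status.html` stays public while `/ConfigAccounts_p.html` requires a login.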
Set Access Rules
Select user
New user
Edit User
Delete User
Edit current user:
Username
Password
Repeat password
First name
Last name
Address
Rights
Timelimit
Time used
Save User
This setting is convenient but less secure than using a qualified admin account.
Please use with care, notably when you browse untrusted and potentially malicious websites while running your YaCy peer on the same computer.
Appearance and Integration
You can change the appearance of the YaCy interface with skins.
The selected skin and language also affect the appearance of the search page.
If you create a search portal with YaCy then you can change the appearance of the search page here.
Skin Selection
Select one of the default skins, download new skins, or create your own skin.
Current skin
Available Skins
Skin Color Definition
The generic skin 'generic_pd' can be configured here with custom colors:
Background
Text
Legend
Table Header
Table Item
Table Item 2
Table Bottom
Border Line
Sign 'bad'
Sign 'good'
Sign 'other'
Search Headline
Search URL
Skin Download
Skins can be installed from download locations
Install new skin from URL
Use this skin
Make sure that you only download data from trustworthy sources. The new Skin file might overwrite existing data if a file of the same name exists already.
Unable to get URL:
Error saving the skin.
Access Configuration
Basic Configuration
Your port has changed. Please wait 10 seconds.
Your browser will be redirected to the new <a href="http://#[host]#:#[port]#/ConfigBasic.html">location</a> in 5 seconds.
The peer port was changed successfully.
Your YaCy Peer needs some basic information to operate properly
Select a language for the interface
Deutsch
Français
汉语/漢語
Русский
Українська
हिन्दी
日本語
Use Case: what do you want to do with YaCy:
Community-based web search
Join and support the global network 'freeworld', search the web with an uncensored user-owned search network
Search portal for your own web pages
Your YaCy installation behaves independently from other peers and you define your own web index by starting your own web crawl.
Hypertext Cache Configuration
The HTCache stores content retrieved by the HTTP and FTP protocol. Documents from smb:// and file:// locations are not cached.
The cache is a rotating cache: if it is full, then the oldest entries are deleted and new ones can fill the space.
HTCache Configuration
The path where the cache is stored
The current size of the cache
The maximum size of the cache
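The rotating behaviour of the cache (oldest entries are dropped once the maximum size is exceeded) can be sketched as follows. This is a minimal in-memory illustration, not the HTCache implementation; the class name `RotatingCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class RotatingCache:
    """Size-bounded cache that evicts the oldest entries first (illustrative)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.size = 0
        self._entries = OrderedDict()  # url -> content, oldest first

    def put(self, url: str, content: bytes) -> None:
        # Replacing an existing entry first releases its old size.
        if url in self._entries:
            self.size -= len(self._entries.pop(url))
        self._entries[url] = content
        self.size += len(content)
        # Rotate: delete the oldest entries until the cache fits again.
        while self.size > self.max_bytes and self._entries:
            _, old = self._entries.popitem(last=False)
            self.size -= len(old)

    def get(self, url: str):
        return self._entries.get(url)
```

Storing 5, 5 and 3 bytes in a cache limited to 10 bytes removes the first entry and leaves 8 bytes in use.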
Cleanup
Cache Deletion
Delete HTTP & FTP Cache
Delete robots.txt Cache
Heuristics Configuration
A heuristic is an 'experience-based technique that helps in problem solving, learning and discovery' (Wikipedia).
The search heuristics that can be switched on here are techniques that help the discovery of possible search results based on link guessing, in-search crawling and requests to other search engines.
When a search heuristic is used, the resulting links are not used directly as search result but the loaded pages are indexed and stored like other content.
This ensures that blacklists can be used and that the searched word actually appears on the page that was discovered by the heuristic.
The success of heuristics is marked with an image
heuristic:<name>(new link)
below the favicon left from the search result entry:
The search result was discovered by a heuristic, but the link was already known by YaCy
The search result was discovered by a heuristic, not previously known by YaCy
'site'-operator: instant shallow crawl
When a search is made using a 'site'-operator (like: 'download site:yacy.net') then the host of the site-operator is instantly crawled with a host-restricted depth-1 crawl.
That means: right after the search request the portal page of the host is loaded, along with every page linked from it that is on the same host.
Because this 'instant crawl' must obey the robots.txt and a minimum access time for two consecutive pages, this heuristic is rather slow, but may discover all wanted search results using a second search (after a small pause of some seconds).
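The host-restricted depth-1 crawl behind the 'site'-operator can be pictured with the sketch below. It only selects the URLs that would be loaded; it does not fetch anything or handle robots.txt and access delays. The function names, and the assumption that the portal page is `http://<host>/`, are illustrative and not taken from YaCy.

```python
from urllib.parse import urljoin, urlsplit


def site_operator_host(query: str):
    """Return the host named by a 'site:' operator in the query, if any."""
    for token in query.split():
        if token.startswith("site:"):
            return token[len("site:"):]
    return None


def shallow_crawl_targets(host: str, links_on_portal_page: list[str]) -> list[str]:
    """Portal page plus every link on it that stays on the same host."""
    portal = f"http://{host}/"  # assumed portal URL
    targets = [portal]
    for href in links_on_portal_page:
        url = urljoin(portal, href)
        if urlsplit(url).hostname == host and url not in targets:
            targets.append(url)
    return targets
```

For the query `download site:yacy.net`, the first function yields `yacy.net`, and the second keeps only those links of the portal page that point back to that host.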
search-result: shallow crawl on all displayed search results
When a search is made then all displayed result links are crawled with a depth-1 crawl.
This means: right after the search request every displayed result page is loaded, along with every page that is linked from it.
If you check 'add as global crawl job' the pages to be crawled are added to the global crawl queue (remote peers can pickup pages to be crawled).
Default is to add the links to the local crawl queue (your peer crawls the linked pages).
add as global crawl job
opensearch load external search result list from active systems below
When using this heuristic, then every new search request line is used for a call to listed opensearch systems.
20 results are taken from each remote system and loaded simultaneously, then parsed and indexed immediately.
To find out more about OpenSearch see
Available/Active Opensearch System
Active
Title
Comment
Url (format opensearch Url template syntax)
delete
new
With the button "discover from index" you can search within the metadata of your local index (Web Structure Index) to find systems which support the Opensearch specification.
The task is started in the background. It may take some minutes before new entries appear (after refreshing the page).
Alternatively you may copy & paste an example config file located in defaults/heuristicopensearch.conf to the DATA/SETTINGS directory.
For the discover function the web graph option of the web structure index and the fields target_rel_s, target_protocol_s, target_urlstub_s have to be switched on in the webgraph Solr schema
Language selection
You can change the language of the YaCy-webinterface with translation files.
Current language
Author(s) (chronological)
Send additions to maintainer
Available Languages
Install new language from URL
Use this language
Unable to get URL:
Error saving the language file.
Make sure that you only download data from trustworthy sources. The new language file might overwrite existing data if a file of the same name exists already.
Download Language File
Supported formats are the internal language file (extension .lng) or XLIFF (extension .xlf) format.
Simple Editor
to add untranslated text
Network Configuration
No changes were made!
Accepted Changes
Inapplicable Setting Combination
For P2P operation, at least DHT distribution or DHT receive (or both) must be set. You have thus defined a Robinson configuration
Global Search in P2P configuration is only allowed, if index receive is switched on. You have a P2P configuration, but are not allowed to search other peers. For Robinson Mode, index distribution and receive is switched off
Network and Domain Specification
YaCy can operate a computing grid of YaCy peers or as a stand-alone node.
To ensure that all participants within a web indexing domain have access to the same domain,
this network definition must be the same for all members of the same YaCy network.
Network Definition
Remote Network Definition
The DHT-rules do not work without this function
reject
accept transmitted URLs that match your blacklist
allow
deny remote search
Robinson Mode
If your peer runs in 'Robinson Mode' you run YaCy as a search engine for your own search portal without data exchange to other peers
There is no index receive and no index distribution between your peer and any other peer
Private Peer
Your search engine will not contact any other peer, and will reject every request
Public Cluster
Your peer is part of a public cluster within the YaCy network
Index data is not distributed, but remote crawl requests are distributed and accepted
Search requests are spread over all peers of the cluster, and answered from all peers of the cluster
List of .yacy or .yacyh - domains of the cluster: (comma-separated)
Public Peer
You are visible to other peers and contact them to distribute your presence
Your peer does not accept any outside index data, but responds on all remote search requests
Peer Tags
When you allow access from the YaCy network, your data is recognized using keywords
Please describe your search portal with some keywords (comma-separated)
If you leave the field empty, no peer asks your peer.
If you fill in a '*', your peer is always asked.
"Save"
Network Definition
In case of Robinson-clustering there can be acceptance of remote crawl requests from peers of that cluster
Parser Configuration
Content Parser Settings
With these settings you can activate or deactivate parsing of additional content-types based on their MIME-types.
For a detailed description of the various MIME-types take a look at
If you want to test a specific parser you can do so using the
File Viewer
Extension
Mime-Type
"Submit"
Integration of a Search Portal
If you would like to integrate YaCy as a portal for your web pages, you may want to change icons and messages on the search page.
The search page may be customized.
You can change the 'corporate identity'-images, the greeting line
and a link to a home page that is reached when the 'corporate identity'-images are clicked.
To change also colours and styles use the <a href="ConfigAppearance_p.html">Appearance Servlet</a> for different skins and languages.
Greeting Line
URL of Home Page
URL of a Small Corporate Image
URL of a Large Corporate Image
Enable Search for Everyone?
Search is available for everyone
Only the administrator is allowed to search
Snippet Fetch Strategy & Link Verification
Speed up search results with this option! (use CACHEONLY or FALSE to switch off verification)
NOCACHE: no use of web cache, load all snippets online
IFFRESH: use the cache if the cache exists and is fresh, otherwise load online
IFEXIST: use the cache if the cache exists, or load online
If verification fails, delete index reference
CACHEONLY: never go online, use all content from cache.
If no cache entry exists, consider content nevertheless as available and show result without snippet
FALSE: no link verification and no snippet generation: all search results are valid without verification
Greedy Learning Mode
load documents linked in search results, will be deactivated automatically when index size
Show Navigation Bar on Search Page?
Show Navigation Top-Menu
no link to YaCy Menu (admin must navigate to /Status.html manually)
Show Advanced Search Options on Search Page?
Show Advanced Search Options on index.html
do not show Advanced Search
Default Pop-Up Page
Status Page
Search Front Page
Search Page (small header)
Interactive Search Page
Default maximum number of results per page
Default index.html Page (by forwarder)
Target for Click on Search Results
"_blank" (new window)
"_self" (same window)
"_parent" (the parent frame of a frameset)
"_top" (top of all frames)
"searchresult" (a default custom page name for search results)
Special Target as Exception for an URL-Pattern
Pattern:
Exclude Hosts
List of hosts that shall be excluded from search results by default but can be included using the site:<host> operator:
'About' Column (shown in a column alongside with the search result page)
(Headline)
(Content)
"Change Search Page"
"Set to Default Values"
You have to <a href="ConfigAccounts_p.html">set a remote user/password</a> to change these options.
The search page can be integrated in your own web pages with an iframe. Simply use the following code:
This would look like:
For a search page with a small header, use this code:
A third option is the interactive search.
Use this code:
You have set a remote user/password to change these options.
Your Personal Profile
You can create a personal profile here, which can be seen by other YaCy-members or <a href="ViewProfile.html?hash=localhash">in the public</a> using a <a href="ViewProfile.rdf?hash=localhash">FOAF RDF file</a>.
Name
Nick Name
Homepage (appears on every <a href="Supporter.html">Supporter Page</a> as long as your peer is online)
eMail
Comment
"Save"
You can use <> here.
Advanced Config
Here are all configuration options from YaCy.
You can change anything, but some options need a restart, and some options can crash YaCy, if wrong values are used.
For explanation please look into defaults/yacy.init
"Save"
"Clear"
Exclude Web-Spiders
Here you can set up a robots.txt for all webcrawlers that try to access the webinterface of your peer.
is a voluntary agreement most search-engines (including YaCy) follow.
It disallows crawlers to access webpages or even entire domains.
Deny access to
Entire Peer
Status page
Network pages
Surftips
News pages
Blog
Public bookmarks
Home Page
File Share
Impressum
"Save restrictions"
Wiki
Integration of a Search Box
We give information how to integrate a search box on any web page that
Simply use the following code:
MySearch
"Search"
This would look like:
This does not use a style sheet file to make the integration into another web page with a different style sheet easier.
You would need to change the following items:
Replace the given colors #eeeeee (box background) and #cccccc (box border)
Replace the word "MySearch" with your own message
calls the normal YaCy search window.
Search Page
Search Result Page Layout Configuration
Below is a generic template of the search result page.
Mark the check boxes for features you would like to be displayed.
To change colors and styles use the Appearance menu for different skins.
Other portal settings can be adjusted in <a href="ConfigPortal_p.html">Generic Search Portal</a> menu.
Page Template
Text
Images
Audio
Video
Applications
more options
Tag
Topics
Cloud
Protocol
Filetype
Wiki Name Space
Language
Author
Vocabulary
Provider
Collection
Title of Result
Description and text snippet of the search result
42 kbyte
Metadata
Parser
Citation
Pictures
Cache
"Date"
"Size"
"Browse index"
For this option URL proxy must be enabled.
max. items
"Save Settings"
"Set Default Values"
"Top navigation bar"
Location
show search results on map
Date Navigation
Maximum range (in days)
Maximum days number in the histogram. Beware that a large value may trigger high CPU loads both on the server and on the browser with large result sets.
keyword subject keyword2 keyword3
View via Proxy
JPG Snapshot
"Raw ranking score value"
Ranking: 1.12195955E9
"Delete navigator"
Add Navigators
"Add navigator"
append
http://url-of-the-search-result.net
System Update
Manual System Update
Current installed Release
Available Releases
changelog
and
RSS feed
(unsigned)
(signed)
"Download Release"
"Check for new Release"
Downloaded Releases
No downloaded releases available for deployment.
no automated installation on development environments
"Install Release"
"Delete Release"
Automatic Update
check for new releases, download if available and restart with downloaded release
"Check + Download + Install Release Now"
Download of release #[downloadedRelease]# finished. Restart Initiated.
No more recent release found.
Release will be installed.
Please wait.
You installed YaCy with a package manager.
To update YaCy, use the package manager:
Omitting update because this is a development environment.
Omitting update because download of release #[downloadedRelease]# failed.
Automated System Update
manual update
no automatic look-up, updates can be made manually using this interface (see options above)
automatic update
add the following line to
updates are made within fixed cycles:
Time between lookup
hours
Release blacklist
regex on release number strings
Release type
only main releases
any release including developer releases
Signed autoupdate:
only accept signed files
"Submit"
Accepted Changes.
System Update Statistics
Last System Lookup
never
Last Release Download
Last Deploy
Server Connection Tracking
Incoming Connections
Showing #[numActiveRunning]# active connections from a max. of #[numMax]# allowed incoming connections.
Protocol
Duration
Up-Bytes
Source IP[:Port]
Dest. IP[:Port]
Command
Outgoing Connections
Showing #[clientActive]# pooled outgoing connections used as:
Connection Tracking
Content Analysis
These are document analysis attributes.
Double Content Detection
Double-Content detection is done using a ranking on a 'unique'-Field, named 'fuzzy_signature_unique_b'.
This is the minimum length of a word which shall be considered as element of the signature. Should be either 2 or 3.
The quantRate is a measurement for the number of words that take part in a signature computation. The higher the number, the fewer words are used for the signature.
For minTokenLen = 2 the quantRate value should not be below 0.24; for minTokenLen = 3 the quantRate value must not be below 0.5.
"Set"
"Re-Set to default"
Content Control
Peer Content Control URL Filter
With these settings you can activate or deactivate content control on this peer.
Use content control filtering:
Enabled
Enables or disables content control.
Use this table to create filter:
Define a table.
Default:
Content Control SMW Import Settings
With these settings you can define the content control import settings. You can define a
Semantic Media Wiki with the appropriate extensions.
SMW import to content control list:
Enable or disable constant background synchronization of content control list from SMW (Semantic Mediawiki). Requires restart!
SMW import base URL:
Define base URL for SMW special page "Ask". Example:
SMW import target table:
Define import target table. Default: contentcontrol
Purge content control list on initial sync:
Purge content control list on initial synchronisation after startup.
"Submit"
Content Integration: Retrieval from phpBB3 Databases
It is possible to extract texts directly from mySQL and postgreSQL databases.
This interface gives you access to the phpBB3 forums software content.
If you read from an imported database, here are some hints to get around problems when importing dumps in phpMyAdmin:
before importing large database dumps, set
the following Line in phpmyadmin/config.inc.php and place your dump file in /tmp (Otherwise it is not possible to upload files larger than 2MB)
deselect the partial import flag
When an export is started, surrogate files are generated into DATA/SURROGATE/in which are automatically fetched by an indexer thread.
All indexed surrogate files are then moved to DATA/SURROGATE/out and can be re-cycled when an index is deleted.
The URL stub
like https://searchlab.eu
this must be the path right in front of '/viewtopic.php?'
Type of database
use either 'mysql' or 'pgsql'
Host of the database
Port of database service (usually 3306 for mySQL)
Name of the database on the host
Table prefix string for table names
User that can access the database
Password for the account of that user given above
Posts per file in exported surrogates
Check database connection
Export Content to Surrogates
Import a database dump
Import Dump
Posts in database
first entry
last entry
Info failed:
Export successful!
Wrote #[files]# files in DATA/SURROGATES/in
Export failed:
Import successful!
Import failed:
Each extraction is specific to the data that is hosted in the database.
Incoming Cookies Monitor
Cookie Monitor: Incoming Cookies
This is a list of Cookies that a web server has sent to clients of the YaCy Proxy:
Showing #[num]# entries from a total of #[total]# Cookies.
Sending Host
Date
Receiving Client
Cookie
"Enable Cookie Monitoring"
"Disable Cookie Monitoring"
Outgoing Cookies Monitor
Cookie Monitor: Outgoing Cookies
This is a list of cookies that browsers using the YaCy proxy sent to webservers:
Showing #[num]# entries from a total of #[total]# Cookies.
Receiving Host
Date
Sending Client
Cookie
"Enable Cookie Monitoring"
"Disable Cookie Monitoring"
Cookie - Test Page
Here is a cookie test page.
Just clean it
Name:
Value:
Dear server, set this cookie for me!
Cookies at this browser:
Cookies coming to server:
Cookies server sent:
YaCy is a GPL'ed project
with the target of implementing a P2P-based global search engine.
Architecture (C) by
Crawl Check
This page gives you an analysis about the possible success for a web crawl on given addresses.
List of possible crawl start URLs
"Check given urls"
Analysis
Access
Robots
Crawl-Delay
Sitemap
Crawl Profile Editor
Crawler Steering
Crawl Scheduler
Scheduled Crawls can be modified in this table
Crawl profiles hold information about a crawl process that is currently ongoing.
Crawl Profile List
Crawl Thread
Status
Depth
Must Match
Must Not Match
Domain Counter Content
Max Page Per Domain
Accept
Fill Proxy Cache
Local Text Indexing
Local Media Indexing
Remote Indexing
no::yes
Running
"Terminate"
Finished
"Delete"
"Delete finished crawls"
Select the profile to edit
"Edit profile"
An error occurred during editing the crawl profile:
Edit Profile
"Submit changes"
Crawl Results
Crawl Results Overview
These are monitoring pages for the different indexing queues.
YaCy knows 5 different ways to acquire web indexes. The details of these processes (1-5) are described within the submenus listed above, which will also show you a table with indexing results so far. The information in these tables is considered private, so you need to log in with your administration password.
Case (6) is a monitor of the local receipt-generator, the opposite case of (1). It also contains an indexing result monitor but is not considered private since it shows crawl requests from other peers.
Case (7) occurs if surrogate files are imported
The image above illustrates the data flow initiated by web index acquisition.
Some processes occur double to document the complex index migration structure.
(1) Results of Remote Crawl Receipts
This is the list of web pages that this peer initiated to crawl, but had been crawled by other peers.
This is the 'mirror'-case of process (6).
Use Case: You get entries here, if you start a local crawl on the '<a href="CrawlStartExpert.html">Advanced Crawler</a>' page and check the 'Do Remote Indexing'-flag, and if you checked the 'Accept Remote Crawl Requests'-flag on the '<a href="RemoteCrawl_p.html">Remote Crawling</a>' page.
Every page that a remote peer indexes upon this peer's request is reported back and can be monitored here.
(2) Results for Result of Search Queries
This index transfer was initiated by your peer by doing a search query.
The index was crawled and contributed by other peers.
Use Case: This list fills up if you do a search query on the 'Search Page'
(3) Results for Index Transfer
The url fetch was initiated and executed by other peers.
These links here have been transmitted to you because your peer is the most appropriate for storage according to the logic of the Global Distributed Hash Table.
Use Case: This list may fill if you check the 'Index Receive'-flag on the 'Index Control' page
(4) Results for Proxy Indexing
These web pages had been indexed as result of your proxy usage.
No personal or protected page is indexed
such pages are detected by Cookie-Use or POST-Parameters (either in URL or as HTTP protocol)
and automatically excluded from indexing.
Use Case: You must use YaCy as proxy to fill up this table.
Set the proxy settings of your browser to the same port as given
(5) Results for Local Crawling
These web pages had been crawled by your own crawl task.
Use Case: start a crawl by setting a crawl start point on the 'Index Create' page.
(6) Results for Global Crawling
These pages had been indexed by your peer, but the crawl was initiated by a remote peer.
This is the 'mirror'-case of process (1).
Use Case: This list may fill if you check the 'Accept Remote Crawl Requests'-flag on the '<a href="RemoteCrawl_p.html">Remote Crawling</a>' page
The stack is empty.
Statistics about #[domains]# domains in this stack:
(7) Results from surrogates import
These records had been imported from surrogate files in DATA/SURROGATES/in
Use Case: place files with dublin core metadata content into DATA/SURROGATES/in or use an index import method (i.e. <a href="IndexImportMediawiki_p.html">MediaWiki import</a>, <a href="IndexImportOAIPMH_p.html">OAI-PMH retrieval</a>)
Domain
"delete all"
Showing all #[all]# entries in this stack.
Showing latest #[count]# lines from a stack of #[all]# entries.
"clear list"
Executor
Modified
Words
Title
"delete"
Collection
Blacklist to use
"del & blacklist"
on the 'Settings'-page in the 'Proxy and Administration Port' field.
Expert Crawl Start
Start Crawling Job:
You can define URLs as start points for Web page crawling and start crawling here.
"Crawling" means that YaCy will download the given website, extract all links in it and then download the content behind these links.
This is repeated as long as specified under "Crawling Depth".
A crawl can also be started using wget and the post arguments for this web page.
Click on this API button to see a documentation of the POST request parameter for crawl starts.
Crawl Job
A Crawl Job consists of one or more start points, crawl limitations and document freshness rules.
Start Point
One Start URL or a list of URLs: (must start with http:// https:// ftp:// smb:// file://)
Define the start-url(s) here. You can submit more than one URL, each line one URL please.
Each of these URLs is the root for a crawl start, existing start URLs are always re-loaded.
From Link-List of URL
From Sitemap
From File (enter a path within your local file system)
Other already visited URLs are sorted out as "double", if they are not allowed using the re-crawl option.
A web crawl performs a double-check on all links found in the internet against the internal database. If the same url is found again, then the url is treated as double when you check the 'no doubles' option.
A url may be loaded again when it has reached a specific age,
Use filter
Restrict to start domain(s)
Restrict to sub-path(s)
Example: to allow only urls that contain the word 'science', set the must-match filter to '.*science.*'.
You can also use an automatic domain-restriction to fully crawl a single domain.
Attention: you can test the functionality of your regular expressions using the <a href="RegexTest.html">Regular Expression Tester</a> within YaCy.
You can limit the maximum number of pages that are fetched and indexed from a single domain with this option.
You can combine this limitation with the 'Auto-Dom-Filter', so that the limit is applied to all the domains within the given depth. Domains outside the given depth are then sorted-out anyway.
Document Cache<
Store to Web Cache
This option is used by default for proxy prefetch, but is not needed for explicit crawling.
A question mark is usually a hint for a dynamic page. URLs pointing to dynamic content should usually not be crawled.
However, there are sometimes web pages with static content that is accessed with URLs containing question marks. If you are unsure, do not check this to avoid crawl loops.
Accept URLs with query-part ('?'):
Obey html-robots-noindex:
Policy for usage of Web Cache
The caching policy states when to use the cache during crawling:
no cache
if fresh
if exist
cache only
never use the cache, all content from fresh internet source;
use the cache if the cache exists and is fresh using the proxy-fresh rules;
use the cache if the cache exists. Do not check freshness. Otherwise use online source;
never go online, use all content from cache.
If no cache exists, treat content as unavailable
>Snapshot Creation<
Max Depth for Snapshots
Multiple Snapshot Versions
replace old snapshots with new one
add new versions for each crawl
Snapshots are xml metadata and pictures of web pages that can be created during crawling time.
The xml data is stored in the same way as a Solr search result with one hit and the pictures will be stored as pdf into subdirectories of HTCACHE/snapshots/. From the pdfs the jpg thumbnails are computed. Snapshot generation can be controlled using a depth parameter; that means a snapshot is only generated if the crawl depth of a document is smaller than or equal to the given number here. If the number is set to -1, no snapshots are generated.
>Crawler Filter<
These are limitations on the crawl stacker. The filters will be applied before a web page is loaded.
Crawling Depth<
This defines how often the Crawler will follow links (of links..) embedded in websites.
0 means that only the page you enter under "Starting Point" will be added to the index. 2-4 is good for normal indexing. Values over 8 are not useful, since a depth-8 crawl will index approximately 25.600.000.000 pages, maybe this is the whole WWW.
also all linked non-parsable documents
Unlimited crawl depth for URLs matching with
Maximum Pages per Domain
>Use<
>Page-Count<
misc. Constraints
>Load Filter on URLs<
>Load Filter on IPs<
Must-Match List for Country Codes
Crawls can be restricted to specific countries. This uses the country code that can be computed from the IP of the server that hosts the page. The filter is not a regular expression but a list of country codes, separated by commas.
no country code restriction
Filter on URLs
Document Filter
These are limitations on index feeder.
The filters will be applied after a web page was loaded.
>Filter on URLs<
The filter is a
>regular expression<
that <b>must not match</b> with the URLs to allow that the content of the url is indexed.
> must-match<
> must-not-match<
(must not be empty)
Filter on Content of Document<br/>(all visible text, including camel-case-tokenized url and title)
Clean-Up before Crawl Start
>No Deletion<
>Re-load<
For each host in the start url list, delete all documents (in the given subpath) from that host.
Delete sub-path
Delete only old
Do not delete any document before the crawl is started.
Treat documents that are loaded
> ago as stale and delete them before the crawl is started.
After a crawl was done in the past, documents may become stale and eventually they are also deleted on the target host.
To remove old files from the search index it is not sufficient to just consider them for re-load but it may be necessary to delete them because they simply do not exist any more. Use this in combination with re-crawl while this time should be longer.
Double-Check Rules
No Doubles
to use that check the 're-load' option.
> ago as stale and load them again. If they are younger, they are ignored.
Never load any page that is already known. Only the start-url may be loaded again.
Robot Behaviour
Use Special User Agent and robot identification
You are running YaCy in non-p2p mode and because YaCy can be used as replacement for commercial search appliances (like the GSA) the user must be able to crawl all web pages that are granted to such commercial platforms.
Not having this option would be a strong handicap for professional usage of this software. Therefore you are able to select alternative user agents here which have different crawl timings and also identify themselves with another user agent and obey the corresponding robots rule.
index text
index media
This enables indexing of the webpages the crawler will download.
This should be switched on by default, unless you want to crawl only to fill the Document Cache without indexing.
Do Remote Indexing
Describe your intention to start this global crawl (optional)
This message will appear in the 'Other Peer Crawl Start' table of other peers.
If checked, the crawler will contact other peers and use them as remote indexers for your crawl.
If you need your crawling results locally, you should switch this off.
Only senior and principal peers can initiate or receive remote crawls.
A YaCyNews message will be created to inform all peers about a global crawl so they can omit starting a crawl with the same start point.
Add Crawl result to collection(s)
A crawl result can be tagged with names which are candidates for a collection request.
These tags can be selected with the GSA interface using the 'site' operator.
To use this option, the 'collection_sxt'-field must be switched on in the Solr Schema
"Start New Crawl Job"
Restrict to start domain
Restrict to sub-path
Following frames is NOT done by Gxxg1e, but we do by default to have a richer content.
'nofollow' in robots metadata can be overridden; this does not affect obeying of the robots.txt which is never ignored.
Network Scanner
YaCy can scan a network segment for available http, ftp and smb servers.
You must first select an IP range and then, after this range is scanned, it is possible to select servers that had been found for a full-site crawl.
No servers had been detected in the given IP range
Please enter a different IP range for another scan.
Please wait...
>Scan the network<
Scan Range
Scan sub-range with given host
Full Intranet Scan:
Do not use intranet scan results, you are not in an intranet environment!
All known hosts in the search index (/31 subnet recommended!)
only the given host(s)
addresses)
Subnet<
Time-Out<
>Scan Cache<
accumulate scan results with access type "granted" into scan cache (do not delete old scan result)
>Service Type<
>Scheduler<
run only a scan
scan and add all sites with granted access automatically. This disables the scan cache accumulation.
Look every
>minutes<
>hours<
>days<
again and add new sites automatically to indexer.
Sites that do not appear during a scheduled scan period will be excluded from search results.
"Scan"
YaCy '#[clientname]#': Crawl Start
>Site Crawling<
Site Crawler:
Download all web pages from a given domain or base URL.
>Site Crawl Start<
>Site<
Start URL (must start with
Link-List of URL
Sitemap URL
>Path<
load all files in domain
load only files in a sub-path of given url
>Limitation<
not more than <
>documents<
Collection<
>Start<
"Start New Crawl"
Hints<
>Crawl Speed Limitation<
No more than two pages are loaded from the same host in one second (not more than 120 documents per minute) to limit the load on the target server.
>Target Balancer<
A second crawl for a different host increases the throughput to a maximum of 240 documents per minute since the crawler balances the load over all hosts.
>High Speed Crawling<
A 'shallow crawl' which is not limited to a single host (or site) can extend the pages per minute (ppm) rate to unlimited documents per
minute when the number of target hosts is high.
This can be done using the <a href="CrawlStartExpert.html">Expert Crawl Start</a> servlet.
>Scheduler Steering<
The scheduler on crawls can be changed or removed using the <a href="Table_API_p.html">API Steering</a>.
Click on this API button to see an XML with information about the crawler status
>Crawler<
>Queues<
>Queue<
Crawler PPM
Error with profile management. Please stop YaCy, delete the file DATA/PLASMADB/crawlProfiles0.db and restart.
Error:
Application not yet initialized. Sorry. Please wait some seconds and repeat
ERROR: Crawl filter
does not match with
crawl root
Please try again with different
filter. ::
Crawling of
failed. Reason:
Error with URL input
Error with file input
started.
Please wait some seconds, it may take some seconds until the first result appears there.
If you crawl any un-wanted pages, you can delete them <a href="IndexCreateQueues_p.html?stack=LOCAL">here</a>.<br />
>Size
>Progress<
"set"
Loader
>Index Size<
Seg-<br/>ments
>Documents<
>solr search api<
>Webgraph Edges<
Citations<br/>(reverse link index)
RWIs<br/>(P2P Chunks)
Local Crawler
Limit Crawler
Remote Crawler
No-Load Crawler
Speed / PPM<br/>(Pages Per Minute)
Database
Entries
Indicator
Level
Postprocessing Progress
Traffic (Crawler)
>Load<
pending:
>Running Crawls
Name
Status
Running
Terminate All
Confirm Termination of All Crawls
"Terminate"
Crawled Pages
Load<
Knowledge Loader
YaCy can use external libraries to enable or enhance some functions.
These libraries are not included in the main release of YaCy because they would increase the application file too much.
You can download additional files here.
>Geolocalization<
Geolocalization will enable YaCy to present locations from OpenStreetMap according to given search words.
>GeoNames<
With this file it is possible to find cities all over the world.
Content<
cities with a population > 1000 all over the world
cities with a population > 5000 all over the world
cities with a population > 100000 all over the world (the set is reduced to cities > 100000)
>Download from<
>Storage location<
>Status<
>not loaded<
>loaded<
:deactivated
>Action<
>Result<
"Load"
"Deactivate"
"Remove"
"Activate"
>loaded and activated dictionary file<
>loading of dictionary file failed: #[error]#<
>deactivated and removed dictionary file<
>cannot remove dictionary file: #[error]#<
>deactivated dictionary file<
>cannot deactivate dictionary file: #[error]#<
>activated dictionary file<
>cannot activate dictionary file: #[error]#<
>With this file it is possible to find locations in Germany using the location (city) name, a zip code, a car sign or a telephone pre-dial number.<
Suggestions<
Suggestion dictionaries will help YaCy to provide better suggestions during the input of search words
This file provides 100000 most common german words for suggestions
>Tutorial
You are using the administration interface of your own search engine
You can create your own search index with YaCy
To learn how to do that, watch one of the demonstration videos below
twitter this video
Download from Vimeo
More Tutorials
Please see the tutorials on
YaCy: Tutorial
Index Browser
Browse the index of #[ucount]# documents.
Enter a host or a URL for a file list or view a list of
>all hosts<
>only hosts with urls pending in the crawler<
> or <
>only with load errors<
Host/URL
Browse Host
"Delete Subpath"
Browser for
"Re-load load-failure docs (404s etc)"
Confirm Deletion
>Host List<
Count Colors:
Documents without Errors
Pending in Crawler
Crawler Excludes<
Load
Errors<
documents stored for host: #[hostsize]#
documents stored for subpath: #[subpathloadsize]#
unloaded documents detected in subpath: #[subpathdetectedsize]#
>Path<
>stored<
>linked<
>pending<
>excluded<
>failed<
Show Metadata
link, detected from context
load & index
>indexed<
>loading<
Outbound Links, outgoing from #[host]# - Host List
Inbound Links, incoming to #[host]# - Host List
<html lang="en">
'number of documents about this date'
"show link structure graph"
Host has load error(s)
Administration Options
Delete all
>Load Errors<
from index
"Delete Load Errors"
Index Cleaner
>URL-DB-Cleaner
Total URLs searched:
Blacklisted URLs found:
Percentage blacklisted:
last searched URL:
last blacklisted URL found:
>RWI-DB-Cleaner
RWIs at Start:
RWIs now:
wordHash in Progress:
last wordHash with deleted URLs:
Number of deleted URLs on this Hash:
URL-DB-Cleaner - Clean up the database by deletion of blacklisted urls:
Start/Resume
Stop
Pause
RWI-DB-Cleaner - Clean up the database by deletion of words with reference to blacklisted urls:
Reverse Word Index Administration
The local index currently contains #[wcount]# reverse word indexes
RWI Retrieval (= search for a single word)
Retrieve by Word:<
"Show URL Entries for Word"
Retrieve by Word-Hash
"Show URL Entries for Word-Hash"
"Generate List"
Limitations
Index Reference Size
No reference size limitation (this may cause strong CPU load when words are searched that appear very often)
Limitation of number of references per word:
(this causes that old references are deleted if that limit is reached)
>Set References Limit<
No entry for word '#[word]#'
No entry for word hash
Search result
total URLs</td>
appearance in</td>
in link type</td>
document type</td>
<td>description</td>
<td>title</td>
<td>creator</td>
<td>subject</td>
<td>url</td>
<td>emphasized</td>
<td>image</td>
<td>audio</td>
<td>video</td>
<td>app</td>
index of</td>
>Selection</td>
Display URL List
Number of lines
all lines
"List Selected URLs"
Transfer RWI to other Peer
Transfer by Word-Hash
"Transfer to other peer"
to Peer
<dd>selector
enter a hash
or peer name:
Sequential List of Word-Hashes
No URL entries related to this word hash
>#[count]# URL entries related to this word hash
Resource</td>
Negative Ranking Factors
Positive Ranking Factors
Reverse Normalized Weighted Ranking Sum
hash</td>
dom length</td>
url length</td>
pos in text</td>
pos of phrase</td>
pos in phrase</td>
<td>authority</td>
<td>date</td>
words in title</td>
words in text</td>
local links</td>
remote links</td>
hitcount</td>
unresolved URL Hash
Word Deletion
Deletion of selected URLs
delete also the referenced URL (recommended, may produce unresolved references at other word indexes but they do not harm)
for every resolvable and deleted URL reference, delete the same reference at every other word where the reference exists (very extensive, but prevents further unresolved references)
"Delete reference to selected URLs"
"Delete Word"
Blacklist Extension
"Add selected URLs to blacklist"
"Add selected domains to blacklist"
These document details can be retrieved as <a href="http://www.w3.org/TR/xhtml-rdfa-primer/" target="_blank">XHTML+RDFa</a> document containing <a href="http://www.w3.org/RDF/" target="_blank">RDF</a> annotations in <a href="http://dublincore.org/" target="_blank">Dublin Core</a> vocabulary.
The XHTML+RDFa data format is both an XML content format and an HTML display format and is considered an important <a href="http://www.w3.org/2001/sw/" target="_blank">Semantic Web</a> content format.
The same content can also be retrieved as pure <a href="api/yacydoc.xml?urlhash=#[urlhash]#">XML metadata</a> with DC tag name vocabulary.
Click the API icon to see an example call to the search rss API.
To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>.
URL Database Administration
The local index currently contains #[ucount]# URL references
URL Retrieval
Retrieve by URL:<
"Show Details for URL"
Retrieve by URL-Hash
"Show Details for URL-Hash"
Cleanup
Index Deletion
Delete local search
index (embedded Solr and old Metadata)
Delete remote solr index
Delete RWI Index (DHT transmission words)
Delete Citation Index (linking between URLs)
Delete First-Seen Date Table
Delete HTTP & FTP Cache
Stop Crawler and delete Crawl Queues
Delete robots.txt Cache
"Delete"
Confirm Deletion
Statistics about top-domains in URL Database
Show topdomains from all URLs.
"Generate Statistics"
Statistics about the top-#[domains]# domains in the database:
"delete all"
>Domain<
>Optimize Solr<
merge to max. <
> segments
"Optimize Solr"
Reboot Solr Core
"Shut Down and Re-Start Solr"
query
No entry found for URL-hash
"Show Content"
"Delete URL"
this may produce unresolved references at other word indexes but they do not harm
"Delete URL and remove all references from words"
Optimize Solr
delete the reference to this url at every other word where the reference exists (very extensive, but prevents unresolved references)
Loader Queue
The loader set is empty
There are #[num]# entries in the loader set:
Initiator
Depth
Status
Parser Errors
Rejected URLs
There are #[num]# entries in the rejected-urls list.
Showing latest #[num]# entries.
"show more"
"clear list"
Time
Fail-Reason
Rejected URL List:
There are #[num]# entries in the rejected-queue:
This crawler queue is empty
Click on this API button to see an XML with information about the crawler latency and other statistics.
Delete Entries:
Initiator
Profile
Depth
Modified Date
Anchor Name
Count
Delta/ms
Host
"Delete"
Crawl Queue<
>Count<
>Initiator<
>Profile<
>Depth<
Index Deletion<
The search index contains #[doccount]# documents. You can delete them here.
Deletions are made concurrently which can cause that recently deleted documents are not yet reflected in the document count.
Delete by URL Matching<
Delete all documents within a sub-path of the given urls.
That means all documents must start with one of the url stubs as given here.
One URL stub, a list of URL stubs<br/>or a regular expression
Matching Method<
sub-path of given URLs
matching with regular expression
"Simulate Deletion"
"no actual deletion, generates only a deletion count"
"Engage Deletion"
"simulate a deletion first to calculate the deletion count"
"engaged"
selected #[count]# documents for deletion
deleted #[count]# documents
Delete by Age<
Delete all documents which are older than a given time period.
Time Period<
All documents older than
years<
months<
days<
hours<
Age Identification<
>load date
>last-modified
Delete Collections<
Delete all documents which are inside specific collections.
Not Assigned<
Delete all documents which are not assigned to any collection
, separated by ',' (comma) or '|' (vertical bar); or
>generate the collection list...
Assigned<
Delete all documents which are assigned to the following collection(s)
Delete by Solr Query<
This is the most generic option: select a set of documents using a solr query.
The local index currently contains #[ucount]# documents.
Loaded URL Export
Export Path
URL Filter
>query<
maximum age (seconds, -1 = unlimited)
Export Format
Full Data Records:
(Rich and full-text Solr data, one document per line in one large xml file, can be processed with shell tools, can be imported with DATA/SURROGATES/in/)
(Rich and full-text Elasticsearch data, one document per line in one flat JSON file, can be bulk-imported to elasticsearch with the command "curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary @yacy_dump_XXX.flatjson")
Full URL List:
Plain Text List (URLs only)
HTML (URLs with title)
Only Domain:
Plain Text List (domains only)
HTML (domains as URLs, no title)
>Only Text:
Fulltext of Search Index Text
Export to file #[exportfile]# is running ..
#[urlcount]# Documents so far
Finished export of #[urlcount]# Documents to file
Import this file by moving it to DATA/SURROGATES/in
Export to file #[exportfile]# failed:
Dump and Restore of Solr Index
"Create Dump"
Dump File
"Restore Dump"
Stored a solr dump to file
Index Sources & Targets
YaCy supports multiple index storage locations.
As an internal indexing database a deep-embedded multi-core Solr is used and it is also possible to attach a remote Solr.
Solr Search Index
Solr stores the main search index. It is the home of two cores, the default 'collection1' core for documents and the 'webgraph' core for a web structure graph. Detailed information about the used Solr fields can be edited in the <a href="IndexSchema_p.html">Schema Editor</a>.
Lazy Value Initialization
If checked, only non-zero values and non-empty strings are written to Solr fields.
Use deep-embedded local Solr
This will write the YaCy-embedded Solr index which is stored within the YaCy DATA directory.
The Solr native search interface is accessible at<br/><a href="solr/select?q=*:*&start=0&rows=3&core=collection1">/solr/select?q=*:*&start=0&rows=3&core=collection1</a>
for the default search index (core: collection1) and at<br/><a href="solr/select?q=*:*&start=0&rows=3&core=webgraph">/solr/select?q=*:*&start=0&rows=3&core=webgraph</a>
for the webgraph core.<br/>
If you switch off this index, a remote Solr must be activated.
Use remote Solr server(s)
Solr Hosts
Solr Host Administration Interface
Index Size
It's easy to <a href="https://wiki.yacy.net/index.php/Dev:Solr" target="_blank">attach an external Solr to YaCy</a>.
This external Solr can be used instead of the internal Solr. It can also be used in addition to the internal Solr; then both Solr indexes are mirrored.
Solr URL(s)
You can set one or more Solr targets here which are accessed as a shard. For several targets, list them using a ',' (comma) as separator.
The set of remote targets is used as shards of a complete index.
The host part of the url is used as key for a hash function which selects one of the shards (one of your remote servers).
When a search request is made, all servers are accessed synchronously and the result is combined.
Sharding Method<br/>
write-enabled (if unchecked, the remote server(s) will only be used as search peers)
Web Structure Index
The web structure index is used for host browsing (to discover the internal file/folder structure), ranking (counting the number of references) and file search (there are about forty times more links from loaded pages than in documents of the main search index).
use citation reference index (lightweight and fast)
use webgraph search index (rich information in second Solr core)
"Set"
Peer-to-Peer Operation
The 'RWI' (Reverse Word Index) is necessary for index transmission in distributed mode. For portal or intranet mode this must be switched off.
support peer-to-peer index transmission (DHT RWI index)
MediaWiki Dump Import
No import thread is running, you can start a new thread here
Bad input data:
MediaWiki Dump File Selection: select an XML file (which may be bz2- or gz-encoded)
You can import <a href="https://dumps.wikimedia.org/backup-index-bydb.html" target="_blank">MediaWiki dumps</a> here. An example is the file
"Import MediaWiki Dump"
When the import is started, the following happens:
The dump is extracted on the fly and wiki entries are translated into Dublin Core data format.
The output looks like this:
Each 10000 wiki records are combined in one output file which is written to /DATA/SURROGATES/in into a temporary file.
When each generated output file is finished, it is renamed to a .xml file
Each time an xml surrogate file appears in /DATA/SURROGATES/in, the YaCy indexer fetches the file and indexes the record entries.
When a surrogate file is finished with indexing, it is moved to /DATA/SURROGATES/out
You can recycle processed surrogate files by moving them from /DATA/SURROGATES/out to /DATA/SURROGATES/in
Import Process
Thread:
Dump:
Processed:
Wiki Entries
Speed:
articles per second<
Running Time:
hours,
minutes<
Remaining Time:
List of #[num]# OAI-PMH Servers
"Load Selected Sources"
OAI-PMH source import list
>Source<
Import List
>Thread<
>Processed<br />Chunks<
>Imported<br />Records<
>Speed<br />(records/second)
Complete at
OAI-PMH Import
Results from the import can be monitored in the <a href="CrawlResults.html?process=7">indexing results for surrogates
Single request import
This will submit only a single request as given here to an OAI-PMH server and import records into the index
"Import OAI-PMH source"
Source:
Processed:
records<
ResumptionToken:
Import failed:
Import all Records from a server
Import all records that follow according to resumption elements into index
"import this source"
::or 
"import from a list"
Import started!
Bad input data:
Warc Import
Web Archive File Import
No import thread is running, you can start a new thread here
Warc File Selection: select a warc file (which may be gz compressed)
You can download warc archives for example here
Internet Archive
Import Warc File
Import Process
Thread:
Warc File:
Processed:
Entries
Speed:
pages per second
Running Time:
hours,
minutes<
Remaining Time:
Field Re-Indexing<
In case an index schema of the embedded/local index has changed, all documents with missing field entries can be indexed again with a reindex job.
"refresh page"
Documents in current queue<
Documents processed<
current select query
"start reindex job
now"
"stop reindexing"
Remaining field list
reindex documents containing these fields:
Re-Crawl Index Documents
Searches the local index and selects documents to add to the crawler (recrawl the document).
This runs transparently as a background job.
Documents are added to the crawler only if no other crawls are active and are added in small chunks.
"start recrawl job now"
"stop recrawl job"
Re-Crawl Query Details
Documents to process
Current Query
Edit Solr Query
update
to re-crawl documents selected with the given query.
Include failed URLs
>Field<
>count<
Re-crawl works only with an embedded local Solr index!
Simulate
Check only how many documents would be selected for recrawl
"Browse metadata of the #[rows]# first selected documents"
document(s)</a>#(/showSelectLink)# selected for recrawl.
>Solr query <
Set defaults
"Reset to default values"
Last #(/jobStatus)#Re-Crawl job report
Automatically refreshing
An error occurred while trying to refresh automatically
The job terminated early due to an error when requesting the Solr index.
>Status<
"Running"
"Shutdown in progress"
"Terminated"
Running::Shutdown in progress::Terminated
>Query<
>Start time<
>End time<
URLs added to the crawler queue for recrawl
>Recrawled URLs<
URLs rejected for some reason by the crawl stacker or the crawler queue.
Please check the logs for more details.
>Rejected URLs<
>Malformed URLs<
"#[malformedUrlsDeletedCount]# deleted from the index"
> Refresh<
Solr Schema Editor
If you use a custom Solr schema you may enter a different field name in the column 'Custom Solr Field Name' of the YaCy default attribute name
Select a core:
the core can be searched at
Active
Attribute
Custom Solr Field Name
Comment
show active
show all available
show disabled
"Set"
"reset selection to default"
>Reindex documents<
If you unselected some fields, old documents in the index still contain the unselected fields.
To physically remove them from the index you need to reindex the documents.
Here you can reindex all documents with inactive fields.
"reindex Solr"
You may monitor progress (or stop the job) under <a href="IndexReIndexMonitor_p.html">IndexReIndexMonitor_p.html</a>
YaCy '#[clientname]#': Configuration of a Wiki Search
Integration in MediaWiki
It is possible to insert wiki pages into the YaCy index using a web crawl on those pages.
This guide helps you to crawl your wiki and to insert a search window in your wiki pages.
Retrieval of Wiki Pages
The following form is a simplified crawl start that uses the proper values for a wiki crawl.
Just insert the front page URL of your wiki.
After you started the crawl you may want to get back to this page to read the integration hints below.
URL of the wiki main page
This is a crawl start point
"Get content of Wiki: crawl wiki pages"
Inserting a Search Window to MediaWiki
To integrate a search window into a MediaWiki, you must insert some code into the wiki template.
There are several templates that can be used for MediaWiki, but in this guide we consider that you are using the default template, 'MonoBook.php':
open skins/MonoBook.php
find the line where the default search window is displayed, there are the following statements:
Remove that code or set it in comments using '<!--' and '-->'
Insert the following code:
Search with YaCy in this Wiki:
value="Search"
Check all appearances of static IPs
given in the code snippet and replace it with your own IP, or your host name
You may want to change the default text elements in the code snippet
To see all options for the search widget, look at the more generic description of search widgets at the <a href="ConfigLiveSearch.html">configuration for live search</a>.
Configuration of a phpBB3 Search
Integration in phpBB3
It is possible to insert forum pages into the YaCy index using a database import of forum postings.
This guide helps you to insert a search window in your phpBB3 pages.
Retrieval of phpBB3 Forum Pages using a database export
Forum postings contain rich information about the topic, the time, the subject and the author.
This information is in a badly annotated form in web pages delivered by the forum software.
It is much better to retrieve the forum postings directly from the database.
This enables YaCy to offer nice navigation features after searches.
YaCy has a phpBB3 extraction feature, please go to the <a href="ContentIntegrationPHPBB3_p.html">phpBB3 content integration</a> servlet for direct database imports.
Retrieval of phpBB3 Forum Pages using a web crawl
The following form is a simplified crawl start that uses the proper values for a phpbb3 forum crawl.
Just insert the front page URL of your forum.
After you started the crawl you may want to get back to this page to read the integration hints below.
URL of the phpBB3 forum main page
This is a crawl start point
"Get content of phpBB3: crawl forum pages"
Inserting a Search Window to phpBB3
To integrate a search window into phpBB3, you must insert some code into a forum template.
There are several templates that can be used for phpBB3, but in this guide we consider that you are using the default template, 'prosilver'
open styles/prosilver/template/overall_header.html
find the line where the default search window is displayed, that's right behind the <pre><div id="search-box"></pre> statement
Insert the following code right behind the div tag
YaCy Forum Search
;YaCy Search
Check all appearances of static IPs given in the code snippet and replace it with your own IP, or your host name
You may want to change the default text elements in the code snippet
To see all options for the search widget, look at the more generic description of search widgets at the <a href="ConfigLiveSearch.html">configuration for live search</a>.
Configuration of a RSS Search
Loading of RSS Feeds<
RSS feeds can be loaded into the YaCy search index.
This does not load the rss file as such into the index but all the messages inside the RSS feeds as individual documents.
URL of the RSS feed
>Preview<
"Show RSS Items"
Indexing
Available after successful loading of rss feed in preview
"Add All Items to Index (full content of url)"
>once<
>load this feed once now<
>scheduled<
>repeat the feed loading every<
>minutes<
>hours<
>days<
> automatically.
>List of Scheduled RSS Feed Load Targets<
>Title<
>URL/Referrer<
>Recording<
>Last Load<
>Next Load<
>Last Count<
>All Count<
>Avg.
Update/Day<
"Remove Selected Feeds from Scheduler"
"Remove All Feeds from Scheduler"
>Available RSS Feed List<
"Remove Selected Feeds from Feed List"
"Remove All Feeds from Feed List"
"Add Selected Feeds to Scheduler"
>new<
>enqueued<
>indexed<
>RSS Feed of
>Author<
>Description<
>Language<
>Date<
>Time-to-live<
>Docs<
>State<
"Add Selected Items to Index (full content of url)"
Send message
You cannot send a message to
The peer does not respond. It was now removed from the peer-list.
The peer <b>is alive and responded:
You are allowed to send me a message
kb and an
attachment ≤
Your Message
Subject:
Text:
"Enter"
"Preview"
You can use
Wiki Code</a> here.
Preview message
The message has not been sent yet!
The peer is alive but cannot respond. Sorry.
Your message has been sent. The target peer responded:
The target peer is alive but did not receive your message. Sorry.
Here is a copy of your message, so you can copy it to save it for further attempts:
>Messages
Date</td>
From</td>
To</td>
>Subject
Action
From:
To:
Date:
>view
reply
>delete
Compose Message
Send message to peer
"Compose"
Message:
inbox
YaCy Search Network
YaCy Network<
The information that is presented on this page can also be retrieved as XML.
Click the API icon to see the XML.
To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>.
Network Overview
Active Principal and Senior Peers
Passive Senior Peers
Junior (fragment) Peers
Network History
<b>Count of Connected Senior Peers</b> in the last two days, scale = 1h
<b>Count of all Active Peers Per Day</b> in the last week, scale = 1d
<b>Count of all Active Peers Per Week</b> in the last 30d, scale = 7d
<b>Count of all Active Peers Per Month</b> in the last 365d, scale = 30d
Active Principal and Senior Peers in '#[networkName]#' Network
Passive Senior Peers in '#[networkName]#' Network
Junior Peers (a fragment) in '#[networkName]#' Network
Manually contacting Peer
no remote #[peertype]# peer for this list known
Showing #[num]# entries from a
total of #[total]# peers.send <strong>M</strong>essage/<br/>show <strong>P</strong>rofile/<br/>edit <strong>W</strong>iki/<br/>browse <strong>B</strong>logSearch for a peername (RegExp allowed)"Search"NameAddressHashTypeRelease<Last<br/>SeenLocationOffsetSend message to peerView profile of peerRead and edit wiki on peerBrowse blog of peer"DHT Receive: yes""DHT receive enabled""DHT Receive: no; #[peertags]#""DHT Receive: no""no DHT receive""Accept Crawl: no""no crawl""Accept Crawl: yes""crawl possible"Contact: passiveContact: directSeed download: possibleruntime:>Network<>Online Peers<>Number of<br/>Documents<Indexing Speed:Pages Per Minute (PPM)Query Frequency:Queries Per Hour (QPH)>Today<>Last Week<>Last Month<Last Hour>Now<>Active Senior<>Passive Senior<>Junior (fragment)<>This Peer<URLs for<br/>Remote Crawl"The YaCy Network"Indexing<br/>PPM(public local)(remote)Your Peer:>Name<>Info<>Version<>UTC<>Uptime<>Links<Sent<br/>URLsSent<br/>DHT Word ChunksReceived<br/>URLsReceived<br/>DHT Word ChunksKnown<br/>SeedsConnects<br/>per hour>dark green font<senior/principal peers>light green font<>passive peers<>pink font<junior peersred pointthis peer>grey waves<>crawling activity<>green radiation<>strong query activity<>red lines<>DHT-out<>green lines<>DHT-in<Count of Connected Senior Peersin the last two days, scale = 1hCount of all Active Peers Per Dayin the last week, scale = 1dCount of all Active Peers Per Weekin the last 30d, scale = 7dCount of all Active Peers Per Monthin the last 365d, scale = 30dOverviewIncoming NewsProcessed NewsOutgoing NewsPublished NewsThis is the YaCyNews system (currently under testing).The news service is controlled by several entry points:A crawl start with activated remote indexing will automatically create a news entry.Other peers may use this information to prevent double-crawls from the same start point.A table with recently started crawls is presented on the Index Create - pageA change in the personal profile will create a news entry. 
You can see recently made changes of profile entries on the Network page, where that profile change is visualized with a '*' beside the 'P' (profile) - selector.
Publishing of added or modified translation for the user interface.
Other peers may include it in their local translation list.
To publish a translation, use the integrated
translation editor
to add a translation and publish it afterwards.
Above you can see four menus:
<strong>Incoming News (#[insize]#)</strong>: latest news that arrived at your peer.
Only these news will be used to display specific news services as explained above.
You can process these news with a button on the page to remove their appearance from the IndexCreate and Network page
<strong>Processed News (#[prsize]#)</strong>: this is simply an archive of incoming news that you removed by processing.
<strong>Outgoing News (#[ousize]#)</strong>: here you can see news entries that you have created. These news are currently broadcast to other peers.
You can stop the broadcast if you want.
<strong>Published News (#[pusize]#)</strong>: your news that have been broadcast sufficiently or that you have removed from the broadcast list.
Originator
Created
Category
Received
Distributed
Attributes
Process Selected News
Delete Selected News
Abort Publication of Selected News
Process All News
Delete All News
Abort Publication of All News
"#(page)#::Process Selected News::Delete Selected News::Abort Publication of Selected News::Delete Selected News#(/page)#"
"#(page)#::Process All News::Delete All News::Abort Publication of All News::Delete All News#(/page)#"
More news services will follow.
Performance of Concurrent Processes
serverProcessor Objects
Queue Size<br />Current
Queue Size<br />Maximum
Executors:<br />Current Number of Threads
Concurrency:<br />Maximum Number of Threads
Children
Average<br />Block Time<br />Reading
Average<br />Exec Time
Average<br />Block Time<br />Writing
Total<br />Cycles
Full Description
Performance Settings for Memory
refresh graph
simulate short memory status
use 
Standard Memory Strategy</label> (current: #[memoryStrategy]#)Memory UsageAfter StartupAfter Initializationsbefore GCafter GC>Nowbefore <Descriptionmaximum memory that the JVM will attempt to use>Available<total available memory including free for the JVM within maximum>Max<>Total<total memory taken from the OS>Free<free memory in the JVM within total amount>Used<used memory in the JVM within total amountSolr Resources>Class<>Type<>Statistics<>Size<Table RAM Index>Key>ValueTable</td>Chunk Size<Used Memory<Object Index CachesNeeded MemoryObject Read Caches>Read Hit Cache<>Read Miss Cache<>Read Hit<>Read Miss<Write Unique<Write Double<Deletes<Flushes<Total MemMB (hit)MB (miss)Stop Grow when less than #[objectCacheStopGrow]# MB available leftStart Shrink when less than #[objectCacheStartShrink]# MB availabe leftOther Caching Structures>Hit<>Miss<Insert<Delete<Search Event Cache<Performance Settings of Queues and ProcessesScheduled tasks overview and waiting time settings:>Thread<Queue Size>TotalCyclesBlock TimeSleep TimeExec Time<td>Idle>BusyShort Mem<br />Cycles>per Cycle>per Busy-Cycle>Memory Use>Delay between>idle loops>busy loopsMinimum of<br />Required MemoryMaximum of<br />System-LoadFull DescriptionSubmit New Delay ValuesRe-set to defaultChanges take effect immediatelyCache Settings:RAM Cache<td>DescriptionWords in RAM cache:(Size in KBytes)This is the current size of the word caches.The indexing cache speeds up the indexing process, the DHT cache holds indexes temporary for approval.The maximum of this caches can be set below.Maximum URLs currently assigned<br />to one cached word:This is the maximum size of URLs assigned to a single word cache entry.If this is a big number, it shows that the caching works efficiently.Maximum age of a word:This is the maximum age of a word in an index in minutes.Minimum age of a word:This is the minimum age of a word in an index in minutes.Maximum number of words in cache:This is is the number of word indexes that shall be 
held in theram cache during indexing. When YaCy is shut down, this cache must beflushed to disc; this may last some minutes.Enter New Cache SizeThread Pool Settings:Thread Poolmaximum Activecurrent ActiveEnter new Threadpool Configurationmilliseconds<kbytes<load<Performance Settings of Search SequenceSearch Sequence TimingTiming results of latest search request:QueryEvent<Comment<Time<Duration (ms)Result-CountThe network picture below shows how the latest search query was solved by asking corresponding peers in the DHT:red -> request list alivegreen -> request has terminatedgrey -> the search target hash order position(s) (more targets if a dht partition is used)<"Search event picture"<html lang="en">Performance SettingsMemory SettingsMemory reserved for <abbr title="Java Virtual Machine">JVM</abbr>MByte"Set"Resource ObserverMemory state>proper<>exhausted<Reset stateManually reset to 'proper' stateEnough memory is available for proper operation.Within the last eleven minutes, at least four operations have tried to request memory that would have reduced free space within the minimum required.Minimum requiredAmount of memory (in Mebibytes) that should at least be free for proper operationDisable <abbr title="Distributed Hash Table">DHT</abbr>-in below.Free space diskSteady-state minimumAmount of space (in Mebibytes) that should be kept free as steady state<abbr title="Mebibyte">MiB</abbr>Disable crawls when free space is below.Absolute minimumAmount of space (in Mebibytes) that should at least be kept free as hard limitDisable <abbr title="Distributed Hash Table">DHT</abbr>-in when free space is below.>Autoregulate<when absolute minimum limit has been reached.The autoregulation task performs the following sequence of operations, stopping once free space disk is over the steady-state valuedelete old releasesdelete logsdelete robots.txt tabledelete newsclear HTCACHEclear citationsthrow away large crawl queuescut away too large RWIsUsed space diskSteady-state 
maximumMaximum amount of space (in Mebibytes) that should be used as steady stateDisable crawls when used space is over.Absolute maximumMaximum amount of space (in Mebibytes) that should be used as hard limitDisable <abbr title="Distributed Hash Table">DHT</abbr>-in when used space is over.when absolute maximum limit has been reached.The autoregulation task performs the following sequence of operations, stopping once used space disk is below the steady-state value> free spacedisable <abbr title="Distributed Hash Table">DHT</abbr>-in below<abbr title="Random Access Memory">RAM</abbr>Accepted change. This will take effect after <strong>restart</strong> of YaCyrestart now</a>Confirm Restartrefresh graphSaveChanges take effect immediatelyOnline Caution Settings:This is the time that the crawler idles when the proxy is accessed, or a local or remote search is done.The delay is extended by this time each time the proxy is accessed afterwards.This shall improve performance of the affected process (proxy or search).(current delta isseconds since last proxy/local-search/remote-search access.)Online Caution Caseindexer delay (milliseconds) after case occurencyLocal Search:Remote Search:"Enter New Parameters"Online Caution SettingsIndexing with ProxyYaCy can be used to 'scrape' content from pages that pass the integrated caching HTTP proxy.When scraping proxy pages then <strong>no personal or protected page is indexed</strong>;those pages are detected by properties in the HTTP header (like Cookie-Use, or HTTP Authorization)or by POST-Parameters (either in URL or as HTTP protocol)and automatically excluded from indexing.You have to>setup the proxy<before use.Proxy Auto Config:this controls the proxy auto configuration script for browsers at http://localhost:8090/autoconfig.pac.yacy-domains onlywhether the proxy should only be used for .yacy-DomainsProxy pre-fetch setting:this is an automated html page loading procedure that takes actual proxy-requestedPrefetch DepthA prefetch 
of 0 means no prefetch; a prefetch of 1 means to prefetch all embedded URLs, but since embedded image links are loaded by the browser, this means that only embedded href-anchors are prefetched additionally.
Store to Cache
It is almost always recommended to set this on. The only exception is when you have another caching proxy running as secondary proxy and YaCy is configured to use that proxy in proxy-proxy - mode.
Do Local Text-Indexing
If this is on, all pages (except private content) that pass the proxy are indexed.
Do Local Media-Indexing
This is the same as for Local Text-Indexing, but switches only the indexing of media content on.
Do Remote Indexing
If checked, the crawler will contact other peers and use them as remote indexers for your crawl.
If you need your crawling results locally, you should switch this off.
Only senior and principal peers can initiate or receive remote crawls.
Please note that this setting only takes effect for a prefetch depth greater than 0.
Proxy generally
Path
The path where the pages are stored (max. 
length 300)
Size</label>
The size in MB of the cache.
"Set proxy profile"
The file DATA/PLASMADB/crawlProfiles0.db is missing or corrupted.
Please delete that file and restart.
Pre-fetch is now set to depth
Caching is now #(caching)#off::on#(/caching)#.
Local Text Indexing is now #(indexingLocalText)#off::on
Local Media Indexing is now #(indexingLocalMedia)#off::on
Remote Indexing is now #(indexingRemote)#off::on
Cachepath is now set to '#[return]#'.</strong> Please move the old data to the new directory.
Cachesize is now set to #[return]#MB.
Changes will take effect after restart only.
An error has occurred:
You can see a snapshot of recently indexed pages
on the
URLs as crawling start points for crawling.
Page.
Quickly adding Bookmarks:
Crawl with YaCy
Title:
Link:
Status:
URL successfully added to Crawler Queue
Malformed URL
Unable to create new crawling profile for URL:
Unable to add URL to crawler queue:
Quick Crawl Link
Simply drag and drop the link shown below to your browser's Toolbar/Link-Bar.
If you click on it while browsing, the currently viewed website will be inserted into the YaCy crawling queue for indexing.
RWI Ranking Configuration<
The document ranking influences the order of the search result entities.
A ranking is computed using a number of attributes from the documents that match the search word.
The attributes are first normalized over all search results and then the normalized attribute is multiplied by the ranking coefficient computed from this list.
The ranking coefficient grows exponentially with the ranking levels given in the following table. 
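The relationship between ranking level and ranking coefficient described here can be sketched as follows. This is only an illustration under stated assumptions: the base of 2 is inferred from the note that raising a level by one doubles the strength of the parameter, and the function names, the attribute names and the summing of weighted attributes into one score are hypothetical, not taken from the YaCy source.

```python
def ranking_coefficient(level: int) -> int:
    """Coefficient for a ranking level, assuming base 2 (one level up doubles it)."""
    return 2 ** level


def weighted_score(normalized: dict, levels: dict) -> float:
    """Multiply each normalized attribute by its coefficient and add the products.

    Summing the products is an assumption made for this sketch.
    """
    return sum(value * ranking_coefficient(levels[name])
               for name, value in normalized.items())


if __name__ == "__main__":
    # Hypothetical attribute names with values already normalized to 0..1.
    attributes = {"attribute_a": 0.5, "attribute_b": 0.25}
    levels = {"attribute_a": 3, "attribute_b": 1}
    print(weighted_score(attributes, levels))
```

Because the coefficient is exponential in the level, a small change in a level setting can outweigh large differences in the attributes with lower levels.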
If you increase a single value by one, then the strength of the parameter doubles.
There are two ranking stages: first all results are ranked using the pre-ranking, and from the resulting list the documents are ranked again with a post-ranking.
The two stages are separated because the post-ranking needs statistical information from the result of the pre-ranking.
Pre-Ranking
>Post-Ranking<
"Set as Default Ranking"
"Re-Set to Built-In Ranking"
Solr Ranking Configuration<
These are ranking attributes for Solr. This ranking applies to internal and remote (P2P or shard) Solr access.
Select a profile:
>Boost Function<
To see all available fields, see the
>YaCy Solr Schema<
and look for numeric values (these are names with suffix '_i').
To find out which kind of operations are possible, see the
>Solr Function Query<
documentation.
Example: to order by date, use
"Set Boost Function"
"Re-Set to default"
You can boost with vocabularies, use the occurrence counters
>Filter Query<
The Filter Query is attached to every query.
Use this to statically add a selection criterion to reduce the set of results.
Example: "http_unique_b:true AND www_unique_b:true" will filter out all results where urls appear also with/without http(s) and/or with/without 'www.' prefix.
To find appropriate fields for this query, see the
YaCy Solr Schema
Warning: bad expressions here will cause you to get no search results at all!
"Set Filter Query"
>Boost Query<
Example: "fuzzy
To find appropriate fields for this query, see the
 and look for boolean values (with suffix '_b') or tags inside string fields (with suffix '_s' or '_sxt').
"Set Boost Query"
field not in local index (boost has no effect)
You can boost with vocabularies, use the field
with values
You can also boost on logarithmic occurrence counters of the fields
"Set Field Boosts"
A Boost Function can combine numeric values from the result document to produce a number which is multiplied by the score value from the query result.
The Boost Query is attached to every query. 
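To illustrate what "attached to every query" means for the Filter Query, here is a minimal sketch that adds the filter expression quoted above to a user query as a separate request parameter. It assumes the standard Solr parameter names `q` and `fq`; the helper name, the endpoint path and the host are hypothetical and only serve the example.

```python
from urllib.parse import urlencode

# Filter expression quoted in the Filter Query example text.
FILTER_QUERY = "http_unique_b:true AND www_unique_b:true"


def build_select_params(user_query: str, filter_query: str = FILTER_QUERY) -> str:
    """Return a URL-encoded parameter string with the filter attached to the query."""
    params = [("q", user_query), ("fq", filter_query)]
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; adjust host, port and path to your installation.
    base = "http://localhost:8090/solr/select"
    print(base + "?" + build_select_params("yacy"))
```

Keeping the filter in its own parameter leaves the user's query text unchanged, so a bad filter expression can be corrected in one place.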
Use this to statically boost specific content in the index. means that documents, identified as 'double' are ranked very bad and appended to the end of all results (because the unique are ranked high).This is the set of searchable fields (seeEntries without a boost value are not searched.Boost values make hits inside the corresponding field more important.Regex TestTest StringRegular ExpressionThis is a Java PatternResult<no match<> match<error in expression:Remote Crawl Configuration>Remote Crawler<The remote crawler is a process that requests urls from other peers.Peers offer remote-crawl urls if the flag 'Do Remote Indexing'is switched on when a crawl is started.Remote Crawler ConfigurationYour peer cannot accept remote crawls because you need senior or principal peer status for that!>Accept Remote Crawl Requests<Perform web indexing upon request of another peer.Load with a maximum ofpages per minute"Save"Crawl results will appear in the>Crawl Result Monitor<Peers offering remote crawl URLsIf the remote crawl option is switched on, then this peer will load URLs from the following remote peers:>Name<URLs for<br/>Remote<br/>Crawl>Release<>PPM<>QPH<>Last<br/>Seen<>UTC</strong><br/>Offset<>Uptime<>Links<>Age<>Protocol<>IP<>URL<>Access<>Process<>empty<>granted<>denied<>not in index<>indexed<"Add Selected Servers to Crawler"The following servers can be searched:Available server within the given IP range>inaccessible<YaCy '#[clientname]#': Settings AcknowledgeSettings Receipt:No information has been submittedError with submitted information.Nothing changed.</p>The user name must be given.Your request cannot be processed.The password redundancy check failed. You have probably misstyped your password.Shutting down.</strong><br />Application will terminate after working off all crawling tasks.Your administration account setting has been made.Your new administration account name is #[user]#. 
The password has been accepted.<br />If you go back to the Settings page, you must log-in again.Your proxy access setting has been changed.Your proxy account check has been disabled.The new proxy IP filter is set toThe proxy port is:Port rebinding will be done in a few seconds.You can reach your YaCy server under the new locationYour server access filter is now set toAuto pop-up of the Status page is now <strong>disabled</strong>Auto pop-up of the Status page is now <strong>enabled</strong>You are now permanently <strong>online</strong>.After a short while you should see the effect on thestatus</a> page.The Peer Name is:Your static Ip(or DynDns) is:Seed Settings changed.#(success)#::You are now a principal peer.Seed Settings changed, but something is wrong.Seed Uploading was deactivated automatically.Please return to the settings page and modify the data.The remote-proxy setting has been changedIf you open any public web page through the proxy, you must log-in.The new setting is effective immediately, you don't need to re-start.The submitted peer name is already used by another peer. 
Please choose a different name.</strong> The Peer name has not been changed.Your Peer Language is:Seed Upload method was changed successfully.You are now a principal peer.Seed Upload Method:Seed File URL:Your proxy networking settings have been changed.Transparent Proxy Support is:Your message forwarding settings have been changed.Message Forwarding Support is:Message Forwarding Command:Recipient Address:You are now <strong>event-based online</strong>.You are now in <strong>Cache Mode</strong>.Only Proxy-cache ist available in this mode.You can now go back to theSettings</a> page if you want to make more changes.Send via header is:Send X-Forwarded-For header is:Your crawler settings have been changed.Generic Settings:Crawler timeout:http Crawler Settings:Maximum HTTP Filesize:ftp Crawler Settings:Maximum SMB Filesize:Maximum file Filesize:Maximum FTP Filesize:smb Crawler Settings:Your need to restart YaCy to activate the changes.URL Proxy settings have been saved.>Crawler Settings<Generic Crawler SettingsConnection timeout in msmeans unlimitedHTTP Crawler Settings:Maximum FilesizeFTP Crawler SettingsSMB Crawler SettingsLocal File Crawler SettingsMaximum allowed file size in bytes that should be downloadedLarger files will be skippedPlease note that if the crawler uses content compression, this limit is used to check the compressed content sizeSubmitChanges will take effect immediatelyTimeout:Message ForwardingWith this settings you can activate or deactivate forwarding of yacy-messages via email.Enable message forwardingEnabling/Disabling message forwarding via email.Forwarding CommandThe command-line program that should be used to forward the message.<br />Forwarding ToThe recipient email-address.<br />e.g.:"Submit"Changes will take effect immediately.Remote Proxy (optional)YaCy can use another proxy to connect to the internet. 
You can enter the address for the remote proxy here:
Use remote proxy</label>
Enables the usage of the remote proxy by yacy
Use remote proxy for HTTPS
Specifies if YaCy should forward ssl connections to the remote proxy.
Remote proxy host
The ip address or domain name of the remote proxy
Remote proxy port
Remote proxy user
Remote proxy password
No-proxy addresses
IP addresses for which the remote proxy should not be used
"Submit"
Changes will take effect immediately.
the port of the remote proxy
Proxy Settings
Transparent Proxy
With this you can specify if YaCy can be used as transparent proxy.
Hint: On linux you can configure your firewall to transparently redirect all http traffic through yacy using this iptables rule
Always Fresh
If unchecked, the proxy will act using Cache Fresh / Cache Stale rules. If checked, the cache is always fresh, which means that a page is never loaded again if it was already stored in the cache. However, if the page does not exist in the cache, it will be loaded in any case.
Send "Via" Header
Specifies if the proxy should send the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.45" target="_blank">Via</a> http header according to RFC 2616 Sect 14.45.
Send "X-Forwarded-For" Header
Specifies if the proxy should send the X-Forwarded-For http header.
"Submit"
HTTP Server Port
HTTPS Server Port
"change"
Proxy Access Settings
These settings configure the access method to your own http proxy and server.
All traffic is routed through one single port, for both proxy and server.
Server Access Restrictions
You can restrict the access to this proxy/server using a two-stage security barrier:
define an <em>access domain</em> with a list of granted client IP-numbers or with wildcards
define a <em>user account</em> with a user:password - pair
This is the account that restricts access to the proxy function.
You probably don't want to share the proxy to the internet, so you should set the IP-Number Access Domain to a pattern that corresponds to your local intranet.
The 
default setting should be right in most cases. If you want, you can also set a proxy accountso that every proxy user must authenticate first, but this is rather unusual.IP-Number filterUse <aSeed Upload SettingsWith these settings you can configure if you have an account on a public accessibleserver where you can host a seed-list file.General Settings:If you enable one of the available uploading methods, you will become a principal peer.Your peer will then upload the seed-bootstrap information periodically,but only if there have been changes to the seed-list.Upload Method"Submit">URL<Retry UploadingHere you can specify which upload method should be used.Select 'none' to deactivate uploading.The URL that can be used to retrieve the uploaded seed file, likeStore into filesystem:You must configure this if you want to store the seed-list file onto the file system.File LocationHere you can specify the path within the filesystem where the seed-list file should be stored."Submit"Uploading via FTP:This is the account for a FTP server where you can host a seed-list file.If you set this, you will become a principal peer.Your peer will then upload the seed-bootstrap information periodically,but only if there had been changes to the seed-list.The host where you have a FTP account, likePath</label>The remote path on the FTP server, likeMissing sub-directories are NOT created automatically.Username>Server<Your log-in at the FTP serverPassword</label>The password"Submit"Uploading via SCP:This is the account for a server where you are able to login via ssh.>Server<The host where you have an account, like 'my.host.net'Server PortThe sshd port of the host, like '22'Path</label>The remote path on the server, like '~/yacy/seed.txt'. 
Missing sub-directories are NOT created automatically.
Username
Your log-in at the server
Password</label>
The password
"Submit"
Server Access Settings
IP-Number filter:
requires restart
Here you can restrict access to the server.
By default, the access is not limited, because this function is needed to spawn the p2p index-sharing function.
If you block access to your server (setting anything else than '*'), then you will also be blocked from using other peers' indexes for search service.
However, blocking access may be correct in enterprise environments where you only want to index your company's own web pages.
Filters have to be entered as IP, IP range or first part of allowed IPs, separated by comma (e.g. 10.100.0-100.0-100, 127. )
further details on format see Jetty
fileHost:
staticIP (optional):
<strong>The staticIP can help other peers to reach your peer in case your peer is behind a firewall or proxy.</strong> You can create a tunnel through the firewall/proxy (look out for 'tunneling through https proxy with connect command') and create an access point for incoming connections.
This access address can be set here (either as IP number or domain name).
If the address of outgoing connections is equal to the address of incoming connections, you don't need to set anything here, please leave it blank.
ATTENTION: Your current IP is recognized as "#[clientIP]#".
If the value you enter here does not match this IP, you will not be able to access the server pages anymore.
value="Submit"
Server Port Settings
Server port:
This is the main port for all http communication (default is 8090).
A change requires a restart.
Server ssl port:
This is the port to connect via https (default is 8443).
Shutdown port:
This is the local port on the loopback address ( or :1) to listen for a shutdown signal to stop the YaCy server (-1 disables the shutdown port, recommended default is 8005).
Advanced Settings
If you want to restore all settings to the default values, but <strong>forgot your 
administration password</strong>, you must stop the proxy,delete the file 'DATA/SETTINGS/yacy.conf' in the YaCy application root folder and start YaCy again.Server Access SettingsProxy Access SettingsCrawler SettingsRemote Proxy (optional)Seed Upload SettingsMessage Forwarding (optional)Console StatusLog-in as administrator to see full statusWelcome to YaCy!Your settings are _not_ protected!Please open the <a href="ConfigAccounts_p.html">accounts configuration</a> page <strong>immediately</strong>and set an administration password.Access is unrestricted from localhost (this includes administration features).Please check the <a href="ConfigAccounts_p.html">accounts configuration</a> page to ensure that the settings match the security level you need.You have not published your peer seed yet. This happens automatically, just wait.The peer must go online to get a peer address.You cannot be reached from outside.A possible reason is that you are behind a firewall, NAT or Router.But you can <a href="index.html">search the internet</a> using the other peers'global index on your own search page."bad""idea""good""Follow YaCy on Twitter"We encourage you to open your firewall for the port you configured (usually: 8090),or to set up a 'virtual server' in your router settings (often called DMZ).Please be fair, contribute your own index to the global index.Free disk space is lower than #[minSpace]#. Crawling has been disabled. Please fixit as soon as possible and restart YaCy.Free memory is lower than #[minSpace]#. DHT-in has been disabled. Please fixCrawling is paused! If the crawling was paused automatically, please check your disk space.Latest public version isYou can download a more recent version of YaCy. 
Click here to install this update and restart YaCy:Install YaCyYou can download the latest releases here:You are running a server in senior mode and you support the global internet index,which you can also <a href="index.html">search yourself</a>.You have a principal peer because you publish your seed-list to a public accessible serverwhere it can be retrieved using the URLYour Web Page Indexer is idle. You can start your own web crawl <a href="CrawlStartSite.html">here</a>Your Web Page Indexer is busy. You can <a href="Crawler_p.html">monitor your web crawl</a> here.If you need professional support, please write toFor community support, please visit our>forum<System StatusSystemYaCy versionUnknownUptime:Processors:Load:Threads:peak:total:ProtectionPassword is missingpassword-protectedUnrestricted access from localhostAddress</dt>peer address not assignedHost:Public Address:YaCy Address:Proxy</dt>Transparent not usedbroken::connectedbrokenconnectedUsed for YaCy -> YaCy communication:WARNING:You do this on your own risk.If you do this without YaCy running on a desktop-pc, this will possibly break startup.In this case, you will have to edit the configuration manually in DATA/SETTINGS/yacy.confRemote:Tray-IconExperimental<YesNoAuto-popup on start-upDisabledEnable]EnabledDisable]Memory UsageRAM used:RAM max:DISK used:(approx.)DISK free:on::offConfiguremax:Traffic >ResetProxy:Crawler:Incoming ConnectionsActive:Max:Loader Queuepaused>Queues<Local CrawlRemote triggered CrawlPre-QueueingSeed serverEnabled: Updating to serverLast upload: #[lastUpload]# ago.Enabled: Updating to fileYaCy version:Java version:>Experimental<Enabled <aReset</a>Steering</title>Checking peer status...Peer is online again, forwarding to status page...Peer is not online yet, will check again in a few seconds...No action submittedGo back to the <a href="Settings_p.html">Settings</a> pageYour system is not protected by a passwordPlease go to the <a href="ConfigAccounts_p.html">User Administration</a> 
page and set an administration password.You don't have the correct access right to perform this task.Please log in.You can now go back to the <a href="Settings_p.html">Settings</a> page if you want to make more changes.See you soon!Just a moment, please!Application will terminate after working off all scheduled tasks.Please send us feed-back!We don't track YaCy users, YaCy does not send 'home-pings', we do not even know how many people use YaCy as their private search engine.Therefore we like to ask you: do you like YaCy? Will you use it again... if not, why? Is it possible that we change a bit to suit your needs?Please send us feed-back about your experience with an>anonymous message<or a<posting to ourweb forums>bug report<>Professional Support<If you are a professional user and you would like to use YaCy in your company in combination with consulting services by YaCy specialists, please seeThen YaCy will restart.If you can't reach YaCy's interface after 5 minutes restart failed.Installing releaseYaCy will be restarted after installationSupporter<Please enter a comment to your link recommendation.Your Vote is also considered without a comment.Supporter are switched off for users without authorization"bookmark""Add to bookmarks""positive vote""Give positive vote""negative vote""Give negative vote"provided by YaCy peers with an URL in their profile. This shows only URLs from peers that are currently online.Surftips</title>Surftips</h2>Surftips are switched offtitle="bookmark"alt="Add to bookmarks"title="positive vote"alt="Give positive vote"title="negative vote"alt="Give negative vote"YaCy Supporters<>a list of home pages of yacy users<provided by YaCy peers using public bookmarks, link votes and crawl start points"Please enter a comment to your link recommendation. 
(Your Vote is also considered without a comment.)"Hide surftips for users without autorizationShow surftips to everyone: Peer SteeringThe information that is presented on this page can also be retrieved as XML.Click the API icon to see the XML.To see a list of all APIs, please visit the API wiki page>Process Scheduler<This table shows actions that had been issued on the YaCy interfaceto change the configuration or to request crawl actions.These recorded actions can be used to repeat specific actions and to send themto a scheduler for a periodic execution.>Recorded Actions<"next page""previous page" of #[of]#>Type>CommentCall Count<Recording DateLast Exec DateNext Exec Date>Event Trigger<"clone">Scheduler<>no event<>activate event<>no repetition<>activate scheduler<>off<>run once<>run regular<>after start-up<at 00:00hat 01:00hat 02:00hat 03:00hat 04:00hat 05:00hat 06:00hat 07:00hat 08:00hat 09:00hat 10:00hat 11:00hat 12:00hat 13:00hat 14:00hat 15:00hat 16:00hat 17:00hat 18:00hat 19:00hat 20:00hat 21:00hat 22:00hat 23:00h"Execute Selected Actions""Delete Selected Actions""Delete all Actions which had been created before "day<days<week<weeks<month<months<year<years<>Result of API execution>minutes<>hours<Scheduled actions are executed after the next execution date has arrived within a time frame of #[tfminutes]# minutes.To see a list of all APIs, please visit theTable ViewerThe information that is presented on this page can also be retrieved as XML.Click the API icon to see the XML.To see a list of all APIs, please visit theAPI wiki page>robots.txt table<Table ViewerTable AdministrationTable SelectionSelect Table:show max.>all<entriessearch rows for"Search"Table Editor: showing table"Edit Selected Row""Add a new Row""Delete Selected Rows""Delete Table"Row EditorPrimary Key"Commit"entries,YaCy Debugging: Thread DumpThreaddump<"Single Threaddump""Multiple Dump Statistic"Translation News for LanguageTranslation NewsYou can share your local addition to translations and 
distribute it to other peers.
The remote peer can vote on your translation and add it to its own local translation.
entries available
"Publish"
You can check your outgoing messages
>here<
To edit or add local translations you can use
File:
>Originator<
English:
>existing<
Translation:
>score
negative vote
positive vote
Vote on this translation.
If you vote positive the translation is added to your local translation list.
Translation Editor
Translate untranslated text of the user interface (current language).
The modified translation file is stored in DATA/LOCALE directory.
UI Translation
Target Language:
activate a different language
Source File
view it
filter untranslated
Source Text
Translated Text
Save translation
Check for remote translation proposals and/or share your own added translations
User Page
You are not logged in.<br />
Username:
Password: <input
"login"
You are currently logged in as #[username]#.
You have used
old Password
new Password<
new Password(repetition)
"Change"
You are currently logged in as admin.
value="logout"
(after logout you will be prompted for your password again. 
simply click "cancel")
Password was changed.
Old Password is wrong.
New Password and its repetition do not match.
New Password is empty.
minutes of your online time limit of
minutes per day.
See the page info about the url.
View URL Content
>Get URL Viewer<
"Show Metadata"
"Browse Host"
>URL Metadata<
Search in Document:
"Show Snippet"
Hash
(click this for full metadata)
In Metadata:
In Cache:
Word Count
Description
Size
MimeType:
Collections
View as
Plain Text
Parsed Text
Parsed Sentences
Parsed Tokens/Words
Link List
Citation Report
"Show"
Unable to find URL Entry in DB
Invalid URL
Unable to download resource content.
Unable to parse resource content.
Unsupported protocol.
>Original Content from Web<
Parsed Content
>Original from Web<
>Original from Cache<
>Parsed Tokens<
Server Log
Lines
reversed order
"refresh"
Local Peer Profile:
Remote Peer Profile
Wrong access of this page
The requested peer is unknown or a potential peer.
The profile can't be fetched.
The peer
is not online.
This is the Profile of
>Name
Nick Name
Homepage
eMail
Comment
View this profile as
> or
You can edit your profile <a href="ConfigProfile_p.html">here</a>
<html lang="en">
YaCy '#[clientname]#': Federated Index
The information that is presented on this page can also be retrieved as XML
Click the API icon to see the RDF Ontology definition for this vocabulary.
To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>.
Vocabulary Administration
Vocabularies can be used to produce a search navigation. 
A vocabulary must be created before content is indexed.
The vocabulary is used to annotate the indexed content with a reference to the object that is denoted by the term of the vocabulary.
The object can be denoted by a url stub that, combined with the term, becomes the url for the object.
Vocabulary Selection
Vocabulary Name
"View"
Vocabulary Production
Empty Vocabulary
Auto-Discover
Import from a csv file
File Path
Column for Literals
no Synonyms
Auto-Enrich with Synonyms from Stemming Library
Read Column
first has index
if unused set
Column for Object Link (optional)
Charset of Import File
It is possible to produce a vocabulary out of the existing search index. This is done using a given 'objectspace' which you can enter as a URL Stub.
This stub is used to find all matching URLs. If the remaining path from the matching URLs then denotes a single file, the file name is used as vocabulary term.
This works best with wikis. Try to use a wiki url as objectspace path.
Objectspace
from file name
from page title
from page title (split)
from page author
"Create"
Vocabulary Editor
>Modify<
>Delete<
>Literal<
>Synonyms<
>Object Link<
>add<
clear table (remove all terms)
delete vocabulary<
"Submit"
Web Structure
The data that is visualized here can also be retrieved in an XML file, which lists the reference relation between the domains.
With a GET-property 'about' you get only reference relations about the host that you give in the argument field for 'about'.
With a GET-property 'latest' you get a list of references that had been computed during the current run-time of YaCy, and with each next call only an update to the next list of references.
Click the API icon to see the XML file.
To see a list of all APIs, please visit the
API wiki page
>Host List<
>#[count]# outlinks
host<
depth<
nodes<
time<
size<
>Background<
>Text<
>Line<
>Pivot Dot<
>Other Dot<
>Dot-end<
>Color <
"change"
"WebStructurePicture"
YaCyWiki page:
last edited by
change date
Edit<
only granted to admin
Grant Write Access to
Start Page
Index
Versions
Author:
You can 
use
Wiki Code</a> here.
"edit"
"Submit"
"Preview"
"Discard"
>Preview
No changes have been submitted so far!
Subject
Change Date
Last Author
IO Error reading wiki database:
Select versions of page
Compare version from
"Show"
with version from
"current"
"Compare"
Return to
Changes will be published as announcement on YaCyNews
Wiki Help
Wiki-Code
This table contains a short description of the tags that can be used in the Wiki and several other servlets
of YaCy. For a more detailed description visit the
Code
Description
These tags create headlines. If a page has three or more headlines, a table of contents will be created automatically.
Headlines of level 1 will be ignored in the table of contents.
These tags create stressed texts. The first pair emphasizes the text (most browsers will display it in italics),
the second one emphasizes it more strongly (i.e. bold) and the last tags create a combination of both.
Text will be displayed <span class="strike">stricken through</span>.
Text will be displayed <span class="underline">underlined</span>.
Lines will be indented. This tag is supposed to mark citations, but may as well be used for styling purposes.
These tags create a numbered list.
These tags create an unnumbered list.
These tags create a definition list.
This tag creates a horizontal line.
This tag creates links to other pages of the wiki.
This tag displays an image, it can be aligned left, right or center.
This tag displays a Youtube or Vimeo video with the id specified and fixed width 425 pixels and height 350 pixels.
i.e. use
to embed this video:
These tags create a table, whereas the first marks the beginning of the table, the second starts
a new line, the third and fourth each create a new cell in the line. The last displayed tag
closes the table.
A text between these tags will keep all the spaces and linebreaks in it. 
Great for ASCII-art and program code.
If a line starts with a space, it will be displayed in a non-proportional font.
This tag creates links to external websites.
=headline
point
something<
another thing
and yet another
something else
word:definition
pagename
description]]
url description
alt text
Document Citations for
List of other web pages with citations
Similar documents from different hosts:
Table Viewer
"Edit Table"
>Author<
>Description<
>Subject<
>Date<
>Type<
>Identifier<
>Language<
>Load Date<
>Referrer Identifier<
>Document size<
>Number of Words<
>Title<
Websearch Comparison
Left Search Engine
Right Search Engine
"Compare"
Search Result Administration
Toggle navigation
Re-Start<
Shutdown<
Download YaCy
Community (Web Forums)
Project Wiki
Search Interface
About This Page
Portal Configuration
Portal Design
Ranking and Heuristics
Crawler Monitor
Index Administration
Filter & Blacklists
Content Semantic
Target Analysis
Process Scheduler
Monitoring
Index Browser
Network Access
>Terminal
Confirm Re-Start
Confirm Shutdown
Project Wiki<
Git Repository
Bugtracker
"Search..."
"You just started a YaCy peer!"
"As a first-time-user you see only basic functions. Set a use case or name your peer to see more options. Start a first web crawl to see all monitoring options."
"You did not yet start a web crawl!"
"You do not see all monitoring options here, because some belong to crawl result monitoring. Start a web crawl to see that!"
First Steps
Use Case & Account
Load Web Pages, Crawler
RAM/Disk Usage & Updates
System Status
Peer-to-Peer Network
Advanced Crawler
Index Export/Import
System Administration
Configuration
Production
>Administration<
Search Portal Integration
You just started a YaCy peer!
As a first-time-user you see only basic functions. Set a use case or name your peer to see more options. Start a first web crawl to see all monitoring options.
You did not yet start a web crawl!
You do not see all monitoring options here, because some belong to crawl result monitoring. 
Start a web crawl to see that!
Design
English, Englisch
Toggle navigation
Search Interfaces
Administration »
>Web Search<
>File Search<
>Compare Search<
>Index Browser<
>URL Viewer<
Example Calls to the Search API:
Solr Default Core
Solr Webgraph Core
Google Appliance API
Download YaCy
Community (Web Forums)
Project Wiki
Search Interface
About This Page
Bugtracker
Git Repository
Access Tracker
Server Access
Access Grid
Incoming Requests Overview
Incoming Requests Details
All Connections<
Local Search<
Log
Host Tracker
Remote Search<
Cookie Menu
Incoming Cookies
Outgoing Cookies
Filter & Blacklists
Blacklist Administration
Blacklist Cleaner
Blacklist Test
Import/Export
Content Control
>Application Status<
>Status<
System
Thread Dump
>Processes<
>Server Log<
>Concurrent Indexing<
>Memory Usage<
>Search Sequence<
>Messages<
>Overview<
>Incoming News<
>Processed News<
>Outgoing News<
>Published News<
>Community Data<
>Surftips<
>Local Peer Wiki<
UI Translations
System Administration
Advanced Settings
Advanced Properties
Viewer and administration for database tables
Performance Settings of Busy Queues
>Performance
Overview</a>
Receipts</a>
Queries</a>
DHT Transfer
Proxy Use
Local Crawling</a>
Global Crawling</a>
Surrogate Import
Crawl Results
Processing Monitor
Crawler<
Loader<
Rejected URLs
>Queues<
Local<
Global
Remote
No-Load
Crawler Steering
Scheduler and Profile Editor<
robots.txt Monitor
Load Web Pages
Site Crawling
Parser Configuration
>Appearance<
>Language<
Search Page Layout
Design
>Appearance
>Language
Index Administration
URL Database Administration
Index Deletion
Index Sources & Targets
Solr Schema Editor
Field Re-Indexing
Reverse Word Index
Content Analysis
Crawler/Spider<
Crawl Start (Expert)
Network Scanner
Crawling of MediaWikis
>Crawling of phpBB3 Forums<
Network Harvesting<
Remote Crawling
Scraping Proxy
Advanced Crawler
Crawling of phpBB3 Forums
>Database Reader<
RSS Feed Importer
OAI-PMH Importer
Database Reader for phpBB3 Forums
Dump Reader for MediaWiki dumps
RAM/Disk Usage & Updates
>Performance<
Web Cache
Download System Update
Search Box Anywhere
Generic Search 
Portal
User Profile
Local robots.txt
Portal Configuration
Publication
File Hosting
Solr Ranking Config
RWI Ranking Config
>Heuristics<
Ranking and Heuristics
Content Semantic
>Automated Annotation<
Auto-Annotation Vocabulary Editor
Knowledge Loader
Target Analysis
Mass Crawl Check
Regex Test
Use Case & Accounts
Basic Configuration
>Accounts<
Network Configuration
Web Visualization
Web Structure
Image Collage
Index Browser
<html lang="en">
YaCy '#[clientname]#': Search Page
>Search<
Text
Images
Audio
Video
Applications
more options...
Results per page
Resource
global
restrict on
show all
Prefer mask
Constraints
only index pages
the peer-to-peer network
only the local index
Query Operators
restrictions
only urls with the <phrase> in the url
only urls with the <phrase> within outbound links of the document
only urls with extension
only urls from host
only pages with as-author-annotated
only pages from top-level-domains
only pages with a date between <date1> and <date2> in content
only pages with <date> in content
only resources from http or https servers
only resources from ftp servers
they are rare
crawl them yourself
only resources from smb servers
Intranet Indexing</a> must be selected
only files from a local file system
spatial restrictions
only documents having location metadata (geographical coordinates)
only documents within a square zone embracing a circle of given radius (in decimal degrees) around the specified latitude and longitude (in decimal degrees)
>ranking modifier<
sort by date
latest first
multiple words shall appear near
doublequotes
prefer given language
an <a href="http://www.loc.gov/standards/iso639-2/php/English_list.php" title="Reference alpha-2 language codes list">ISO 639-1</a> 2-letter code
heuristics
add search results from 
Search Navigation
keyboard shortcuts
<a href="https://en.wikipedia.org/wiki/Access_key">Access key</a> modifier + n
next result page
<a href="https://en.wikipedia.org/wiki/Access_key">Access key</a> modifier + p
previous result page
automatic result retrieval
browser integration
after searching, 
click-open on the default search engine in the upper right search field of your browser and select 'Add "YaCy Search.."'
search as rss feed
click on the red icon in the upper right after a search.
this works well in combination with the '/date' ranking modifier.
See an
>example
json search results
for ajax developers: get the search rss feed and replace the '.rss' extension in the search result url with '.json'
ranking modifier
add search results from external opensearch systems
click on the red icon in the upper right after a search. this works well in combination with the
"Continue this queue"
"Pause this queue"
>Size
>Date
Your Username/Password is wrong.
Username</label>
Password</label>
"login"
YaCy: Error Message
request:
unspecified error
not-yet-assigned error
You don't have an active internet connection. Please go online.
Could not load resource. The file is not available.
Exception occurred
Generated #[date]# by
Your Account is disabled for surfing.
Your Timelimit (#[timelimit]# Minutes per Day) is reached.
The server
could not be found.
Did you mean:
Shared Blacklist
Add Items to Blacklist
Unable to store the items into the blacklist file:
YaCy-Peer "<span class="settingsValue">#[name]#</span>" not found.
not found or empty list.
Wrong Invocation! Please
>log in<
as administrator to use the search function
Location -- click on map to enlarge
Map (c) by <
and contributors, CC-BY-SA
>Media<
> of
> local,
remote from
YaCy peers).
>search<
"bookmark"
"recommend"
"delete"
Pictures
show search results for "#[query]#" on map
>Provider
>Name Space
>Author
>Filetype
>Language
>Peer-to-Peer<
Stealth Mode
Privacy
Context Ranking
Sort by Date
Documents
Images
Your search is done using peers in the YaCy P2P network.
You can switch to 'Stealth Mode' which will switch off P2P, giving you full privacy. 
Expect fewer results then, because then only your own search index is used.
Your search is done using only your own peer, locally.
You can switch to 'Peer-to-Peer Mode', which causes your search to be run using the other peers in the YaCy network.
>Documents
>Images