YaCy Network Access Server Access Grid This images shows incoming connections to your YaCy peer and outgoing connections from your peer to other peers and web servers Access Tracker Server Access Overview This is a list of #[num]# requests to the local http server within the last hour. Showing #[num]# requests. >Host< >Path< Date< Access Count During last Second last Minute last 10 Minutes last Hour The following hosts are registered as source for brute-force requests to protected pages Access Times Server Access Details Local Search Log Top Search Words (last 7 Days) Local Search Host Tracker Remote Search Log Success: Remote Search Host Tracker This is a list of searches that had been requested from this' peer search interface Showing #[num]# entries from a total of #[total]# requests. Requesting Host Offset Expected Results Returned Results Used Time (ms) URL fetch (ms) Snippet comp (ms) Query Search Word Hashes Count</td> Queries Per Last Hour Access Dates This is a list of searches that had been requested from remote peer search interface This is a list of requests (max. 1000) to the local http server within the last hour. URL Proxy Settings< With this settings you can activate or deactivate URL proxy. Service call: , where parameter is the url of an external web page. >URL proxy:< >Enabled< Globally enables or disables URL proxy via Show search results via URL proxy: Enables or disables URL proxy for all search results. If enabled, all search results will be tunneled through URL proxy. Alternatively you may add this javascript to your browser favorites/short-cuts, which will reload the current browser address via the YaCy proxy servlet. or right-click this link and add to favorites: Restrict URL proxy use: Define client filter. Default: URL substitution: Define URL substitution rules which allow navigating in proxy environment. Possible values: all, domainlist. Default: domainlist. "Submit" >Autocrawler< Autocrawler automatically selects and adds tasks to the local crawl queue. This will work best when there are already quite a few domains in the index. Autocralwer Configuration You need to restart for some settings to be applied Enable Autocrawler: Deep crawl every: Warning: if this is bigger than "Rows to fetch" only shallow crawls will run. Rows to fetch at once: Recrawl only older than # days: Get hosts by query: Can be any valid Solr query. Shallow crawl depth (0 to 2): Deep crawl depth (1 to 5): Index text: Index media: "Save" Blacklist Cleaner Here you can remove or edit illegal or double blacklist-entries. Check list "Check" Allow regular expressions in host part of blacklist entries. The blacklist-cleaner only works for the following blacklist-engines up to now: Illegal Entries in #[blList]# for Deleted #[delCount]# entries Altered #[alterCount]# entries! Two wildcards in host-part Either subdomain <u>or</u> wildcard Path is invalid Regex Wildcard not on begin or end Host contains illegal chars Double "Change Selected" "Delete Selected" No Blacklist selected Blacklist Import Used Blacklist engine: Import blacklist items from... other YaCy peers: "Load new blacklist items" URL: plain text file:< XML file: Upload a regular text file which contains one blacklist entry per line. Upload an XML file which contains one or more blacklists. Export blacklist items to... Here you can export a blacklist as an XML file. This file will contain additional information about which cases a blacklist is activated for. "Export list as XML" Here you can export a blacklist as a regular text file with one blacklist entry per line. This file will not contain any additional information "Export list as text" Blacklist Test Used Blacklist engine: Test list: "Test" The tested URL was It is blocked for the following cases: Search Surftips Blacklist Administration This function provides an URL filter to the proxy; any blacklisted URL is blocked from being loaded. You can define several blacklists and activate them separately. You may also provide your blacklist to other peers by sharing them; in return you may collect blacklist entries from other peers. Active list: No blacklist selected Select list to edit: not shared::shared "select" Create new list: "create" Settings for this list "Save" Share/don't share this list Delete this list Edit list These are the domain name/path patterns in Blacklist Pattern Edit selected pattern(s) Delete selected pattern(s) Move selected pattern(s) to Add new pattern: Add URL pattern The right '*', after the '/', can be replaced by a >regular expression< domain.net/fullpath< >domain.net/*< *.domain.net/*< *.sub.domain.net/*< (slow) Activate this list for Show entries: Entries per page: "set" Edit existing pattern(s): "Save URL pattern(s)" by Comments</a> >edit >delete Edit< previous entries next entries new entry import XML-File export as XML Blog-Home Author: Subject: You can use Yacy-Wiki Code here. Comments: deactivated >activated moderated "Submit" "Preview" "Discard" >Preview No changes have been submitted so far! Access denied To edit or create blog-entries you need to be logged in as Admin or User who has Blog rights. Are you sure that you want to delete Confirm deletion Yes, delete it. No, leave it. Import was successful! Import failed, maybe the supplied file was no valid blog-backup? Please select the XML-file you want to import: Text: by Comments</a> Login Blog-Home delete</a> allow</a> Author: Subject: You can use Yacy-Wiki Code here. "Submit" "Preview" "Discard" YaCy '#[clientname]#': Bookmarks The bookmarks list can also be retrieved as RSS feed. This can also be done when you select a specific tag. Click the API icon to load the RSS from the current selection. To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>. <h3>Bookmarks Bookmarks ( List Bookmarks Add Bookmark Import Bookmarks Import XML Bookmarks Import HTML Bookmarks "import" Default Tags: imported Title: Description: Folder (/folder/subfolder): Tags (comma separated): >Public: yes no Bookmark is a newsfeed "create" File: import as Public "private bookmark" "public bookmark" Tagged with 'Confirm deletion' Edit Delete Folders Bookmark Folder Tags Bookmark List previous page next page All Show Bookmarks per page. start autosearch of new bookmarks This starts a search of new or modified bookmarks since startup in folder "search" with "query=&lt;original_search_term&gt;" Every peer online will be ask for results. Image Collage Private Queue Public Queue User Accounts User Administration User created: User changed: Generic error. Passwords do not match. Username too short. Username must be &gt;= 4 Characters. Username already used (not allowed). No password is set for the administration account. Please define a password for the admin account. Admin Account Access from localhost without account Access to your peer from your own computer (localhost access) is granted with administrator rights. No need to configure an administration account. Access only with qualified account This is required if you want a remote access to your peer, but it also hardens access controls on administration operations of your peer. Peer User: New Peer Password: Repeat Peer Password: "Define Administrator" >Access Rules< Protection of all pages: if set to on, access to all pages need authorization; if off, only pages with "_p" extension are protected. Set Access Rules Select user New user Edit User Delete User Edit current user: Username</label> Password</label> Repeat password First name Last name Address Rights Timelimit Time used Save User This setting is convenient but less secure than using a qualified admin account. Please use with care, notably when you browse untrusted and potentially malicious websites while running your YaCy peer on the same computer. Appearance and Integration You can change the appearance of the YaCy interface with skins. The selected skin and language also affects the appearance of the search page. If you <a href="ConfigPortal_p.html">create a search portal with YaCy</a> then you can change the appearance of the search page here. Skin Selection Select one of the default skins, download new skins, or create your own skin. Current skin Available Skins "Use" "Delete" >Skin Color Definition< The generic skin 'generic_pd' can be configured here with custom colors: >Background< >Text< >Legend< >Table&nbsp;Header< >Table&nbsp;Item< >Table&nbsp;Item&nbsp;2< >Table&nbsp;Bottom< >Border&nbsp;Line< >Sign&nbsp;'bad'< >Sign&nbsp;'good'< >Sign&nbsp;'other'< >Search&nbsp;Headline< >Search&nbsp;URL "Set Colors" >Skin Download< Skins can be installed from download locations Install new skin from URL Use this skin "Install" Make sure that you only download data from trustworthy sources. The new Skin file might overwrite existing data if a file of the same name exists already. >Unable to get URL: Error saving the skin. Access Configuration Basic Configuration Your port has changed. Please wait 10 seconds. Your browser will be redirected to the new <a href="http://#[host]#:#[port]#/ConfigBasic.html">location</a> in 5 seconds. The peer port was changed successfully. Your YaCy Peer needs some basic information to operate properly Select a language for the interface Deutsch Fran&ccedil;ais &#27721;&#35821;/&#28450;&#35486 &#1056;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081; &#1059;&#1082;&#1088;&#1072;&#1111;&#1085;&#1089;&#1100;&#1082;&#1072; &#2361;&#2367;&#2344;&#2381;&#2342;&#2368; &#26085;&#26412;&#35486; Use Case: what do you want to do with YaCy: Community-based web search Join and support the global network 'freeworld', search the web with an uncensored user-owned search network Search portal for your own web pages Your YaCy installation behaves independently from other peers and you define your own web index by starting your own web crawl. This can be used to search your own web pages or to define a topic-oriented search portal. Intranet Indexing Create a search portal for your intranet or web pages or your (shared) file system. URLs may be used with http/https/ftp and a local domain name or IP, or with an URL of the form or smb: Your peer name has not been customized; please set your own peer name You may change your peer name Peer Name: Your peer cannot be reached from outside which is not fatal, but would be good for the YaCy network please open your firewall for this port and/or set a virtual server option in your router to allow connections on this port Opening a router port is <i>not</i> a YaCy-specific task; you can see instruction videos everywhere in the internet, just search for <a href="http://www.youtube.com/results?search_query=Open+Ports+on+a+Router">Open Ports on a &lt;our-router-type&gt; Router</a> and add your router type as search term. However: if you fail to open a router port, you can nevertheless use YaCy with full functionality, the only function that is missing is on the side of the other YaCy users because they cannot see your peer. Your peer can be reached by other peers Peer Port: Set by system property with SSL https enabled on port Configure your router for YaCy using UPnP: Configuration was not successful. This may take a moment. Set Configuration Your basic configuration is complete! You can now (for example) just < start an uncensored search start your own crawl</a> and contribute to the global index, or create your own private web index set a personal peer profile</a> (optional settings) monitor at the network page</a> what the other peers are doing Your Peer name is a default name; please set an individual peer name. You did not set a user name and/or a password. Some pages are protected by passwords. You should set a password at the <a href="ConfigAccounts_p.html">Accounts Menu</a> to secure your YaCy peer.</p>:: You did not open a port in your firewall or your router does not forward the server port to your peer. This is needed if you want to fully participate in the YaCy network. You can also use your peer without opening it, but this is not recomended. What you should do next: Hypertext Cache Configuration The HTCache stores content retrieved by the HTTP and FTP protocol. Documents from smb:// and file:// locations are not cached. The cache is a rotating cache: if it is full, then the oldest entries are deleted and new one can fill the space. HTCache Configuration The path where the cache is stored The current size of the cache >#[actualCacheSize]# MB for #[actualCacheDocCount]# files, #[docSizeAverage]# KB / file in average The maximum size of the cache "Set" Cleanup Cache Deletion Delete HTTP &amp; FTP Cache Delete robots.txt Cache "Delete" Heuristics Configuration A <a href="http://en.wikipedia.org/wiki/Heuristic" target="_blank">heuristic</a> is an 'experience-based technique that help in problem solving, learning and discovery' (wikipedia). The search heuristics that can be switched on here are techniques that help the discovery of possible search results based on link guessing, in-search crawling and requests to other search engines. When a search heuristic is used, the resulting links are not used directly as search result but the loaded pages are indexed and stored like other content. This ensures that blacklists can be used and that the searched word actually appears on the page that was discovered by the heuristic. The success of heuristics are marked with an image heuristic:&lt;name&gt; (new link) below the favicon left from the search result entry: The search result was discovered by a heuristic, but the link was already known by YaCy The search result was discovered by a heuristic, not previously known by YaCy 'site'-operator: instant shallow crawl When a search is made using a 'site'-operator (like: 'download site:yacy.net') then the host of the site-operator is instantly crawled with a host-restricted depth-1 crawl. That means: right after the search request the portal page of the host is loaded and every page that is linked on this page that points to a page on the same host. Because this 'instant crawl' must obey the robots.txt and a minimum access time for two consecutive pages, this heuristic is rather slow, but may discover all wanted search results using a second search (after a small pause of some seconds). search-result: shallow crawl on all displayed search results When a search is made then all displayed result links are crawled with a depth-1 crawl. This means: right after the search request every page is loaded and every page that is linked on this page. If you check 'add as global crawl job' the pages to be crawled are added to the global crawl queue (remote peers can pickup pages to be crawled). Default is to add the links to the local crawl queue (your peer crawls the linked pages). add as global crawl job opensearch load external search result list from active systems below When using this heuristic, then every new search request line is used for a call to listed opensearch systems. 20 results are taken from remote system and loaded simultanously, parsed and indexed immediately. To find out more about OpenSearch see Available/Active Opensearch System >Active< >Title< >Comment< Url <small>(format opensearch Url template syntax >delete< >new< "add" "Save" "reset to default list" "discover from index" class start background task, depending on index size this may run a long time With the button "discover from index" you can search within the metadata of your local index (Web Structure Index) to find systems which support the Opensearch specification. The task is started in the background. It may take some minutes before new entries appear (after refreshing the page). Alternatively you may >copy &amp; paste a example config file< located in <i>defaults/heuristicopensearch.conf</i> to the DATA/SETTINGS directory. For the discover function the <i>web graph</i> option of the web structure index and the fields <i>target_rel_s, target_protocol_s, target_urlstub_s</i> have to be switched on in the <a href="IndexSchema_p.html?core=webgraph">webgraph Solr schema</a>. "switch Solr fields on" ('modify Solr Schema') <!-- lang -->default(english) <!-- author --> <!-- maintainer --> Language selection You can change the language of the YaCy-webinterface with translation files. Current language</label> Author(s) (chronological)</label> Send additions to maintainer</em> Available Languages</label> Install new language from URL Use this language "Use" "Delete" "Install" Unable to get URL: Error saving the language file. Make sure that you only download data from trustworthy sources. The new language file might overwrite existing data if a file of the same name exists already. Download Language File Supported formats are the internal language file (extension .lng) or XLIFF (extension .xlf) format. Simple Editor to add untranslated text <html lang="en"> Network Configuration No changes were made! Accepted Changes Inapplicable Setting Combination For P2P operation, at least DHT distribution or DHT receive (or both) must be set. You have thus defined a Robinson configuration Global Search in P2P configuration is only allowed, if index receive is switched on. You have a P2P configuration, but are not allowed to search other peers. For Robinson Mode, index distribution and receive is switched off Network and Domain Specification YaCy can operate a computing grid of YaCy peers or as a stand-alone node. To control that all participants within a web indexing domain have access to the same domain, this network definition must be equal to all members of the same YaCy network. >Network Definition< Remote Network Definition URL Enter custom URL... Network Nick Long Description Indexing Domain "Change Network" Distributed Computing Network for Domain Enable Peer-to-Peer Mode to participate in the global YaCy network, or if you want your own separate search cluster with or without connection to the global network. Enable 'Robinson Mode' for a completely independent search engine instance, without any data exchange between your peer and other peers. Peer-to-Peer Mode >Index Distribution This enables automated, DHT-ruled Index Transmission to other peers >enabled disabled during crawling disabled during indexing >Index Receive Accept remote Index Transmissions This works only if you have a senior peer. The DHT-rules do not work without this function >reject accept transmitted URLs that match your blacklist >allow deny remote search >Robinson Mode If your peer runs in 'Robinson Mode' you run YaCy as a search engine for your own search portal without data exchange to other peers There is no index receive and no index distribution between your peer and any other peer >Private Peer Your search engine will not contact any other peer, and will reject every request >Public Cluster Your peer is part of a public cluster within the YaCy network Index data is not distributed, but remote crawl requests are distributed and accepted Search requests are spread over all peers of the cluster, and answered from all peers of the cluster List of .yacy or .yacyh - domains of the cluster: (comma-separated) >Public Peer You are visible to other peers and contact them to distribute your presence Your peer does not accept any outside index data, but responds on all remote search requests >Peer Tags When you allow access from the YaCy network, your data is recognized using keywords Please describe your search portal with some keywords (comma-separated) If you leave the field empty, no peer asks your peer. If you fill in a '*', your peer is always asked. "Save" Network Definition In case of Robinson-clustering there can be acceptance of remote crawl requests from peers of that cluster Parser Configuration Content Parser Settings With this settings you can activate or deactivate parsing of additional content-types based on their MIME-types. For a detailed description of the various MIME-types take a look at If you want to test a specific parser you can do so using the >File Viewer< >Extension< >Mime-Type< "Submit" Integration of a Search Portal If you like to integrate YaCy as portal for your web pages, you may want to change icons and messages on the search page. The search page may be customized. You can change the 'corporate identity'-images, the greeting line and a link to a home page that is reached when the 'corporate identity'-images are clicked. To change also colours and styles use the <a href="ConfigAppearance_p.html">Appearance Servlet</a> for different skins and languages. Greeting Line< URL of Home Page< URL of a Small Corporate Image< URL of a Large Corporate Image< Enable Search for Everyone? Search is available for everyone Only the administrator is allowed to search Snippet Fetch Strategy &amp; Link Verification Speed up search results with this option! (use CACHEONLY or FALSE to switch off verification) NOCACHE: no use of web cache, load all snippets online IFFRESH: use the cache if the cache exists and is fresh otherwise load online IFEXIST: use the cache if the cache exist or load online If verification fails, delete index reference CACHEONLY: never go online, use all content from cache. If no cache entry exist, consider content nevertheless as available and show result without snippet FALSE: no link verification and not snippet generation: all search results are valid without verification Greedy Learning Mode load documents linked in search results, will be deactivated automatically when index size Show Navigation Bar on Search Page? Show Navigation Top-Menu&nbsp; no link to YaCy Menu (admin must navigate to /Status.html manually) Show Advanced Search Options on Search Page? Show Advanced Search Options on index.html&nbsp; do not show Advanced Search Default Pop-Up Page< >Status Page >Search Front Page >Search Page (small header) >Interactive Search Page Default maximum number of results per page Default index.html Page (by forwarder) Target for Click on Search Results "_blank" (new window) "_self" (same window) "_parent" (the parent frame of a frameset) "_top" (top of all frames) "searchresult" (a default custom page name for search results) Special Target as Exception for an URL-Pattern Pattern:< >Exclude Hosts< List of hosts that shall be excluded from search results by default but can be included using the site:&lt;host&gt; operator: 'About' Column<br/>(shown in a column alongside<br/>with the search result page) (Headline) (Content) "Change Search Page" "Set to Default Values" You have to <a href="ConfigAccounts_p.html">set a remote user/password</a> to change this options. The search page can be integrated in your own web pages with an iframe. Simply use the following code: This would look like: For a search page with a small header, use this code: A third option is the interactive search. Use this code: You have set a remote user/password to change this options. Your Personal Profile You can create a personal profile here, which can be seen by other YaCy-members or <a href="ViewProfile.html?hash=localhash">in the public</a> using a <a href="ViewProfile.rdf?hash=localhash">FOAF RDF file</a>. >Name< Nick Name Homepage (appears on every <a href="Supporter.html">Supporter Page</a> as long as your peer is online) eMail Comment "Save" You can use < > here. Advanced Config Here are all configuration options from YaCy. You can change anything, but some options need a restart, and some options can crash YaCy, if wrong values are used. For explanation please look into defaults/yacy.init "Save" "Clear" Exclude Web-Spiders Here you can set up a robots.txt for all webcrawlers that try to access the webinterface of your peer. is a voluntary agreement most search-engines (including YaCy) follow. It disallows crawlers to access webpages or even entire domains. Deny access to Entire Peer Status page Network pages Surftips News pages Blog Public bookmarks Home Page File Share Impressum "Save restrictions" Wiki Integration of a Search Box We give information how to integrate a search box on any web page that Simply use the following code: MySearch "Search" This would look like: This does not use a style sheet file to make the integration into another web page with a different style sheet easier. You would need to change the following items: Replace the given colors #eeeeee (box background) and #cccccc (box border) Replace the word "MySearch" with your own message calls the normal YaCy search window. Search Page< >Search Result Page Layout Configuration< Below is a generic template of the search result page. Mark the check boxes for features you would like to be displayed. To change colors and styles use the >Appearance< menu for different skins. Other portal settings can be adjusted in <a href="ConfigPortal_p.html">Generic Search Portal</a> menu. >Page Template< >Text< >Images< >Audio< >Video< >Applications< >more options< >Tag< >Topics< >Cloud< >Protocol< >Filetype< >Wiki Name Space< >Language< >Author< >Vocabulary< >Provider< >Collection< >Title of Result< Description and text snippet of the search result 42 kbyte< >Metadata< >Parser< >Citation< >Pictures< >Cache< <html lang="en"> "Date" "Size" "Browse index" For this option URL proxy must be enabled. max. items "Save Settings" "Set Default Values" "Top navigation bar" >Location< show search results on map Date Navigation Maximum range (in days) Maximum days number in the histogram. Beware that a large value may trigger high CPU loads both on the server and on the browser with large result sets. keyword subject keyword2 keyword3 View via Proxy >JPG Snapshot< "Raw ranking score value" Ranking: 1.12195955E9 "Delete navigator" Add Navigators "Add navigator" >append http://url-of-the-search-result.net >System Update< Manual System Update Current installed Release Available Releases >changelog< > and < > RSS feed< (unsigned) (signed) "Download Release" "Check for new Release" Downloaded Releases No downloaded releases available for deployment. no&nbsp;automated installation on development environments "Install Release" "Delete Release" Automatic Update check for new releases, download if available and restart with downloaded release "Check + Download + Install Release Now" Download of release #[downloadedRelease]# finished. Restart Initiated. No more recent release found. Release will be installed. Please wait. You installed YaCy with a package manager. To update YaCy, use the package manager: Omitting update because this is a development environment. Omitting update because download of release #[downloadedRelease]# failed. Automated System Update manual update no automatic look-up, updates can be made manually using this interface (see options above) automatic update add the following line to updates are made within fixed cycles: Time between lookup hours Release blacklist regex on release number strings Release type only main releases any release including developer releases Signed autoupdate: only accept signed files "Submit" Accepted Changes. System Update Statistics Last System Lookup never Last Release Download Last Deploy Server Connection Tracking Incoming Connections Showing #[numActiveRunning]# active connections from a max. of #[numMax]# allowed incoming connections. Protocol</td> Duration Up-Bytes Source IP[:Port] Dest. IP[:Port] Command</td> Outgoing Connections Showing #[clientActive]# pooled outgoing connections used as: Connection Tracking Content Analysis These are document analysis attributes. Double Content Detection Double-Content detection is done using a ranking on a 'unique'-Field, named 'fuzzy_signature_unique_b'. This is the minimum length of a word which shall be considered as element of the signature. Should be either 2 or 3. The quantRate is a measurement for the number of words that take part in a signature computation. The higher the number, the less words are used for the signature. For minTokenLen = 2 the quantRate value should not be below 0.24; for minTokenLen = 3 the quantRate value must be not below 0.5. "Set" "Re-Set to default" Content Control< Peer Content Control URL Filter With this settings you can activate or deactivate content control on this peer. Use content control filtering: >Enabled< Enables or disables content control. Use this table to create filter: Define a table. Default: Content Control SMW Import Settings With this settings you can define the content control import settings. You can define a Semantic Media Wiki with the appropriate extensions. SMW import to content control list: Enable or disable constant background synchronization of content control list from SMW (Semantic Mediawiki). Requires restart! SMW import base URL: Define base URL for SMW special page "Ask". Example: SMW import target table: Define import target table. Default: contentcontrol Purge content control list on initial sync: Purge content control list on initial synchronisation after startup. "Submit" Content Integration: Retrieval from phpBB3 Databases It is possible to extract texts directly from mySQL and postgreSQL databases. This interface gives you access to the phpBB3 forums software content. If you read from an imported database, here are some hints to get around problems when importing dumps in phpMyAdmin: before importing large database dumps, set the following Line in phpmyadmin/config.inc.php and place your dump file in /tmp (Otherwise it is not possible to upload files larger than 2MB) deselect the partial import flag When an export is started, surrogate files are generated into DATA/SURROGATE/in which are automatically fetched by an indexer thread. All indexed surrogate files are then moved to DATA/SURROGATE/out and can be re-cycled when an index is deleted. The URL stub like https://searchlab.eu this must be the path right in front of '/viewtopic.php?' Type > of database< use either 'mysql' or 'pgsql' <b>Host</b> of the database <b>Port</b> of database service<br />(usually 3306 for mySQL) <b>Name of the database</b> on the host <b>Table prefix string</b> for table names <b>User</b> that can access the database <b>Password</b> for the account of that user given above <b>Posts per file</b><br />in exported surrogates Check database connection Export Content to Surrogates Import a database dump Import Dump Posts in database first entry last entry Info failed: Export successful! Wrote #[files]# files in DATA/SURROGATES/in Export failed: Import successful! Import failed: Each extraction is specific to the data that is hosted in the database. in phpmyadmin/config.inc.php and place your dump file in /tmp (Otherwise it is not possible to upload files larger than 2MB) Host of database service usually 3306 for mySQL Name of the database on the host > of the database< Table prefix string for table names User that can access the database Password for the account of that user given above Posts per file in exported surrogates Incoming Cookies Monitor Cookie Monitor: Incoming Cookies This is a list of Cookies that a web server has sent to clients of the YaCy Proxy: Showing #[num]# entries from a total of #[total]# Cookies. Sending Host Date</td> Receiving Client >Cookie< "Enable Cookie Monitoring" "Disable Cookie Monitoring" Outgoing Cookies Monitor Cookie Monitor: Outgoing Cookies This is a list of cookies that browsers using the YaCy proxy sent to webservers: Showing #[num]# entries from a total of #[total]# Cookies. Receiving Host Date</td> Sending Client >Cookie< "Enable Cookie Monitoring" "Disable Cookie Monitoring" Cookie - Test Page Here is a cookie test page. Just clean it Name: Value: Dear server, set this cookie for me! Cookies at this browser: Cookies coming to server: Cookies server sent: YaCy is a GPL'ed project with the target of implementing a P2P-based global search engine. Architecture (C) by Crawl Check This pages gives you an analysis about the possible success for a web crawl on given addresses. List of possible crawl start URLs "Check given urls" >Analysis< >Access< >Robots< >Crawl-Delay< >Sitemap< Crawl Profile Editor >Crawler Steering< >Crawl Scheduler< >Scheduled Crawls can be modified in this table< Crawl profiles hold information about a crawl process that is currently ongoing. Crawl Profile List Crawl Thread Status >Depth</strong> Must Match Must Not Match Domain Counter Content Max Page Per Domain</strong> Accept Fill Proxy Cache Local Text Indexing Local Media Indexing Remote Indexing no::yes Running "Terminate" Finished "Delete" "Delete finished crawls" Select the profile to edit "Edit profile" An error occurred during editing the crawl profile: Edit Profile "Submit changes" Crawl Results< >Crawl Results Overview< These are monitoring pages for the different indexing queues. YaCy knows 5 different ways to acquire web indexes. The details of these processes (1-5) are described within the submenu's listed above which also will show you a table with indexing results so far. The information in these tables is considered as private, so you need to log-in with your administration password. Case (6) is a monitor of the local receipt-generator, the opposed case of (1). It contains also an indexing result monitor but is not considered private since it shows crawl requests from other peers. Case (7) occurs if surrogate files are imported The image above illustrates the data flow initiated by web index acquisition. Some processes occur double to document the complex index migration structure. (1) Results of Remote Crawl Receipts This is the list of web pages that this peer initiated to crawl, but had been crawled by <em>other</em> peers. This is the 'mirror'-case of process (6). <em>Use Case:</em> You get entries here, if you start a local crawl on the '<a href="CrawlStartExpert.html">Advanced Crawler</a>' page and check the 'Do Remote Indexing'-flag, and if you checked the 'Accept Remote Crawl Requests'-flag on the '<a href="RemoteCrawl_p.html">Remote Crawling</a>' page. Every page that a remote peer indexes upon this peer's request is reported back and can be monitored here. (2) Results for Result of Search Queries This index transfer was initiated by your peer by doing a search query. The index was crawled and contributed by other peers. <em>Use Case:</em> This list fills up if you do a search query on the 'Search Page' (3) Results for Index Transfer The url fetch was initiated and executed by other peers. These links here have been transmitted to you because your peer is the most appropriate for storage according to the logic of the Global Distributed Hash Table. <em>Use Case:</em> This list may fill if you check the 'Index Receive'-flag on the 'Index Control' page (4) Results for Proxy Indexing These web pages had been indexed as result of your proxy usage. No personal or protected page is indexed such pages are detected by Cookie-Use or POST-Parameters (either in URL or as HTTP protocol) and automatically excluded from indexing. <em>Use Case:</em> You must use YaCy as proxy to fill up this table. Set the proxy settings of your browser to the same port as given (5) Results for Local Crawling These web pages had been crawled by your own crawl task. <em>Use Case:</em> start a crawl by setting a crawl start point on the 'Index Create' page. (6) Results for Global Crawling These pages had been indexed by your peer, but the crawl was initiated by a remote peer. This is the 'mirror'-case of process (1). <em>Use Case:</em> This list may fill if you check the 'Accept Remote Crawl Requests'-flag on the '<a href="RemoteCrawl_p.html">Remote Crawling</a>' page The stack is empty. Statistics about #[domains]# domains in this stack: (7) Results from surrogates import These records had been imported from surrogate files in DATA/SURROGATES/in <em>Use Case:</em> place files with dublin core metadata content into DATA/SURROGATES/in or use an index import method (i.e. <a href="IndexImportMediawiki_p.html">MediaWiki import</a>, <a href="IndexImportOAIPMH_p.html">OAI-PMH retrieval</a>) >Domain "delete all" Showing all #[all]# entries in this stack. Showing latest #[count]# lines from a stack of #[all]# entries. "clear list" >Executor >Modified >Words >Title "delete" >Collection Blacklist to use "del & blacklist" on the 'Settings'-page in the 'Proxy and Administration Port' field. <html lang="en"> Expert Crawl Start Start Crawling Job: You can define URLs as start points for Web page crawling and start crawling here. "Crawling" means that YaCy will download the given website, extract all links in it and then download the content behind these links. This is repeated as long as specified under "Crawling Depth". A crawl can also be started using wget and the >post arguments< > for this web page. Click on this API button to see a documentation of the POST request parameter for crawl starts. >Crawl Job< A Crawl Job consist of one or more start point, crawl limitations and document freshness rules. >Start Point< One Start URL or a list of URLs:<br/>(must start with http:// https:// ftp:// smb:// file://) Define the start-url(s) here. You can submit more than one URL, each line one URL please. Each of these URLs are the root for a crawl start, existing start URLs are always re-loaded. >From Link-List of URL< From Sitemap From File (enter a path<br/>within your local file system) Other already visited URLs are sorted out as "double", if they are not allowed using the re-crawl option. A web crawl performs a double-check on all links found in the internet against the internal database. If the same url is found again, then the url is treated as double when you check the 'no doubles' option. A url may be loaded again when it has reached a specific age, Use filter Restrict to start domain(s) Restrict to sub-path(s) Example: to allow only urls that contain the word 'science', set the must-match filter to '.*science.*'. You can also use an automatic domain-restriction to fully crawl a single domain. Attention: you can test the functionality of your regular expressions using the <a href="RegexTest.html">Regular Expression Tester</a> within YaCy</a>. You can limit the maximum number of pages that are fetched and indexed from a single domain with this option. You can combine this limitation with the 'Auto-Dom-Filter', so that the limit is applied to all the domains within the given depth. Domains outside the given depth are then sorted-out anyway. Document Cache< Store to Web Cache This option is used by default for proxy prefetch, but is not needed for explicit crawling. A questionmark is usually a hint for a dynamic page. URLs pointing to dynamic content should usually not be crawled. However, there are sometimes web pages with static content that is accessed with URLs containing question marks. If you are unsure, do not check this to avoid crawl loops. Accept URLs with query-part ('?'): Obey html-robots-noindex: Policy for usage of Web Cache The caching policy states when to use the cache during crawling: no&nbsp;cache if&nbsp;fresh if&nbsp;exist cache&nbsp;only never use the cache, all content from fresh internet source; use the cache if the cache exists and is fresh using the proxy-fresh rules; use the cache if the cache exist. Do no check freshness. Otherwise use online source; never go online, use all content from cache. If no cache exist, treat content as unavailable >Snapshot Creation< Max Depth for Snapshots Multiple Snapshot Versions replace old snapshots with new one add new versions for each crawl Snapshots are xml metadata and pictures of web pages that can be created during crawling time. The xml data is stored in the same way as a Solr search result with one hit and the pictures will be stored as pdf into subdirectories of HTCACHE/snapshots/. From the pdfs the jpg thumbnails are computed. Snapshot generation can be controlled using a depth parameter; that means a snapshot is only be generated if the crawl depth of a document is smaller or equal to the given number here. If the number is set to -1, no snapshots are generated. >Crawler Filter< These are limitations on the crawl stacker. The filters will be applied before a web page is loaded. Crawling Depth< This defines how often the Crawler will follow links (of links..) embedded in websites. 0 means that only the page you enter under "Starting Point" will be added to the index. 2-4 is good for normal indexing. Values over 8 are not useful, since a depth-8 crawl will index approximately 25.600.000.000 pages, maybe this is the whole WWW. also all linked non-parsable documents Unlimited crawl depth for URLs matching with Maximum Pages per Domain >Use< >Page-Count< misc. Constraints >Load Filter on URLs< >Load Filter on IPs< Must-Match List for Country Codes Crawls can be restricted to specific countries. This uses the country code that can be computed from the IP of the server that hosts the page. The filter is not a regular expressions but a list of country codes, separated by comma. no country code restriction Filter on URLs Document Filter These are limitations on index feeder. The filters will be applied after a web page was loaded. >Filter on URLs< The filter is a >regular expression< that <b>must not match</b> with the URLs to allow that the content of the url is indexed. > must-match< > must-not-match< (must not be empty) Filter on Content of Document<br/>(all visible text, including camel-case-tokenized url and title) Clean-Up before Crawl Start >No Deletion< >Re-load< For each host in the start url list, delete all documents (in the given subpath) from that host. Delete sub-path Delete only old Do not delete any document before the crawl is started. Treat documents that are loaded > ago as stale and delete them before the crawl is started. After a crawl was done in the past, document may become stale and eventually they are also deleted on the target host. To remove old files from the search index it is not sufficient to just consider them for re-load but it may be necessary to delete them because they simply do not exist any more. Use this in combination with re-crawl while this time should be longer. Double-Check Rules No&nbsp;Doubles to use that check the 're-load' option. > ago as stale and load them again. If they are younger, they are ignored. Never load any page that is already known. Only the start-url may be loaded again. Robot Behaviour Use Special User Agent and robot identification You are running YaCy in non-p2p mode and because YaCy can be used as replacement for commercial search appliances (like the GSA) the user must be able to crawl all web pages that are granted to such commercial plattforms. Not having this option would be a strong handicap for professional usage of this software. Therefore you are able to select alternative user agents here which have different crawl timings and also identify itself with another user agent and obey the corresponding robots rule. index text index media This enables indexing of the webpages the crawler will download. This should be switched on by default, unless you want to crawl only to fill the Document Cache without indexing. Do Remote Indexing Describe your intention to start this global crawl (optional) This message will appear in the 'Other Peer Crawl Start' table of other peers. If checked, the crawler will contact other peers and use them as remote indexers for your crawl. If you need your crawling results locally, you should switch this off. Only senior and principal peers can initiate or receive remote crawls. A YaCyNews message will be created to inform all peers about a global crawl so they can omit starting a crawl with the same start point. Add Crawl result to collection(s) A crawl result can be tagged with names which are candidates for a collection request. These tags can be selected with the GSA interface using the 'site' operator. To use this option, the 'collection_sxt'-field must be switched on in the Solr Schema "Start New Crawl Job" Restrict to start domain Restrict to sub-path Following frames is NOT done by Gxxg1e, but we do by default to have a richer content. 'nofollow' in robots metadata can be overridden; this does not affect obeying of the robots.txt which is never ignored. Network Scanner YaCy can scan a network segment for available http, ftp and smb server. You must first select a IP range and then, after this range is scanned, it is possible to select servers that had been found for a full-site crawl. No servers had been detected in the given IP range Please enter a different IP range for another scan. Please wait... >Scan the network< Scan Range Scan sub-range with given host Full Intranet Scan: Do not use intranet scan results, you are not in an intranet environment! All known hosts in the search index (/31 subnet recommended!) only the given host(s) addresses) Subnet< Time-Out< >Scan Cache< accumulate scan results with access type "granted" into scan cache (do not delete old scan result) >Service Type< >Scheduler< run only a scan scan and add all sites with granted access automatically. This disables the scan cache accumulation. Look every >minutes< >hours< >days< again and add new sites automatically to indexer. Sites that do not appear during a scheduled scan period will be excluded from search results. "Scan" YaCy '#[clientname]#': Crawl Start >Site Crawling< Site Crawler: Download all web pages from a given domain or base URL. >Site Crawl Start< >Site< Start URL&nbsp;(must start with Link-List of URL Sitemap URL >Path< load all files in domain load only files in a sub-path of given url >Limitation< not more than < >documents< Collection< >Start< "Start New Crawl" Hints< >Crawl Speed Limitation< No more that two pages are loaded from the same host in one second (not more that 120 document per minute) to limit the load on the target server. >Target Balancer< A second crawl for a different host increases the throughput to a maximum of 240 documents per minute since the crawler balances the load over all hosts. >High Speed Crawling< A 'shallow crawl' which is not limited to a single host (or site) can extend the pages per minute (ppm) rate to unlimited documents per minute when the number of target hosts is high. This can be done using the <a href="CrawlStartExpert.html">Expert Crawl Start</a> servlet. >Scheduler Steering< The scheduler on crawls can be changed or removed using the <a href="Table_API_p.html">API Steering</a>. Click on this API button to see an XML with information about the crawler status >Crawler< >Queues< >Queue< Crawler PPM Error with profile management. Please stop YaCy, delete the file DATA/PLASMADB/crawlProfiles0.db and restart. Error: Application not yet initialized. Sorry. Please wait some seconds and repeat ERROR: Crawl filter does not match with crawl root Please try again with different filter. :: Crawling of failed. Reason: Error with URL input Error with file input started. Please wait some seconds, it may take some seconds until the first result appears there. If you crawl any un-wanted pages, you can delete them <a href="IndexCreateQueues_p.html?stack=LOCAL">here</a>.<br /> >Size >Progress< "set" Loader >Index Size< Seg-<br/>ments >Documents< >solr search api< >Webgraph Edges< Citations<br/>(reverse link index) RWIs<br/>(P2P Chunks) Local Crawler Limit Crawler Remote Crawler No-Load Crawler Speed / PPM<br/>(Pages Per Minute) Database Entries Indicator Level Postprocessing Progress Traffic (Crawler) >Load< pending: >Running Crawls Name Status Running Terminate All Confirm Termination of All Crawls "Terminate" Crawled Pages Load< Knowledge Loader YaCy can use external libraries to enable or enhance some functions. These libraries are not included in the main release of YaCy because they would increase the application file too much. You can download additional files here. >Geolocalization< Geolocalization will enable YaCy to present locations from OpenStreetMap according to given search words. >GeoNames< With this file it is possible to find cities all over the world. Content< cities with a population &gt; 1000 all over the world cities with a population &gt; 5000 all over the world cities with a population &gt; 100000 all over the world (the set is is reduced to cities &gt; 100000) >Download from< >Storage location< >Status< >not loaded< >loaded< :deactivated >Action< >Result< "Load" "Deactivate" "Remove" "Activate" >loaded and activated dictionary file< >loading of dictionary file failed: #[error]#< >deactivated and removed dictionary file< >cannot remove dictionary file: #[error]#< >deactivated dictionary file< >cannot deactivate dictionary file: #[error]#< >activated dictionary file< >cannot activate dictionary file: #[error]#< >With this file it is possible to find locations in Germany using the location (city) name, a zip code, a car sign or a telephone pre-dial number.< Suggestions< Suggestion dictionaries will help YaCy to provide better suggestions during the input of search words This file provides 100000 most common german words for suggestions >Tutorial You are using the administration interface of your own search engine You can create your own search index with YaCy To learn how to do that, watch one of the demonstration videos below twitter this video Download from Vimeo More Tutorials Please see the tutorials on YaCy: Tutorial Index Browser Browse the index of #[ucount]# documents. Enter a host or an URL for a file list or view a list of >all hosts< >only hosts with urls pending in the crawler< > or < >only with load errors< Host/URL Browse Host "Delete Subpath" Browser for "Re-load load-failure docs (404s etc)" Confirm Deletion >Host List< Count Colors: Documents without Errors Pending in Crawler Crawler Excludes< Load Errors< documents stored for host: #[hostsize]# documents stored for subpath: #[subpathloadsize]# unloaded documents detected in subpath: #[subpathdetectedsize]# >Path< >stored< >linked< >pending< >excluded< >failed< Show Metadata link, detected from context load &amp; index >indexed< >loading< Outbound Links, outgoing from #[host]# - Host List Inbound Links, incoming to #[host]# - Host List <html lang="en"> 'number of documents about this date' "show link structure graph" Host has load error(s) Administration Options Delete all >Load Errors< from index "Delete Load Errors" Index Cleaner >URL-DB-Cleaner Total URLs searched: Blacklisted URLs found: Percentage blacklisted: last searched URL: last blacklisted URL found: >RWI-DB-Cleaner RWIs at Start: RWIs now: wordHash in Progress: last wordHash with deleted URLs: Number of deleted URLs in on this Hash: URL-DB-Cleaner - Clean up the database by deletion of blacklisted urls: Start/Resume Stop Pause RWI-DB-Cleaner - Clean up the database by deletion of words with reference to blacklisted urls: Reverse Word Index Administration The local index currently contains #[wcount]# reverse word indexes RWI Retrieval (= search for a single word) Retrieve by Word:< "Show URL Entries for Word" Retrieve by Word-Hash "Show URL Entries for Word-Hash" "Generate List" Limitations Index Reference Size No reference size limitation (this may cause strong CPU load when words are searched that appear very often) Limitation of number of references per word: (this causes that old references are deleted if that limit is reached) >Set References Limit< No entry for word '#[word]#' No entry for word hash Search result total URLs</td> appearance in</td> in link type</td> document type</td> <td>description</td> <td>title</td> <td>creator</td> <td>subject</td> <td>url</td> <td>emphasized</td> <td>image</td> <td>audio</td> <td>video</td> <td>app</td> index of</td> >Selection</td> Display URL List Number of lines all lines "List Selected URLs" Transfer RWI to other Peer Transfer by Word-Hash "Transfer to other peer" to Peer <dd>select or enter a hash or peer name: Sequential List of Word-Hashes No URL entries related to this word hash >#[count]# URL entries related to this word hash Resource</td> Negative Ranking Factors Positive Ranking Factors Reverse Normalized Weighted Ranking Sum hash</td> dom length</td> url length</td> pos in text</td> pos of phrase</td> pos in phrase</td> <td>authority</td> <td>date</td> words in title</td> words in text</td> local links</td> remote links</td> hitcount</td> unresolved URL Hash Word Deletion Deletion of selected URLs delete also the referenced URL (recommended, may produce unresolved references at other word indexes but they do not harm) for every resolvable and deleted URL reference, delete the same reference at every other word where the reference exists (very extensive, but prevents further unresolved references) "Delete reference to selected URLs" "Delete Word" Blacklist Extension "Add selected URLs to blacklist" "Add selected domains to blacklist" These document details can be retrieved as <a href="http://www.w3.org/TR/xhtml-rdfa-primer/" target="_blank">XHTML+RDFa</a> document containg <a href="http://www.w3.org/RDF/" target="_blank">RDF</a> annotations in <a href="http://dublincore.org/" target="_blank">Dublin Core</a> vocabulary. The XHTML+RDFa data format is both a XML content format and a HTML display format and is considered as an important <a href="http://www.w3.org/2001/sw/" target="_blank">Semantic Web</a> content format. The same content can also be retrieved as pure <a href="api/yacydoc.xml?urlhash=#[urlhash]#">XML metadata</a> with DC tag name vocabulary. Click the API icon to see an example call to the search rss API. To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>. URL Database Administration The local index currently contains #[ucount]# URL references URL Retrieval Retrieve by URL:< "Show Details for URL" Retrieve by URL-Hash "Show Details for URL-Hash" Cleanup Index Deletion Delete local search index (embedded Solr and old Metadata) Delete remote solr index Delete RWI Index (DHT transmission words) Delete Citation Index (linking between URLs) Delete First-Seen Date Table Delete HTTP &amp; FTP Cache Stop Crawler and delete Crawl Queues Delete robots.txt Cache "Delete" Confirm Deletion Statistics about top-domains in URL Database Show top domains from all URLs. "Generate Statistics" Statistics about the top-#[domains]# domains in the database: "delete all" >Domain< >Optimize Solr< merge to max. < > segments "Optimize Solr" Reboot Solr Core "Shut Down and Re-Start Solr" query No entry found for URL-hash "Show Content" "Delete URL" this may produce unresolved references at other word indexes but they do not harm "Delete URL and remove all references from words" Optimize Solr delete the reference to this url at every other word where the reference exists (very extensive, but prevents unresolved references) Loader Queue The loader set is empty There are #[num]# entries in the loader set: Initiator Depth Status Parser Errors Rejected URLs There are #[num]# entries in the rejected-urls list. Showing latest #[num]# entries. "show more" "clear list" Time Fail-Reason Rejected URL List: There are #[num]# entries in the rejected-queue: This crawler queue is empty Click on this API button to see an XML with information about the crawler latency and other statistics. Delete Entries: Initiator Profile Depth Modified Date Anchor Name Count Delta/ms Host "Delete" Crawl Queue< >Count< >Initiator< >Profile< >Depth< Index Deletion< The search index contains #[doccount]# documents. You can delete them here. Deletions are made concurrently which can cause that recently deleted documents are not yet reflected in the document count. Delete by URL Matching< Delete all documents within a sub-path of the given urls. That means all documents must start with one of the url stubs as given here. One URL stub, a list of URL stubs<br/>or a regular expression Matching Method< sub-path of given URLs matching with regular expression "Simulate Deletion" "no actual deletion, generates only a deletion count" "Engage Deletion" "simulate a deletion first to calculate the deletion count" "engaged" selected #[count]# documents for deletion deleted #[count]# documents Delete by Age< Delete all documents which are older than a given time period. Time Period< All documents older than years< months< days< hours< Age Identification< >load date >last-modified Delete Collections< Delete all documents which are inside specific collections. Not Assigned< Delete all documents which are not assigned to any collection , separated by ',' (comma) or '|' (vertical bar); or >generate the collection list... Assigned< Delete all documents which are assigned to the following collection(s) Delete by Solr Query< This is the most generic option: select a set of documents using a solr query. The local index currently contains #[ucount]# documents. Loaded URL Export Export Path URL Filter >query< maximum age (seconds, -1 = unlimited) Export Format Full Data Records: (Rich and full-text Solr data, one document per line in one large xml file, can be processed with shell tools, can be imported with DATA/SURROGATE/in/) (Rich and full-text Elasticsearch data, one document per line in one flat JSON file, can be bulk-imported to elasticsearch with the command "curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary @yacy_dump_XXX.flatjson") Full URL List: Plain Text List (URLs only) HTML (URLs with title) Only Domain: Plain Text List (domains only) HTML (domains as URLs, no title) >Only Text: Fulltext of Search Index Text Export to file #[exportfile]# is running .. #[urlcount]# Documents so far Finished export of #[urlcount]# Documents to file Import this file by moving it to DATA/SURROGATES/in Export to file #[exportfile]# failed: Dump and Restore of Solr Index "Create Dump" Dump File "Restore Dump" Stored a solr dump to file Index Sources &amp; Targets YaCy supports multiple index storage locations. As an internal indexing database a deep-embedded multi-core Solr is used and it is possible to attach also a remote Solr. Solr Search Index Solr stores the main search index. It is the home of two cores, the default 'collection1' core for documents and the 'webgraph' core for a web structure graph. Detailed information about the used Solr fields can be edited in the <a href="IndexSchema_p.html">Schema Editor</a>. Lazy Value Initialization&nbsp; If checked, only non-zero values and non-empty strings are written to Solr fields. Use deep-embedded local Solr&nbsp; This will write the YaCy-embedded Solr index which stored within the YaCy DATA directory. The Solr native search interface is accessible at<br/> <a href="solr/select?q=*:*&start=0&rows=3&core=collection1">/solr/select?q=*:*&amp;start=0&amp;rows=3&amp;core=collection1</a> for the default search index (core: collection1) and at<br/> <a href="solr/select?q=*:*&start=0&rows=3&core=webgraph">/solr/select?q=*:*&amp;start=0&amp;rows=3&amp;core=webgraph</a> for the webgraph core.<br/> If you switch off this index, a remote Solr must be activated. Use remote Solr server(s) Solr Hosts Solr Host Administration Interface Index Size It's easy to <a href="http://www.yacy-websearch.net/wiki/index.php/Dev:Solr" target="_blank">attach an external Solr to YaCy</a>. This external Solr can be used instead the internal Solr. It can also be used additionally to the internal Solr, then both Solr indexes are mirrored. Solr URL(s) You can set one or more Solr targets here which are accessed as a shard. For several targets, list them using a ',' (comma) as separator. The set of remote targets are used as shard of a complete index. The host part of the url is used as key for a hash function which selects one of the shards (one of your remote servers). When a search request is made, all servers are accessed synchronously and the result is combined. Sharding Method<br/> write-enabled (if unchecked, the remote server(s) will only be used as search peers) Web Structure Index The web structure index is used for host browsing (to discover the internal file/folder structure), ranking (counting the number of references) and file search (there are about fourty times more links from loaded pages as in documents of the main search index). use citation reference index (lightweight and fast) use webgraph search index (rich information in second Solr core) "Set" Peer-to-Peer Operation The 'RWI' (Reverse Word Index) is necessary for index transmission in distributed mode. For portal or intranet mode this must be switched off. support peer-to-peer index transmission (DHT RWI index) MediaWiki Dump Import No import thread is running, you can start a new thread here Bad input data: MediaWiki Dump File Selection: select an XML file (which may be bz2- or gz-encoded) You can import <a href="https://dumps.wikimedia.org/backup-index-bydb.html" target="_blank">MediaWiki dumps</a> here. An example is the file "Import MediaWiki Dump" When the import is started, the following happens: The dump is extracted on the fly and wiki entries are translated into Dublin Core data format. The output looks like this: Each 10000 wiki records are combined in one output file which is written to /DATA/SURROGATES/in into a temporary file. When each of the generated output file is finished, it is renamed to a .xml file Each time a xml surrogate file appears in /DATA/SURROGATES/in, the YaCy indexer fetches the file and indexes the record entries. When a surrogate file is finished with indexing, it is moved to /DATA/SURROGATES/out You can recycle processed surrogate files by moving them from /DATA/SURROGATES/out to /DATA/SURROGATES/in Import Process Thread: Dump: Processed: Wiki Entries Speed: articles per second< Running Time: hours, minutes< Remaining Time: List of #[num]# OAI-PMH Servers "Load Selected Sources" OAI-PMH source import list >Source< Import List >Thread< >Processed<br />Chunks< >Imported<br />Records< >Speed<br />(records/second) Complete at OAI-PMH Import Results from the import can be monitored in the <a href="CrawlResults.html?process=7">indexing results for surrogates Single request import This will submit only a single request as given here to a OAI-PMH server and imports records into the index "Import OAI-PMH source" Source: Processed: records< ResumptionToken: Import failed: Import all Records from a server Import all records that follow according to resumption elements into index "import this source" ::or&nbsp; "import from a list" Import started! Bad input data: Warc Import Web Archive File Import No import thread is running, you can start a new thread here Warc File Selection: select an warc file (which may be gz compressed) You can download warc archives for example here Internet Archive Import Warc File Import Process Thread: Warc File: Processed: Entries Speed: pages per second Running Time: hours, minutes< Remaining Time: Field Re-Indexing< In case that an index schema of the embedded/local index has changed, all documents with missing field entries can be indexed again with a reindex job. "refresh page" Documents in current queue< Documents processed< current select query "start reindex job now" "stop reindexing" Remaining field list reindex documents containing these fields: Re-Crawl Index Documents Searches the local index and selects documents to add to the crawler (recrawl the document). This runs transparent as background job. Documents are added to the crawler only if no other crawls are active and are added in small chunks. "start recrawl job now" "stop recrawl job" Re-Crawl Query Details Documents to process Current Query Edit Solr Query update to re-crawl documents selected with the given query. Include failed URLs >Field< >count< Re-crawl works only with an embedded local Solr index! Simulate Check only how many documents would be selected for recrawl "Browse metadata of the #[rows]# first selected documents" document(s)</a>#(/showSelectLink)# selected for recrawl. >Solr query < Set defaults "Reset to default values" Last #(/jobStatus)#Re-Crawl job report Automatically refreshing An error occurred while trying to refresh automatically The job terminated early due to an error when requesting the Solr index. >Status< "Running" "Shutdown in progress" "Terminated" Running::Shutdown in progress::Terminated >Query< >Start time< >End time< URLs added to the crawler queue for recrawl >Recrawled URLs< URLs rejected for some reason by the crawl stacker or the crawler queue. Please check the logs for more details. >Rejected URLs< >Malformed URLs< "#[malformedUrlsDeletedCount]# deleted from the index" > Refresh< Solr Schema Editor If you use a custom Solr schema you may enter a different field name in the column 'Custom Solr Field Name' of the YaCy default attribute name Select a core: the core can be searched at Active Attribute Custom Solr Field Name Comment show active show all available show disabled "Set" "reset selection to default" >Reindex documents< If you unselected some fields, old documents in the index still contain the unselected fields. To physically remove them from the index you need to reindex the documents. Here you can reindex all documents with inactive fields. "reindex Solr" You may monitor progress (or stop the job) under <a href="IndexReIndexMonitor_p.html">IndexReIndexMonitor_p.html</a> YaCy '#[clientname]#': Configuration of a Wiki Search Integration in MediaWiki It is possible to insert wiki pages into the YaCy index using a web crawl on that pages. This guide helps you to crawl your wiki and to insert a search window in your wiki pages. Retrieval of Wiki Pages The following form is a simplified crawl start that uses the proper values for a wiki crawl. Just insert the front page URL of your wiki. After you started the crawl you may want to get back to this page to read the integration hints below. URL of the wiki main page This is a crawl start point "Get content of Wiki: crawl wiki pages" Inserting a Search Window to MediaWiki To integrate a search window into a MediaWiki, you must insert some code into the wiki template. There are several templates that can be used for MediaWiki, but in this guide we consider that you are using the default template, 'MonoBook.php': open skins/MonoBook.php find the line where the default search window is displayed, there are the following statements: Remove that code or set it in comments using '&lt;!--' and '--&gt;' Insert the following code: Search with YaCy in this Wiki: value="Search" Check all appearances of static IPs given in the code snippet and replace it with your own IP, or your host name You may want to change the default text elements in the code snippet To see all options for the search widget, look at the more generic description of search widgets at the <a href="ConfigLiveSearch.html">configuration for live search</a>. Configuration of a phpBB3 Search Integration in phpBB3 It is possible to insert forum pages into the YaCy index using a database import of forum postings. This guide helps you to insert a search window in your phpBB3 pages. Retrieval of phpBB3 Forum Pages using a database export Forum posting contain rich information about the topic, the time, the subject and the author. This information is in an bad annotated form in web pages delivered by the forum software. It is much better to retrieve the forum postings directly from the database. This will cause that YaCy is able to offer nice navigation features after searches. YaCy has a phpBB3 extraction feature, please go to the <a href="ContentIntegrationPHPBB3_p.html">phpBB3 content integration</a> servlet for direct database imports. Retrieval of phpBB3 Forum Pages using a web crawl The following form is a simplified crawl start that uses the proper values for a phpbb3 forum crawl. Just insert the front page URL of your forum. After you started the crawl you may want to get back to this page to read the integration hints below. URL of the phpBB3 forum main page This is a crawl start point "Get content of phpBB3: crawl forum pages" Inserting a Search Window to phpBB3 To integrate a search window into phpBB3, you must insert some code into a forum template. There are several templates that can be used for phpBB3, but in this guide we consider that you are using the default template, 'prosilver' open styles/prosilver/template/overall_header.html find the line where the default search window is displayed, thats right behind the <pre>&lt;div id="search-box"&gt;</pre> statement Insert the following code right behind the div tag YaCy Forum Search ;YaCy Search Check all appearances of static IPs given in the code snippet and replace it with your own IP, or your host name You may want to change the default text elements in the code snippet To see all options for the search widget, look at the more generic description of search widgets at the <a href="ConfigLiveSearch.html">configuration for live search</a>. Configuration of a RSS Search Loading of RSS Feeds< RSS feeds can be loaded into the YaCy search index. This does not load the rss file as such into the index but all the messages inside the RSS feeds as individual documents. URL of the RSS feed >Preview< "Show RSS Items" Indexing Available after successful loading of rss feed in preview "Add All Items to Index (full content of url)" >once< >load this feed once now< >scheduled< >repeat the feed loading every< >minutes< >hours< >days< > automatically. >List of Scheduled RSS Feed Load Targets< >Title< >URL/Referrer< >Recording< >Last Load< >Next Load< >Last Count< >All Count< >Avg. Update/Day< "Remove Selected Feeds from Scheduler" "Remove All Feeds from Scheduler" >Available RSS Feed List< "Remove Selected Feeds from Feed List" "Remove All Feeds from Feed List" "Add Selected Feeds to Scheduler" >new< >enqueued< >indexed< >RSS Feed of >Author< >Description< >Language< >Date< >Time-to-live< >Docs< >State< "Add Selected Items to Index (full content of url)" Send message You cannot send a message to The peer does not respond. It was now removed from the peer-list. The peer <b> is alive and responded: You are allowed to send me a message kb and an attachment &le; Your Message Subject: Text: "Enter" "Preview" You can use Wiki Code</a> here. Preview message The message has not been sent yet! The peer is alive but cannot respond. Sorry. Your message has been sent. The target peer responded: The target peer is alive but did not receive your message. Sorry. Here is a copy of your message, so you can copy it to save it for further attempts: >Messages Date</td> From</td> To</td> >Subject Action From: To: Date: >view reply >delete Compose Message Send message to peer "Compose" Message: inbox YaCy Search Network YaCy Network< The information that is presented on this page can also be retrieved as XML. Click the API icon to see the XML. To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>. Network Overview Active&nbsp;Principal&nbsp;and&nbsp;Senior&nbsp;Peers Passive&nbsp;Senior&nbsp;Peers Junior&nbsp;(fragment)&nbsp;Peers Network History <b>Count of Connected Senior Peers</b> in the last two days, scale = 1h <b>Count of all Active Peers Per Day</b> in the last week, scale = 1d <b>Count of all Active Peers Per Week</b> in the last 30d, scale = 7d <b>Count of all Active Peers Per Month</b> in the last 365d, scale = 30d Active Principal and Senior Peers in '#[networkName]#' Network Passive Senior Peers in '#[networkName]#' Network Junior Peers (a fragment) in '#[networkName]#' Network Manually contacting Peer no remote #[peertype]# peer for this list known Showing #[num]# entries from a total of #[total]# peers. send&nbsp;<strong>M</strong>essage/<br/>show&nbsp;<strong>P</strong>rofile/<br/>edit&nbsp;<strong>W</strong>iki/<br/>browse&nbsp;<strong>B</strong>log Search for a peername (RegExp allowed) "Search" Name Address Hash Type Release< Last<br/>Seen Location Offset Send message to peer View profile of peer Read and edit wiki on peer Browse blog of peer "DHT Receive: yes" "DHT receive enabled" "DHT Receive: no; #[peertags]#" "DHT Receive: no" "no DHT receive" "Accept Crawl: no" "no crawl" "Accept Crawl: yes" "crawl possible" Contact: passive Contact: direct Seed download: possible runtime: >Network< >Online Peers< >Number of<br/>Documents< Indexing Speed: Pages Per Minute (PPM) Query Frequency: Queries Per Hour (QPH) >Today< >Last&nbsp;Week< >Last&nbsp;Month< Last Hour >Now< >Active Senior< >Passive Senior< >Junior (fragment)< >This Peer< URLs for<br/>Remote Crawl "The YaCy Network" Indexing<br/>PPM (public&nbsp;local) (remote) Your Peer: >Name< >Info< >Version< >UTC< >Uptime< >Links< Sent<br/>URLs Sent<br/>DHT Word Chunks Received<br/>URLs Received<br/>DHT Word Chunks Known<br/>Seeds Connects<br/>per hour >dark green font< senior/principal peers >light green font< >passive peers< >pink font< junior peers red point this peer >grey waves< >crawling activity< >green radiation< >strong query activity< >red lines< >DHT-out< >green lines< >DHT-in< Count of Connected Senior Peers in the last two days, scale = 1h Count of all Active Peers Per Day in the last week, scale = 1d Count of all Active Peers Per Week in the last 30d, scale = 7d Count of all Active Peers Per Month in the last 365d, scale = 30d Overview Incoming&nbsp;News Processed&nbsp;News Outgoing&nbsp;News Published&nbsp;News This is the YaCyNews system (currently under testing). The news service is controlled by several entry points: A crawl start with activated remote indexing will automatically create a news entry. Other peers may use this information to prevent double-crawls from the same start point. A table with recently started crawls is presented on the Index Create - page A change in the personal profile will create a news entry. You can see recently made changes of profile entries on the Network page, where that profile change is visualized with a '*' beside the 'P' (profile) - selector. Publishing of added or modified translation for the user interface. Other peers may include it in their local translation list. To publish a translation, use the integrated translation editor to add a translation and publish it afterwards. Above you can see four menues: <strong>Incoming News (#[insize]#)</strong>: latest news that arrived your peer. Only these news will be used to display specific news services as explained above. You can process these news with a button on the page to remove their appearance from the IndexCreate and Network page <strong>Processed News (#[prsize]#)</strong>: this is simply an archive of incoming news that you removed by processing. <strong>Outgoing News (#[ousize]#)</strong>: here your can see news entries that you have created. These news are currently broadcasted to other peers. you can stop the broadcast if you want. <strong>Published News (#[pusize]#)</strong>: your news that have been broadcasted sufficiently or that you have removed from the broadcast list. Originator Created Category Received Distributed Attributes Process Selected News Delete Selected News Abort Publication of Selected News Process All News Delete All News Abort Publication of All News "#(page)#::Process Selected News::Delete Selected News::Abort Publication of Selected News::Delete Selected News#(/page)#" "#(page)#::Process All News::Delete All News::Abort Publication of All News::Delete All News#(/page)#" More news services will follow. Performance of Concurrent Processes serverProcessor Objects Queue Size<br />Current Queue Size<br />Maximum Executors:<br />Current Number of Threads Concurrency:<br />Maximum Number of Threads Childs Average<br />Block Time<br />Reading Average<br />Exec Time Average<br />Block Time<br />Writing Total<br />Cycles Full Description Performance Settings for Memory refresh graph simulate short memory status use Standard Memory Strategy</label> (current: #[memoryStrategy]#) Memory Usage After Startup After Initializations before GC after GC >Now before < Description maximum memory that the JVM will attempt to use >Available< total available memory including free for the JVM within maximum >Max< >Total< total memory taken from the OS >Free< free memory in the JVM within total amount >Used< used memory in the JVM within total amount Solr Resources >Class< >Type< >Statistics< >Size< Table RAM Index >Key >Value Table</td> Chunk Size< Used Memory< Object Index Caches Needed Memory Object Read Caches >Read Hit Cache< >Read Miss Cache< >Read Hit< >Read Miss< Write Unique< Write Double< Deletes< Flushes< Total Mem MB (hit) MB (miss) Stop Grow when less than #[objectCacheStopGrow]# MB available left Start Shrink when less than #[objectCacheStartShrink]# MB availabe left Other Caching Structures >Hit< >Miss< Insert< Delete< Search Event Cache< Performance Settings of Queues and Processes Scheduled tasks overview and waiting time settings: >Thread< Queue Size >Total Cycles Block Time Sleep Time Exec Time <td>Idle >Busy Short Mem<br />Cycles >per Cycle >per Busy-Cycle >Memory Use >Delay between >idle loops >busy loops Minimum of<br />Required Memory Maximum of<br />System-Load Full Description Submit New Delay Values Re-set to default Changes take effect immediately Cache Settings: RAM Cache <td>Description Words in RAM cache: (Size in KBytes) This is the current size of the word caches. The indexing cache speeds up the indexing process, the DHT cache holds indexes temporary for approval. The maximum of this caches can be set below. Maximum URLs currently assigned<br />to one cached word: This is the maximum size of URLs assigned to a single word cache entry. If this is a big number, it shows that the caching works efficiently. Maximum age of a word: This is the maximum age of a word in an index in minutes. Minimum age of a word: This is the minimum age of a word in an index in minutes. Maximum number of words in cache: This is is the number of word indexes that shall be held in the ram cache during indexing. When YaCy is shut down, this cache must be flushed to disc; this may last some minutes. Enter New Cache Size Thread Pool Settings: Thread Pool maximum Active current Active Enter new Threadpool Configuration milliseconds< kbytes< load< Performance Settings of Search Sequence Search Sequence Timing Timing results of latest search request: Query Event< Comment< Time< Duration (ms) Result-Count The network picture below shows how the latest search query was solved by asking corresponding peers in the DHT: red -&gt; request list alive green -&gt; request has terminated grey -&gt; the search target hash order position(s) (more targets if a dht partition is used)< "Search event picture" <html lang="en"> Performance Settings Memory Settings Memory reserved for <abbr title="Java Virtual Machine">JVM</abbr> MByte "Set" Resource Observer Memory state >proper< >exhausted< Reset state Manually reset to 'proper' state Enough memory is available for proper operation. Within the last eleven minutes, at least four operations have tried to request memory that would have reduced free space within the minimum required. Minimum required Amount of memory (in Mebibytes) that should at least be free for proper operation Disable <abbr title="Distributed Hash Table">DHT</abbr>-in below. Free space disk Steady-state minimum Amount of space (in Mebibytes) that should be kept free as steady state <abbr title="Mebibyte">MiB</abbr> Disable crawls when free space is below. Absolute minimum Amount of space (in Mebibytes) that should at least be kept free as hard limit Disable <abbr title="Distributed Hash Table">DHT</abbr>-in when free space is below. >Autoregulate< when absolute minimum limit has been reached. The autoregulation task performs the following sequence of operations, stopping once free space disk is over the steady-state value delete old releases delete logs delete robots.txt table delete news clear HTCACHE clear citations throw away large crawl queues cut away too large RWIs Used space disk Steady-state maximum Maximum amount of space (in Mebibytes) that should be used as steady state Disable crawls when used space is over. Absolute maximum Maximum amount of space (in Mebibytes) that should be used as hard limit Disable <abbr title="Distributed Hash Table">DHT</abbr>-in when used space is over. when absolute maximum limit has been reached. The autoregulation task performs the following sequence of operations, stopping once used space disk is below the steady-state value > free space disable <abbr title="Distributed Hash Table">DHT</abbr>-in below <abbr title="Random Access Memory">RAM</abbr> Accepted change. This will take effect after <strong>restart</strong> of YaCy restart now</a> Confirm Restart refresh graph Save Changes take effect immediately Online Caution Settings: This is the time that the crawler idles when the proxy is accessed, or a local or remote search is done. The delay is extended by this time each time the proxy is accessed afterwards. This shall improve performance of the affected process (proxy or search). (current delta is seconds since last proxy/local-search/remote-search access.) Online Caution Case indexer delay (milliseconds) after case occurency Local Search: Remote Search: "Enter New Parameters" Online Caution Settings Indexing with Proxy YaCy can be used to 'scrape' content from pages that pass the integrated caching HTTP proxy. When scraping proxy pages then <strong>no personal or protected page is indexed</strong>; those pages are detected by properties in the HTTP header (like Cookie-Use, or HTTP Authorization) or by POST-Parameters (either in URL or as HTTP protocol) and automatically excluded from indexing. You have to >setup the proxy< before use. Proxy Auto Config: this controls the proxy auto configuration script for browsers at http://localhost:8090/autoconfig.pac .yacy-domains only whether the proxy should only be used for .yacy-Domains Proxy pre-fetch setting: this is an automated html page loading procedure that takes actual proxy-requested Prefetch Depth A prefetch of 0 means no prefetch; a prefetch of 1 means to prefetch all embedded URLs, but since embedded image links are loaded by the browser this means that only embedded href-anchors are prefetched additionally. Store to Cache It is almost always recommended to set this on. The only exception is that you have another caching proxy running as secondary proxy and YaCy is configured to used that proxy in proxy-proxy - mode. Do Local Text-Indexing If this is on, all pages (except private content) that passes the proxy is indexed. Do Local Media-Indexing This is the same as for Local Text-Indexing, but switches only the indexing of media content on. Do Remote Indexing If checked, the crawler will contact other peers and use them as remote indexers for your crawl. If you need your crawling results locally, you should switch this off. Only senior and principal peers can initiate or receive remote crawls. Please note that this setting only take effect for a prefetch depth greater than 0. Proxy generally Path The path where the pages are stored (max. length 300) Size</label> The size in MB of the cache. "Set proxy profile" The file DATA/PLASMADB/crawlProfiles0.db is missing or corrupted. Please delete that file and restart. Pre-fetch is now set to depth Caching is now #(caching)#off::on#(/caching)#. Local Text Indexing is now #(indexingLocalText)#off::on Local Media Indexing is now #(indexingLocalMedia)#off::on Remote Indexing is now #(indexingRemote)#off::on Cachepath is now set to '#[return]#'.</strong> Please move the old data in the new directory. Cachesize is now set to #[return]#MB. Changes will take effect after restart only. An error has occurred: You can see a snapshot of recently indexed pages on the URLs as crawling start points for crawling. Page. Quickly adding Bookmarks: Crawl with YaCy Title: Link: Status: URL successfully added to Crawler Queue Malformed URL Unable to create new crawling profile for URL: Unable to add URL to crawler queue: Quick Crawl Link Simply drag and drop the link shown below to your Browsers Toolbar/Link-Bar. If you click on it while browsing, the currently viewed website will be inserted into the YaCy crawling queue for indexing. RWI Ranking Configuration< The document ranking influences the order of the search result entities. A ranking is computed using a number of attributes from the documents that match with the search word. The attributes are first normalized over all search results and then the normalized attribute is multiplied with the ranking coefficient computed from this list. The ranking coefficient grows exponentially with the ranking levels given in the following table. If you increase a single value by one, then the strength of the parameter doubles. There are two ranking stages: first all results are ranked using the pre-ranking and from the resulting list the documents are ranked again with a post-ranking. The two stages are separated because they need statistical information from the result of the pre-ranking. Pre-Ranking >Post-Ranking< "Set as Default Ranking" "Re-Set to Built-In Ranking" Solr Ranking Configuration< These are ranking attributes for Solr. This ranking applies for internal and remote (P2P or shard) Solr access. Select a profile: >Boost Function< To see all available fields, see the >YaCy Solr Schema< and look for numeric values (these are names with suffix '_i'). To find out which kind of operations are possible, see the >Solr Function Query< documentation. Example: to order by date, use "Set Boost Function" "Re-Set to default" You can boost with vocabularies, use the occurrence counters >Filter Query< The Filter Query is attached to every query. Use this to statically add a selection criteria to reduce the set of results. Example: "http_unique_b:true AND www_unique_b:true" will filter out all results where urls appear also with/without http(s) and/or with/without 'www.' prefix. To find appropriate fields for this query, see the YaCy Solr Schema Warning: bad expressions here will cause that you don't have any search result! "Set Filter Query" >Boost Query< Example: "fuzzy To find appropriate fields for this query, see the and look for boolean values (with suffix '_b') or tags inside string fields (with suffix '_s' or '_sxt'). "Set Boost Query" field not in local index (boost has no effect) You can boost with vocabularies, use the field with values You can also boost on logarithmic occurrence counters of the fields "Set Field Boosts" A Boost Function can combine numeric values from the result document to produce a number which is multiplied with the score value from the query result. The Boost Query is attached to every query. Use this to statically boost specific content in the index. means that documents, identified as 'double' are ranked very bad and appended to the end of all results (because the unique are ranked high). This is the set of searchable fields (see Entries without a boost value are not searched. Boost values make hits inside the corresponding field more important. Regex Test Test String Regular Expression This is a Java Pattern Result< no match< > match< error in expression: Remote Crawl Configuration >Remote Crawler< The remote crawler is a process that requests urls from other peers. Peers offer remote-crawl urls if the flag 'Do Remote Indexing' is switched on when a crawl is started. Remote Crawler Configuration Your peer cannot accept remote crawls because you need senior or principal peer status for that! >Accept Remote Crawl Requests< Perform web indexing upon request of another peer. Load with a maximum of pages per minute "Save" Crawl results will appear in the >Crawl Result Monitor< Peers offering remote crawl URLs If the remote crawl option is switched on, then this peer will load URLs from the following remote peers: >Name< URLs for<br/>Remote<br/>Crawl >Release< >PPM< >QPH< >Last<br/>Seen< >UTC</strong><br/>Offset< >Uptime< >Links< >Age< >Protocol< >IP< >URL< >Access< >Process< >empty< >granted< >denied< >not in index< >indexed< "Add Selected Servers to Crawler" The following servers can be searched: Available server within the given IP range >inaccessible< YaCy '#[clientname]#': Settings Acknowledge Settings Receipt: No information has been submitted Error with submitted information. Nothing changed.</p> The user name must be given. Your request cannot be processed. The password redundancy check failed. You have probably misstyped your password. Shutting down.</strong><br />Application will terminate after working off all crawling tasks. Your administration account setting has been made. Your new administration account name is #[user]#. The password has been accepted.<br />If you go back to the Settings page, you must log-in again. Your proxy access setting has been changed. Your proxy account check has been disabled. The new proxy IP filter is set to The proxy port is: Port rebinding will be done in a few seconds. You can reach your YaCy server under the new location Your server access filter is now set to Auto pop-up of the Status page is now <strong>disabled</strong> Auto pop-up of the Status page is now <strong>enabled</strong> You are now permanently <strong>online</strong>. After a short while you should see the effect on the status</a> page. The Peer Name is: Your static Ip(or DynDns) is: Seed Settings changed.#(success)#::You are now a principal peer. Seed Settings changed, but something is wrong. Seed Uploading was deactivated automatically. Please return to the settings page and modify the data. The remote-proxy setting has been changed If you open any public web page through the proxy, you must log-in. The new setting is effective immediately, you don't need to re-start. The submitted peer name is already used by another peer. Please choose a different name.</strong> The Peer name has not been changed. Your Peer Language is: Seed Upload method was changed successfully. You are now a principal peer. Seed Upload Method: Seed File URL: Your proxy networking settings have been changed. Transparent Proxy Support is: Your message forwarding settings have been changed. Message Forwarding Support is: Message Forwarding Command: Recipient Address: You are now <strong>event-based online</strong>. You are now in <strong>Cache Mode</strong>. Only Proxy-cache ist available in this mode. You can now go back to the Settings</a> page if you want to make more changes. Send via header is: Send X-Forwarded-For header is: Your crawler settings have been changed. Generic Settings: Crawler timeout: http Crawler Settings: Maximum HTTP Filesize: ftp Crawler Settings: Maximum SMB Filesize: Maximum file Filesize: Maximum FTP Filesize: smb Crawler Settings: Your need to restart YaCy to activate the changes. URL Proxy settings have been saved. >Crawler Settings< Generic Crawler Settings Connection timeout in ms means unlimited HTTP Crawler Settings: Maximum Filesize FTP Crawler Settings SMB Crawler Settings Local File Crawler Settings Maximum allowed file size in bytes that should be downloaded Larger files will be skipped Please note that if the crawler uses content compression, this limit is used to check the compressed content size Submit Changes will take effect immediately Timeout: Message Forwarding With this settings you can activate or deactivate forwarding of yacy-messages via email. Enable message forwarding Enabling/Disabling message forwarding via email. Forwarding Command The command-line program that should be used to forward the message.<br /> Forwarding To The recipient email-address.<br /> e.g.: "Submit" Changes will take effect immediately. Remote Proxy (optional) YaCy can use another proxy to connect to the internet. You can enter the address for the remote proxy here: Use remote proxy</label> Enables the usage of the remote proxy by yacy Use remote proxy for HTTPS Specifies if YaCy should forward ssl connections to the remote proxy. Remote proxy host The ip address or domain name of the remote proxy Remote proxy port Remote proxy user Remote proxy password No-proxy adresses IP addresses for which the remote proxy should not be used "Submit" Changes will take effect immediately. the port of the remote proxy Proxy Settings Transparent Proxy With this you can specify if YaCy can be used as transparent proxy. Hint: On linux you can configure your firewall to transparently redirect all http traffic through yacy using this iptables rule Always Fresh If unchecked, the proxy will act using Cache Fresh / Cache Stale rules. If checked, the cache is always fresh which means that a page is never loaded again if it was already stored in the cache. However, if the page does not exist in the cache, it will be loaded in any case. Send "Via" Header Specifies if the proxy should send the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.45" target="_blank">Via</a> http header according to RFC 2616 Sect 14.45. Send "X-Forwarded-For" Header Specifies if the proxy should send the X-Forwarded-For http header. "Submit" HTTP Server Port HTTPS Server Port "change" Proxy Access Settings These settings configure the access method to your own http proxy and server. All traffic is routed throug one single port, for both proxy and server. Server Access Restrictions You can restrict the access to this proxy/server using a two-stage security barrier: define an <em>access domain</em> with a list of granted client IP-numbers or with wildcards define an <em>user account</em> with an user:password - pair This is the account that restricts access to the proxy function. You probably don't want to share the proxy to the internet, so you should set the IP-Number Access Domain to a pattern that corresponds to you local intranet. The default setting should be right in most cases. If you want, you can also set a proxy account so that every proxy user must authenticate first, but this is rather unusual. IP-Number filter Use <a Seed Upload Settings With these settings you can configure if you have an account on a public accessible server where you can host a seed-list file. General Settings: If you enable one of the available uploading methods, you will become a principal peer. Your peer will then upload the seed-bootstrap information periodically, but only if there have been changes to the seed-list. Upload Method "Submit" >URL< Retry Uploading Here you can specify which upload method should be used. Select 'none' to deactivate uploading. The URL that can be used to retrieve the uploaded seed file, like Store into filesystem: You must configure this if you want to store the seed-list file onto the file system. File Location Here you can specify the path within the filesystem where the seed-list file should be stored. "Submit" Uploading via FTP: This is the account for a FTP server where you can host a seed-list file. If you set this, you will become a principal peer. Your peer will then upload the seed-bootstrap information periodically, but only if there had been changes to the seed-list. The host where you have a FTP account, like Path</label> The remote path on the FTP server, like Missing sub-directories are NOT created automatically. Username >Server< Your log-in at the FTP server Password</label> The password "Submit" Uploading via SCP: This is the account for a server where you are able to login via ssh. >Server< The host where you have an account, like 'my.host.net' Server&nbsp;Port The sshd port of the host, like '22' Path</label> The remote path on the server, like '~/yacy/seed.txt'. Missing sub-directories are NOT created automatically. Username Your log-in at the server Password</label> The password "Submit" Server Access Settings IP-Number filter: requires restart Here you can restrict access to the server. By default, the access is not limited, because this function is needed to spawn the p2p index-sharing function. If you block access to your server (setting anything else than '*'), then you will also be blocked from using other peers' indexes for search service. However, blocking access may be correct in enterprise environments where you only want to index your company's own web pages. Filter have to be entered as IP, IP range or first part of allowed IP's separated by comma (e.g. 10.100.0-100.0-100, 127. ) further details on format see Jetty fileHost: staticIP (optional): <strong>The staticIP can help that your peer can be reached by other peers in case that your peer is behind a firewall or proxy.</strong> You can create a tunnel through the firewall/proxy (look out for 'tunneling through https proxy with connect command') and create an access point for incoming connections. This access address can be set here (either as IP number or domain name). If the address of outgoing connections is equal to the address of incoming connections, you don't need to set anything here, please leave it blank. ATTENTION: Your current IP is recognized as "#[clientIP]#". If the value you enter here does not match with this IP, you will not be able to access the server pages anymore. value="Submit" Server Port Settings Server port: This is the main port for all http communication (default is 8090). A change requires a restart. Server ssl port: This is the port to connect via https (default is 8443). Shutdown port: This is the local port on the loopback address (127.0.0.1 or :1) to listen for a shutdown signal to stop the YaCy server (-1 disables the shutdown port, recommended default is 8005). Advanced Settings If you want to restore all settings to the default values, but <strong>forgot your administration password</strong>, you must stop the proxy, delete the file 'DATA/SETTINGS/yacy.conf' in the YaCy application root folder and start YaCy again. Server Access Settings Proxy Access Settings Crawler Settings Remote Proxy (optional) Seed Upload Settings Message Forwarding (optional) Console Status Log-in as administrator to see full status Welcome to YaCy! Your settings are _not_ protected! Please open the <a href="ConfigAccounts_p.html">accounts configuration</a> page <strong>immediately</strong> and set an administration password. Access is unrestricted from localhost (this includes administration features). Please check the <a href="ConfigAccounts_p.html">accounts configuration</a> page to ensure that the settings match the security level you need. You have not published your peer seed yet. This happens automatically, just wait. The peer must go online to get a peer address. You cannot be reached from outside. A possible reason is that you are behind a firewall, NAT or Router. But you can <a href="index.html">search the internet</a> using the other peers' global index on your own search page. "bad" "idea" "good" "Follow YaCy on Twitter" We encourage you to open your firewall for the port you configured (usually: 8090), or to set up a 'virtual server' in your router settings (often called DMZ). Please be fair, contribute your own index to the global index. Free disk space is lower than #[minSpace]#. Crawling has been disabled. Please fix it as soon as possible and restart YaCy. Free memory is lower than #[minSpace]#. DHT-in has been disabled. Please fix Crawling is paused! If the crawling was paused automatically, please check your disk space. Latest public version is You can download a more recent version of YaCy. Click here to install this update and restart YaCy: Install YaCy You can download the latest releases here: You are running a server in senior mode and you support the global internet index, which you can also <a href="index.html">search yourself</a>. You have a principal peer because you publish your seed-list to a public accessible server where it can be retrieved using the URL Your Web Page Indexer is idle. You can start your own web crawl <a href="CrawlStartSite.html">here</a> Your Web Page Indexer is busy. You can <a href="Crawler_p.html">monitor your web crawl</a> here. If you need professional support, please write to For community support, please visit our >forum< System Status System YaCy version Unknown Uptime: Processors: Load: Threads: peak: total: Protection Password is missing password-protected Unrestricted access from localhost Address</dt> peer address not assigned Host: Public Address: YaCy Address: Proxy</dt> Transparent not used broken::connected broken connected Used for YaCy -> YaCy communication: WARNING: You do this on your own risk. If you do this without YaCy running on a desktop-pc, this will possibly break startup. In this case, you will have to edit the configuration manually in DATA/SETTINGS/yacy.conf Remote: Tray-Icon Experimental< Yes No Auto-popup on start-up Disabled Enable] Enabled Disable] Memory Usage RAM used: RAM max: DISK used: (approx.) DISK free: on::off Configure max: Traffic >Reset Proxy: Crawler: Incoming Connections Active: Max: Loader Queue paused >Queues< Local Crawl Remote triggered Crawl Pre-Queueing Seed server Enabled: Updating to server Last upload: #[lastUpload]# ago. Enabled: Updating to file YaCy version: Java version: >Experimental< Enabled <a Reset</a> Steering</title> Checking peer status... Peer is online again, forwarding to status page... Peer is not online yet, will check again in a few seconds... No action submitted Go back to the <a href="Settings_p.html">Settings</a> page Your system is not protected by a password Please go to the <a href="ConfigAccounts_p.html">User Administration</a> page and set an administration password. You don't have the correct access right to perform this task. Please log in. You can now go back to the <a href="Settings_p.html">Settings</a> page if you want to make more changes. See you soon! Just a moment, please! Application will terminate after working off all scheduled tasks. Please send us feed-back! We don't track YaCy users, YaCy does not send 'home-pings', we do not even know how many people use YaCy as their private search engine. Therefore we like to ask you: do you like YaCy? Will you use it again... if not, why? Is it possible that we change a bit to suit your needs? Please send us feed-back about your experience with an >anonymous message< or a< posting to our web forums >bug report< >Professional Support< If you are a professional user and you would like to use YaCy in your company in combination with consulting services by YaCy specialists, please see Then YaCy will restart. If you can't reach YaCy's interface after 5 minutes restart failed. Installing release YaCy will be restarted after installation Supporter< Please enter a comment to your link recommendation. Your Vote is also considered without a comment. Supporter are switched off for users without authorization "bookmark" "Add to bookmarks" "positive vote" "Give positive vote" "negative vote" "Give negative vote" provided by YaCy peers with an URL in their profile. This shows only URLs from peers that are currently online. Surftips</title> Surftips</h2> Surftips are switched off title="bookmark" alt="Add to bookmarks" title="positive vote" alt="Give positive vote" title="negative vote" alt="Give negative vote" YaCy Supporters< >a list of home pages of yacy users< provided by YaCy peers using public bookmarks, link votes and crawl start points "Please enter a comment to your link recommendation. (Your Vote is also considered without a comment.)" Hide surftips for users without autorization Show surftips to everyone : Peer Steering The information that is presented on this page can also be retrieved as XML. Click the API icon to see the XML. To see a list of all APIs, please visit the API wiki page >Process Scheduler< This table shows actions that had been issued on the YaCy interface to change the configuration or to request crawl actions. These recorded actions can be used to repeat specific actions and to send them to a scheduler for a periodic execution. >Recorded Actions< "next page" "previous page" of #[of]# >Type >Comment Call Count< Recording&nbsp;Date Last&nbsp;Exec&nbsp;Date Next&nbsp;Exec&nbsp;Date >Event Trigger< "clone" >Scheduler< >no event< >activate event< >no repetition< >activate scheduler< >off< >run once< >run regular< >after start-up< at 00:00h at 01:00h at 02:00h at 03:00h at 04:00h at 05:00h at 06:00h at 07:00h at 08:00h at 09:00h at 10:00h at 11:00h at 12:00h at 13:00h at 14:00h at 15:00h at 16:00h at 17:00h at 18:00h at 19:00h at 20:00h at 21:00h at 22:00h at 23:00h "Execute Selected Actions" "Delete Selected Actions" "Delete all Actions which had been created before " day< days< week< weeks< month< months< year< years< >Result of API execution >minutes< >hours< Scheduled actions are executed after the next execution date has arrived within a time frame of #[tfminutes]# minutes. To see a list of all APIs, please visit the Table Viewer The information that is presented on this page can also be retrieved as XML. Click the API icon to see the XML. To see a list of all APIs, please visit the API wiki page >robots.txt table< Table Viewer Table Editor: showing table "Edit Selected Row" "Add a new Row" "Delete Selected Rows" "Delete Table" "Rebuild Index" Primary Key >Row Editor< "Commit" Table Selection Select Table: show max. entries >all< Display columns: "load" Search/Filter Table search rows for "Search" >select a tag< >Folders< >select a folder< >Import Bookmarks< "import" YMark Table Administration Table Viewer Table Administration Table Selection Select Table: show max. >all< entries search rows for "Search" Table Editor: showing table "Edit Selected Row" "Add a new Row" "Delete Selected Rows" "Delete Table" Row Editor Primary Key "Commit" entries, YaCy Debugging: Thread Dump Threaddump< "Single Threaddump" "Multiple Dump Statistic" Translation News for Language Translation News You can share your local addition to translations and distribute it to other peers. The remote peer can vote on your translation and add it to the own local translation. entries available "Publish" You can check your outgoing messages >here< To edit or add local translations you can use File: >Originator< English: >existing< Translation: >score negative vote positive vote Vote on this translation. If you vote positive the translation is added to your local translation list. Translation Editor Translate untranslated text of the user interface (current language). The modified translation file is stored in DATA/LOCALE directory. UI Translation Target Language: activate a different language Source File view it filter untranslated Source Text Translated Text Save translation Check for remote translation proposals and/or share your own added translations User Page You are not logged in.<br /> Username: Password: <input "login" You are currently logged in as #[username]#. You have used old Password new Password< new Password(repetition) "Change" You are currently logged in as admin. value="logout" (after logout you will be prompted for your password again. simply click "cancel") Password was changed. Old Password is wrong. New Password and its repetition do not match. New Password is empty. minutes of your onlinetime limit of minutes per day. See the page info about the url. View URL Content >Get URL Viewer< "Show Metadata" "Browse Host" >URL Metadata< Search in Document: "Show Snippet" Hash (click this for full metadata) In Metadata: In Cache: Word Count Description Size MimeType: Collections View as Plain Text Parsed Text Parsed Sentences Parsed Tokens/Words Link List Citation Report "Show" Unable to find URL Entry in DB Invalid URL Unable to download resource content. Unable to parse resource content. Unsupported protocol. >Original Content from Web< Parsed Content >Original from Web< >Original from Cache< >Parsed Tokens< Server Log Lines reversed order "refresh" Local Peer Profile: Remote Peer Profile Wrong access of this page The requested peer is unknown or a potential peer. The profile can't be fetched. The peer is not online. This is the Profile of >Name Nick Name Homepage eMail Comment View this profile as > or You can edit your profile <a href="ConfigProfile_p.html">here</a> <html lang="en"> YaCy '#[clientname]#': Federated Index The information that is presented on this page can also be retrieved as XML Click the API icon to see the RDF Ontology definition for this vocabulary. To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>. Vocabulary Administration Vocabularies can be used to produce a search navigation. A vocabulary must be created before content is indexed. The vocabulary is used to annotate the indexed content with a reference to the object that is denoted by the term of the vocabulary. The object can be denoted by a url stub that, combined with the term, becomes the url for the object. Vocabulary Selection Vocabulary Name "View" Vocabulary Production Empty Vocabulary Auto-Discover Import from a csv file File Path Column for Literals no Synonyms Auto-Enrich with Synonyms from Stemming Library Read Column first has index if unused set Column for Object Link (optional) Charset of Import File It is possible to produce a vocabulary out of the existing search index. This is done using a given 'objectspace' which you can enter as a URL Stub. This stub is used to find all matching URLs. If the remaining path from the matching URLs then denotes a single file, the file name is used as vocabulary term. This works best with wikis. Try to use a wiki url as objectspace path. Objectspace from file name from page title&nbsp; from page title (splitted) from page author "Create" Vocabulary Editor >Modify< >Delete< >Literal< >Synonyms< >Object Link< >add< clear table (remove all terms) delete vocabulary< "Submit" Web Structure The data that is visualized here can also be retrieved in a XML file, which lists the reference relation between the domains. With a GET-property 'about' you get only reference relations about the host that you give in the argument field for 'about'. With a GET-property 'latest' you get a list of references that had been computed during the current run-time of YaCy, and with each next call only an update to the next list of references. Click the API icon to see the XML file. To see a list of all APIs, please visit the API wiki page >Host List< >#[count]# outlinks host< depth< nodes< time< size< >Background< >Text< >Line< >Pivot Dot< >Other Dot< >Dot-end< >Color < "change" "WebStructurePicture" YaCyWiki page: last edited by change date Edit< only granted to admin Grant Write Access to Start Page Index Versions Author: You can use Wiki Code</a> here. "edit" "Submit" "Preview" "Discard" >Preview No changes have been submitted so far! Subject Change Date Last Author IO Error reading wiki database: Select versions of page Compare version from "Show" with version from "current" "Compare" Return to Changes will be published as announcement on YaCyNews Wiki Help Wiki-Code This table contains a short description of the tags that can be used in the Wiki and several other servlets of YaCy. For a more detailed description visit the Code Description These tags create headlines. If a page has three or more headlines, a table of content will be created automatically. Headlines of level 1 will be ignored in the table of content. These tags create stressed texts. The first pair emphasizes the text (most browsers will display it in italics), the second one emphazises it more strongly (i.e. bold) and the last tags create a combination of both. Text will be displayed <span class="strike">stricken through</span>. Text will be displayed <span class="underline">underlined</span>. Lines will be indented. This tag is supposed to mark citations, but may as well be used for styling purposes. These tags create a numbered list. These tags create an unnumbered list. These tags create a definition list. This tag creates a horizontal line. This tag creates links to other pages of the wiki. This tag displays an image, it can be aligned left, right or center. This tag displays a Youtube or Vimeo video with the id specified and fixed width 425 pixels and height 350 pixels. i.e. use to embed this video: These tags create a table, whereas the first marks the beginning of the table, the second starts a new line, the third and fourth each create a new cell in the line. The last displayed tag closes the table. A text between these tags will keep all the spaces and linebreaks in it. Great for ASCII-art and program code. If a line starts with a space, it will be displayed in a non-proportional font. This tag creates links to external websites. =headline point something< another thing and yet another something else word :definition pagename description]] url description alt text Login TreeView Import Bookmarks Bookmarks (XBEL) YaCy Bookmarks The information that is presented on this page can also be retrieved as XML. Click the API icon to see the XML. To see a list of all APIs, please visit the <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:API" target="_blank">API wiki page</a>. Bookmarks (user: #[user]# size: #[size]#) Explorer Tag Manager >Import< Export All tag actions are applied to the sub-set of bookmarks defined by this query. >Query< Query Type Tags (comma seperated) Tags (regexp) Folders (comma seperated) Folders (regexp) Title (regexp) Description (regexp) Enter tags to add (<i>replace with</i>) (comma separated tags) "Replace" Bookmark Importer If you put in your bookmarks here, you can access them anywhere where you have access to your YaCy peer. Think of it as your 'personal cloud' for bookmarking. Surrogate XML YaCy White/Black List YaCy Crawl Starts (admin) Bookmark file Folder settings A folder structure is helpful to organize your bookmarks in a hierarchical way. Source folder Target folder Automatic tagging Tags are words that are attached to documents as metadata. It is possible to read all the documents and find the attached tags automatically. Off Only for empty tags Overwriting existing tags Merging with existing tags Automatic Indexing While doing the bookmark import, YaCy can push all URLs to the indexing process No indexing >Index every bookmark entry< Index every bookmark entry plus all directly linked pages Index all domains from all bookmarks completely include all media (image/movie/document) links Add & Edit Bookmark Public: yes no Title: Description: Folder (/folder/subfolder): Tags (comma separated): Craw Start >Bookmark< Bookmarks Restrict to start domain Restrict to sub-path of given url Crawling Depth bookmark only (0) shallow crawl (4) deep crawl (8) deeper crawl (16) indefinite (99) Limitations not more than documents Dynamic URLs allow <a href="http://en.wikipedia.org/wiki/Query_string" target="_blank"> query-strings</a> (urls with a '?' in the path) Scheduler run this crawl once scheduled, look every minutes hours days for new documents automatically. No filter multiple "Import" Document Citations for List of other web pages with citations Similar documents from different hosts: Table Viewer "Edit Table" >Author< >Description< >Subject< >Date< >Type< >Identifier< >Language< >Load Date< >Referrer Identifier< >Document size< >Number of Words< >Title< Websearch Comparison Left Search Engine Right Search Engine "Compare" Search Result &nbsp;Administration Toggle navigation Re-Start< Shutdown< Download YaCy Community (Web Forums) Project Wiki Search Interface About This Page Portal Configuration Portal Design Ranking and Heuristics Crawler Monitor Index Administration Filter &amp; Blacklists Content Semantic Target Analysis Process Scheduler Monitoring Index Browser Network Access >Terminal Confirm Re-Start Confirm Shutdown Project Wiki< Git Repository Bugtracker "Search..." "You just started a YaCy peer!" "As a first-time-user you see only basic functions. Set a use case or name your peer to see more options. Start a first web crawl to see all monitoring options." "You did not yet start a web crawl!" "You do not see all monitoring options here, because some belong to crawl result monitoring. Start a web crawl to see that!" First Steps Use Case &amp; Account Load Web Pages, Crawler RAM/Disk Usage &amp; Updates System Status Peer-to-Peer Network Advanced Crawler Index Export/Import System Administration Configuration Production >Administration< Search Portal Integration You just started a YaCy peer! As a first-time-user you see only basic functions. Set a use case or name your peer to see more options. Start a first web crawl to see all monitoring options. You did not yet start a web crawl! You do not see all monitoring options here, because some belong to crawl result monitoring. Start a web crawl to see that! Design English, Englisch Toggle navigation Search Interfaces Administration &raquo; >Web Search< >File Search< >Compare Search< >Index Browser< >URL Viewer< Example Calls to the Search API: Solr Default Core Solr Webgraph Core Google Appliance API Download YaCy Community (Web Forums) Project Wiki Search Interface About This Page Bugtracker Git Repository Access Tracker Server Access Access Grid Incoming Requests Overview Incoming Requests Details All Connections< Local Search< Log Host Tracker Remote Search< Cookie Menu Incoming&nbsp;Cookies Outgoing&nbsp;Cookies Filter &amp; Blacklists Blacklist Administration Blacklist Cleaner Blacklist Test Import/Export Content Control >Application Status< >Status< System Thread Dump >Processes< >Server Log< >Concurrent Indexing< >Memory Usage< >Search Sequence< >Messages< >Overview< >Incoming&nbsp;News< >Processed&nbsp;News< >Outgoing&nbsp;News< >Published&nbsp;News< >Community Data< >Surftips< >Local Peer Wiki< UI Translations System Administration Advanced Settings Advanced Properties Viewer and administration for database tables Performance Settings of Busy Queues >Performance Overview</a> Receipts</a> Queries</a> DHT Transfer Proxy Use Local Crawling</a> Global Crawling</a> Surrogate Import Crawl Results Processing Monitor Crawler< Loader< Rejected URLs >Queues< Local< Global Remote No-Load Crawler Steering Scheduler and Profile Editor< robots.txt Monitor Load Web Pages Site Crawling Parser Configuration >Appearance< >Language< Search Page Layout Design >Appearance >Language Index Administration URL Database Administration Index Deletion Index Sources &amp; Targets Solr Schema Editor Field Re-Indexing Reverse Word Index Content Analysis Crawler/Spider< Crawl Start (Expert) Network Scanner Crawling of MediaWikis >Crawling of phpBB3 Forums< Network Harvesting< Remote Crawling Scraping Proxy Advanced Crawler Crawling of phpBB3 Forums >Database Reader< RSS Feed Importer OAI-PMH Importer Database Reader for phpBB3 Forums Dump Reader for MediaWiki dumps RAM/Disk Usage &amp; Updates >Performance< Web Cache Download System Update Search Box Anywhere Generic Search Portal User Profile Local robots.txt Portal Configuration Publication File Hosting Solr Ranking Config RWI Ranking Config >Heuristics< Ranking and Heuristics Content Semantic >Automated Annotation< Auto-Annotation Vocabulary Editor Knowledge Loader Target Analysis Mass Crawl Check Regex Test Use Case &amp; Accounts Basic Configuration >Accounts< Network Configuration Web Visualization Web Structure Image Collage Index Browser <html lang="en"> YaCy '#[clientname]#': Search Page >Search< Text Images Audio Video Applications more options... Results per page Resource global restrict on show all Prefer mask Constraints only index pages the peer-to-peer network only the local index Query Operators restrictions only urls with the &lt;phrase&gt; in the url only urls with the &lt;phrase&gt; within outbound links of the document only urls with extension only urls from host only pages with as-author-anotated only pages from top-level-domains only pages with a date between &lt;date1&gt; and &lt;date2&gt; in content only pages with &lt;date&gt; in content only resources from http or https servers only resources from ftp servers they are rare crawl them yourself only resources from smb servers Intranet Indexing</a> must be selected only files from a local file system spatial restrictions only documents having location metadata (geographical coordinates) only documents within a square zone embracing a circle of given radius (in decimal degrees) around the specified latitude and longitude (in decimal degrees) >ranking modifier< sort by date latest first multiple words shall appear near doublequotes prefer given language an <a href="http://www.loc.gov/standards/iso639-2/php/English_list.php" title="Reference alpha-2 language codes list">ISO 639-1</a> 2-letter code heuristics add search results from Search Navigation keyboard shortcuts <a href="https://en.wikipedia.org/wiki/Access_key">Access key</a> modifier + n next result page <a href="https://en.wikipedia.org/wiki/Access_key">Access key</a> modifier + p previous result page automatic result retrieval browser integration after searching, click-open on the default search engine in the upper right search field of your browser and select 'Add "YaCy Search.."' search as rss feed click on the red icon in the upper right after a search. this works good in combination with the '/date' ranking modifier. See an >example json search results for ajax developers: get the search rss feed and replace the '.rss' extension in the search result url with '.json' ranking modifier add search results from external opensearch systems click on the red icon in the upper right after a search. this works good in combination with the "Continue this queue" "Pause this queue" >Size >Date Your Username/Password is wrong. Username</label> Password</label> "login" YaCy: Error Message request: unspecified error not-yet-assigned error You don't have an active internet connection. Please go online. Could not load resource. The file is not available. Exception occurred Generated #[date]# by Your Account is disabled for surfing. Your Timelimit (#[timelimit]# Minutes per Day) is reached. The server could not be found. Did you mean: Shared Blacklist Add Items to Blacklist Unable to store the items into the blacklist file: YaCy-Peer &quot;<span class="settingsValue">#[name]#</span>&quot; not found. not found or empty list. Wrong Invocation! Please invoke with Blacklist source: Blacklist target: Blacklist item "select all" "deselect all" value="add" YaCy System Terminal Monitor YaCy Peer Live Monitoring Terminal Search Form Crawl Start Status Page Confirm Shutdown >&lt;Shutdown Event Terminal Image Terminal Domain Monitor "Loading Processing software..." This browser does not have a Java Plug-in. Get the latest Java Plug-in here. Resource Monitor Network Monitor About YaCy-UI Admin Console "Bookmarks" >Bookmarks Server Log 'Displaying {from} to {to} of {total} items' 'Processing, please wait ...' 'No items' Loading&#8230; Loading&#8230; YaCy P2P Websearch "Search" >Text >Images >Audio >Video >Applications Search term: "help" Resource/Network: freeworld local peer >bookmarks sciencenet >Language: any language Bookmark Folders Bookmark Tags< Search Options Constraint: all pages index pages URL mask: Prefer mask: Bookmark TagCloud Topwords< alt="help" title="help" Peer Control "Login" Themes Messages Re-Start Shutdown Web Indexing Crawl Start Monitoring YaCy Network >Settings "Basic Settings" Basic Accounts "Network" Network "Advanced Settings" Advanced "Update Settings" Update >YaCy Project "YaCy Project Home" Project "YaCy Forum" "Help" 'Add' 'Crawl' 'Edit' 'Delete' 'Rename' 'Help' "YaCy Bookmarks" 'Public' 'Title' 'Tags' 'Folders' 'Date' >Overview YaCy-UI is going to be a JavaScript based client for YaCy based on the existing XML and JSON API. YaCy-UI is at most alpha status, as there is still problems with retriving the search results. I am currently changing the backend to a more application friendly format and getting good results with it (I will check that in some time after the stable release 0.7). For now have a look at the bookmarks, performance has increased significantly, due to the use of JSON and Flexigrid! YaCy Interactive Search This search result can also be retrieved as RSS/<a href="http://www.opensearch.org" target="_blank">opensearch</a> output. The query format is similar to SRU Click the API icon to see an example call to the search rss API. To see a list of all APIs, please visit the API wiki page loading from local index... e="Search" "Search..." Search Page This search result can also be retrieved as RSS/<a href="http://www.opensearch.org" target="_blank">opensearch</a> output. "search" "search again" Illegal URL mask: (not a valid regular expression), mask ignored. Illegal prefer mask: Did you mean: The following words are stop-words and had been excluded from the search: No Results. length of search words must be at least 1 character Searching the web with this peer is disabled for unauthorized users. Please >log in< as administrator to use the search function Location -- click on map to enlarge Map (c) by < and contributors, CC-BY-SA >Media< > of > local, remote from YaCy peers). >search< "bookmark" "recommend" "delete" Pictures show search results for "#[query]#" on map >Provider >Name Space >Author >Filetype >Language >Peer-to-Peer< Stealth Mode Privacy Context Ranking Sort by Date Documents Images Your search is done using peers in the YaCy P2P network. You can switch to 'Stealth Mode' which will switch off P2P, giving you full privacy. Expect less results then, because then only your own search index is used. Your search is done using only your own peer, locally. You can switch to 'Peer-to-Peer Mode' which will cause that your search is done using the other peers in the YaCy network. >Documents >Images