
# YaCy
[![Gitter](https://badges.gitter.im/yacy/yacy_search_server.svg)](https://gitter.im/yacy/yacy_search_server?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/yacy/yacy_search_server.svg?branch=master)](https://travis-ci.org/yacy/yacy_search_server)

[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy)

## What is this?

YaCy is search engine software. It takes a new approach to search
because it does not use a central server. Instead, its search results
come from a network of independent peers. In such a distributed network,
no single entity decides what gets listed, or in which order results appear.

The YaCy search engine runs on each user's own computer. Search terms are
hashed before they leave the user's computer. Unlike conventional
search engines, YaCy is designed to protect the users' privacy.
With YaCy, a user's computer can build its own search indexes and
rankings, so that results better match what the user is looking for over time.
YaCy also makes it easy to create a customized search portal with a few clicks.

Each YaCy user is either part of a large search network (YaCy contains a
peer-to-peer network protocol to exchange search indexes with other YaCy
search engine installations) or the user runs YaCy to produce
a personal search portal that can be either public or private.

YaCy search portals can also be placed in an intranet environment, which makes
YaCy a replacement for commercial enterprise search solutions. A network
scanner makes it easy to discover all available HTTP, FTP and SMB servers.

To create a web index, YaCy has a web crawler for 
everybody, without censorship and central data retention:
- search the web (automatically using all other YaCy peers)
- co-operative crawling; support for other crawlers
- intranet indexing and search
- set up your own search portal
- all users have equal rights
- comprehensive concept to anonymise the users' index

To be able to perform a search using the YaCy network, every user has to
set up their own node. More users lead to higher index capacity
and better distributed indexing performance.


## License

YaCy is published under the GPL v2.
The source code is inside the release package (see /source and /htroot).


## Where is the documentation?

Documentation can be found at:
- (Home Page)        https://yacy.net/
- (German Forum)     http://forum.yacy.de/
- (Wiki:de)          http://www.yacy-websuche.de/wiki/index.php/De:Start
- (Wiki:en)          http://www.yacy-websearch.net/wiki/index.php/En:Start
- (Tutorial Videos)  https://yacy.net/en/Tutorials.html and https://yacy.net/de/Lehrfilme.html

Each of these locations has a (YaCy) search function which combines
all these locations into one search result.


## Dependencies? What other software do I need?

You need Java 1.8 or later to run YaCy, nothing else. (Java 1.7 can still be used to run the main [1.92/9000 release](https://github.com/yacy/yacy_search_server/releases/tag/Release_1.92).)
Please download it from https://www.java.com

YaCy also runs on IcedTea 3.
See https://icedtea.classpath.org

NO OTHER SOFTWARE IS REQUIRED!
(you don't need apache, tomcat or mysql or whatever)


## How do I start this software?

Startup and Shutdown of YaCy:

- on GNU/Linux and OpenBSD:
   - to start: execute `./startYACY.sh`
   - to stop : execute `./stopYACY.sh`

- on Windows:
   - to start: double-click `startYACY.bat`
   - to stop : double-click `stopYACY.bat`

- on Mac OS X:
   - use the Mac application and start or stop it like any
     other Mac application (double-click to start)


## How do I use this software, where is the administration interface?

YaCy is built on a web server. After you have started YaCy,
start your browser and open

   http://localhost:8090

There you can see your personal search and administration interface.
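
On a headless machine or in a script, it can help to check that the web interface answers before you try to open it. The following is a minimal sketch, assuming `curl` is installed. The function name `wait_for_yacy` and its default attempt count and delay are illustrative and not part of YaCy:

```bash
#!/usr/bin/env bash
# Poll the YaCy web interface until it answers or the attempts run out.
# Arguments: URL (default http://localhost:8090), number of attempts
# (default 30), delay in seconds between attempts (default 2).
# The defaults are illustrative values, not YaCy settings.
wait_for_yacy() {
  local url="${1:-http://localhost:8090}" attempts="${2:-30}" delay="${3:-2}" i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "YaCy is reachable at $url"
      return 0
    fi
    # Wait before the next attempt, but not after the last one
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "YaCy did not answer at $url after $attempts attempts" >&2
  return 1
}
```

For example, `./startYACY.sh && wait_for_yacy` starts YaCy and returns once the interface on port 8090 responds.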


## What if I install YaCy (headless) on a server?

You can do that, but be aware that YaCy automatically authorizes users who
access the server from localhost. After about 10 minutes a random
password is generated, and from then on it is not possible to log in from
a remote location. If you install YaCy on a server that is not your
workstation, you must therefore set an administration account immediately
after the first start-up. Open:

http://<remote-server-address>:8090/ConfigAccounts_p.html

and set an administration account.

## Can I run YaCy in a virtual machine or a container?

YaCy runs fine in virtual machines managed by software such as VirtualBox or VMware. 

Container technology may be more flexible and lightweight and also works fine with YaCy.

These technologies can be deployed locally, on remote machines you own, or in the 'cloud'. Decide which option best fits your privacy requirements.

### Docker

Easily deploy YaCy on a Docker cloud provider of your choice (can be a machine you own) with the deploy button at the top of this page.

More details for YaCy with Docker in [docker/Readme.md](docker/Readme.md).

### Heroku

Easily deploy on [Heroku](https://www.heroku.com/), a PaaS (Platform as a Service) provider, using the deploy button at the top.

More details for YaCy on Heroku in [Heroku.md](Heroku.md).


## Port 8090 is bad, people are not allowed to access that port

You can forward port 80 to 8090 with iptables:
```bash
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8090
```

On some operating systems, you must first enable access to the ports you are using, for example:
```bash
iptables -I INPUT -m tcp -p tcp --dport 8090 -j ACCEPT
```

## How can I scale this; how much ram is needed; disk space?

YaCy can scale up to many millions of web pages in your own search index.
The default RAM assignment is 600MB, which is assigned to the Java
process but not permanently used by it. The garbage collector (GC) will free memory
once in a while. If you have a small index (e.g. about 100000 pages)
then you may assign _less_ memory (e.g. 200MB), but if your index grows
to over 1 million web pages then you should start to increase the
memory assignment. Open http://localhost:8090/Performance_p.html
and set a higher/lower memory assignment.
If you have millions of web pages in your search index then you might
have gigabytes of disk space allocated. You can reduce the disk
space used, e.g. by setting the htcache space to a different size; to do that
open http://localhost:8090/ConfigHTCache_p.html and set a new size.


## Join the development!

YaCy was created with the love of a community.
A large number of programmers have helped, please join us!

Here is a rough outline of how to start developing YaCy in Eclipse:

- Clone https://github.com/yacy/yacy_search_server.git
- File -> Import as Git -> Projects from Git -> Existing local repository
- -> add -> your git clone of yacy_search_server
- "Import existing Eclipse projects" -> finish
- Run -> External Tools -> External Tools Configuration -> double-click Ant Build
- -> Name: "YaCy Build" -> Buildfile: Browse Workspace -> build.xml -> Run
- In Package Explorer, right-click on yacy -> Run as -> Java Application -> Select "yacy - net.yacy" -> Ok

To join our development community, go to https://searchlab.eu

If you have implemented something amazing, we welcome your pull request at https://github.com/yacy/yacy_search_server


## How to get the source code and how to compile YaCy yourself?

The source code is inside every YaCy release. You can also get YaCy
from https://github.com/yacy/yacy_search_server by cloning the repository:

```
git clone https://github.com/yacy/yacy_search_server
```

Please clone our code and help with development!
The code is licensed under the GPL v2.

Compiling YaCy:
- you need Java 1.8 or later and [Apache Ant](https://ant.apache.org/)
- just compile: `ant clean all`, then you can run `./startYACY.sh` or `./startYACY.bat`
- create a release tarball: `ant dist`
- create a Mac OS release: `ant distMacApp` (works only on a Mac)
- create a Debian release: `ant deb`
- work with Eclipse: within Eclipse you also need to start the Ant build process
  because the servlet pages are not compiled by the Eclipse build process

After the dist procedure, the release can be found in the RELEASE subdirectory.

Build with Maven:
- the first time, go to the subdirectory libbuild (which contains the Maven parent pom)
- compile with `mvn clean install -DskipTests`; this will create all needed modules
- after that, you can use just the pom in the main directory to build YaCy with Maven
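
The Ant and Maven steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function names, the `DRY_RUN` switch and the default checkout directory are hypothetical, and the Maven goal used for the main pom (`package`) is an assumption, because this README does not name one.

```bash
#!/usr/bin/env bash
# Sketch: fetch YaCy and build it from source with Ant or Maven.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone the repository into the directory given as $1
# (the default directory name is just the repository name).
fetch_yacy() {
  run git clone https://github.com/yacy/yacy_search_server "${1:-yacy_search_server}"
}

# Ant build. The body runs in a subshell, so the caller's
# working directory is left unchanged.
build_with_ant() (
  run cd "${1:-yacy_search_server}" || exit 1
  run ant clean all
)

# Maven build: first the parent pom in libbuild, then the main pom.
build_with_maven() (
  run cd "${1:-yacy_search_server}/libbuild" || exit 1
  run mvn clean install -DskipTests || exit 1
  run cd .. || exit 1
  # Assumption: "package" as the goal for the main pom; the README names none.
  run mvn package -DskipTests
)
```

For example, `fetch_yacy && build_with_ant` followed by `./startYACY.sh` inside the checkout gives a running local build. `DRY_RUN=1 build_with_maven` only prints the commands it would run.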

## Are there any APIs or how can I attach software at YaCy?

There are many interfaces built into YaCy, and they are all based on HTTP/XML and
HTTP/JSON. You can discover these interfaces by looking for the orange "API" icon in
the upper right of some pages in the YaCy web interface. Click on it and
you will see the XML/JSON version of the information you have just seen on the web
page.
A different approach is to use the shell scripts provided in the /bin
subdirectory. These shell scripts also call the YaCy web interface. By copying and adapting some of those
scripts you can easily create more shell API access methods.
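
As an illustration of calling the web interface from a shell script, here is a minimal sketch that builds a search URL. It assumes the JSON search endpoint is `/yacysearch.json` with a `query` parameter, which this README does not state; check the "API" icon on your own peer's search page for the exact URL. The function names and the `YACY_BASE` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use as a URL query value.
# Sketch intended for ASCII input.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                # unreserved characters pass through
      *) out+="$(printf '%%%02X' "'$c")" ;;        # everything else becomes %XX
    esac
  done
  printf '%s' "$out"
}

# Print the search URL for the query given as $1.
# YACY_BASE (hypothetical variable) overrides the default local peer address.
yacy_search_url() {
  local base="${YACY_BASE:-http://localhost:8090}"
  # Assumption: /yacysearch.json?query=... is the JSON search endpoint.
  printf '%s/yacysearch.json?query=%s\n' "$base" "$(urlencode "$1")"
}
```

With a running peer, `curl -s "$(yacy_search_url "open source")"` would then fetch the results as JSON.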

## Contact

Our primary point of contact is the international YaCy forum at https://searchlab.eu
We encourage you to start a discussion there in your own language.

If you have any questions, please do not hesitate to contact the maintainer:
Send an email to Michael Christen (mc@yacy.net) with a meaningful subject
including the word 'yacy' to prevent your email from getting stuck
in my anti-spam filter.

If you would like to have a customized version for special needs,
feel free to ask the author for a business proposal to customize YaCy
according to your needs. We also provide integration solutions if the
software is to be integrated into your enterprise application.

Germany, Frankfurt a.M., 26.11.2011
Michael Peter Christen