- config string for chinese
- do not copy the language file to DATA/LOCALE any more (and do not use
them there, this is really confusing for new translators)
- added visibility of metadata and parser in search results since that shows what YaCy can do in a nice way
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8091 6c8d7289-2bf4-0310-a012-ef5d649a1542
these bookmarks will be used to get a source reference for the search in case of intranet or portal searches.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (allows only loading from hosts that are hostet in given countries)
note: the settings and input environment is there with that commit, but the values are not yet evaluated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robust
but with the cost of early stopping some tasks (eg. dht) while running low on memory.
This new strategy does respect the generational way a heap is organized on most used jvms.
These changes run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be carefull using generation memory strategy and report errors by naming
OS, jvm and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
The YaCy freeworld network used in this network definition, minor enhancements for the feed of MetaGer were integrated.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7771 6c8d7289-2bf4-0310-a012-ef5d649a1542
- many speed/performance hacks
- added solr charding and new charding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added more properties to solr index
- refactoring
- more constants in switchboard
- fix for some NPEs
- recognition of more images
- removed synchronization in HandleMap (obviously not necessary?)
- added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as beeing local.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542
YaCy supports now the storage to remote solr indexes.
More federated storage (and search) methods may follow.
The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can be used as solr configuration
This is also the same scheme that solr uses if documents are imported with apache tika.
federated solr storage is switched off by default.
To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both, YaCy and solr indexes.
- to check whats in solr after indexing, open http://localhost:8983/solr/admin/
Until now it is not possible to use the solr index to search with YaCy in that solr index.
This functionality is now available for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
that YaCy provides (like dublin core reading, wikimedia dump reading, rss feed reader etc) if people still
want to use solr instead of YaCy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal
- added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal
- added configuration of navigation options to ConfigPortal
- added an option to switch off automatic index cleaning in case that a link verification method fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542
This has two aspects: the user who searches may want to increase these values to get more results and more load on the remote side and the user of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-site maximum load parameters are defined by a network definition and the client-side search request load can be defined by each user individually but when the remote search is done the requested service is limited to the network definition.
You can find now in the network definition file:
network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime
and in the yacy.conf file:
remotesearch.maxcount and remotesearch.maxtime
There is currently no web interface to define the client-side remote search attributes, please set them manually
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542
* enable with proxyURL = true
* could be useful to browse specific pages with proxy or use own improvements in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7538 6c8d7289-2bf4-0310-a012-ef5d649a1542
- filtering out too old peers when reading seed lists (limit is now 240 minutes)
- added concurrent host names resolving in front of the http client because the http client uses the java built-in DNS resolve which is not multithreading-safe (i have seen deadlocks in thread dumps showing that this bug in jdk is still there)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7515 6c8d7289-2bf4-0310-a012-ef5d649a1542