This change is inspired by the need to see a network connected to the index it creates in a indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switcing are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) changes in formatting for better readability in BlacklistCleaner_p.java
*) replaced test for necessary Java version (was 1.4.2, is 1.5 now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4756 6c8d7289-2bf4-0310-a012-ef5d649a1542
- made usage of BufferedStreams explizit to distinct different copy method in serverFileUtils (byte-by-byte and using an own buffer)
- introduced another timeout setting (java internal property)
- more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4674 6c8d7289-2bf4-0310-a012-ef5d649a1542
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed connection modes in ftpc
- replaced sort tread pool in row collections by new one using util.concurrent. the old pool had caused blockings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4582 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the default files yacy.init and for the network definition is now moved to the path defaults
- the httpProxy.conf is renamed to yacy.conf
- the DATA/INDEX/PUBLIC is renamed to the actual network nickname, which should be freeworld or sciencenet
more menu entries
- added apfelmaennchens alternative search page to the menu
- added the new thread dump page to the server log menu point as submenu
modifications
- modified the thread dump page: sorting by thread type
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4575 6c8d7289-2bf4-0310-a012-ef5d649a1542
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
the very basic kelondroRowCollection which supports sorting with a second thread
in case that a double-core processing CPU is used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
they appear as separate, floating window above the search results,
not in a new window
- added highslide javascript library for feature mentioned above
- removed dir servlet. This thing was not used as it was supposed to be (as an example applet)
and was a major problem for intranet-indexing when files are hosted on the same peer.
- added yacy-httpd-internal directory listing. Because YaCy is a search engine,
directory listings are similar to search result listings. Intranet indexing from the same peer
will get nice index pages for document collections.
- removed unused test applet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4494 6c8d7289-2bf4-0310-a012-ef5d649a1542
- remove direct accesses to SimpleDateFormat fields in serverDate and use the static parse... methods instead
- remove nowDate() as a Date doesn't store timezone information and a new Date() is always faster
- default formatter methods use a GMT timezone by default now, this is important for interchangability as some date formats we use don't include a timezone offset.
- continued renaming and rearanging (formatter) methods. all should follow the general naming scheme formatWHAT(...)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4285 6c8d7289-2bf4-0310-a012-ef5d649a1542
- before absolute paths would be expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now you can put nearly every dynamically generated data with a configurable path to a location outside of yacys root dir without having to use symlinks (probably good for third party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path.
- exceptions (hardcoded):
DATA/LOG/yacy.logging
DATA/SETTINGS/httpProxy.conf
DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.
- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused
as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls
After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since
creation of these threads is not resource-consuming, for a detailed explanation see svn 4106
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
- SSIs may now refer to servlets, not only files
- calling a servlet, the servlet/SSI engine is called recursively
- SSIs now work also for non-chunked-encoding supporting clients
This will support the new search page functionality, to show search results
dynamically without using javascript. To test this method, a test page has been added
http://localhost:8080/ssitest.html
..calls dynamicalls 3 servlets, which produce some delays during their execution
please verify that you can see the result step-by-step on your browser
To implement this feature, some refactoring had been taken place, mostly code
had been made static and will execute faster.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4037 6c8d7289-2bf4-0310-a012-ef5d649a1542
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
- within this context: generalized date format handling
- extended Update interface:
* a version lookup can be triggered manually
* a complete lookup + download + re-boot process can be triggered with one click
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added new system update configuration page
- moved system update from status page to system udate page
- moved shutdown and restart from status page to main menu
- added new configuration properties to yacy.init (not yet actively used)
- added some methods to handle new automatic update process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3958 6c8d7289-2bf4-0310-a012-ef5d649a1542
** kann das mal jemand auf seiner linux-platform testen **
** und feed-back geben ob der restart funktionier ? **
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3937 6c8d7289-2bf4-0310-a012-ef5d649a1542