- separated sidebars in new search interface and placed them in their own files
which can be put in into the search page like plug-ins
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4529 6c8d7289-2bf4-0310-a012-ef5d649a1542
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
the very basic kelondroRowCollection which supports sorting with a second thread
in case that a double-core processing CPU is used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
the history distinguishes between different users and identifies them by their ip
a history is only shown to the user who submitted the search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4510 6c8d7289-2bf4-0310-a012-ef5d649a1542
An alle JavaScript-Profis: die Anwahl einer neuen Suchseite über die Selectbox (hinter Category)
funktioniert bei mir nur mit Firefox, nicht aber mit Safari oder Camino. Kann da mal jemand
gucken woran das liegt?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4508 6c8d7289-2bf4-0310-a012-ef5d649a1542
- enhanced text parser (condenser): found and eliminated bad code parts; increase of speed
- added handling of image preview using the image cache from HTCACHE
- some other minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4507 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added image domain presentation to image preview
- added new search page to menu
- added automatic re-search when an old search profile is requested and a crawl is ongoing,
to fetch newly crawled entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4501 6c8d7289-2bf4-0310-a012-ef5d649a1542
This is performed using a 3-stage sorting process:
- sort by relevance, then do snippet-fetch
- sort snippets by relevance then do image link extraction
- sort image links by image size; unknown sizes are handled like small sizes
- only the exact amount of images as requested are shown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4499 6c8d7289-2bf4-0310-a012-ef5d649a1542
- introduced generalized method to organize ranked results (2 new classes)
- added a post-ranking after snippet-fetch (before: only listed) using the new ranking data structures
- fixed some missing data fields in RWI ranking attributes and correct hand-over between data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4498 6c8d7289-2bf4-0310-a012-ef5d649a1542
they appear as separate, floating window above the search results,
not in a new window
- added highslide javascript library for feature mentioned above
- removed dir servlet. This thing was not used as it was supposed to be (as an example applet)
and was a major problem for intranet-indexing when files are hosted on the same peer.
- added yacy-httpd-internal directory listing. Because YaCy is a search engine,
directory listings are similar to search result listings. Intranet indexing from the same peer
will get nice index pages for document collections.
- removed unused test applet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4494 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed md5 file generation (spoils the won repository)
- removed comments in file share (was never used)
- moved dir list comparator to other place (maybe solves problem, lets see)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4481 6c8d7289-2bf4-0310-a012-ef5d649a1542
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
this was necessary because othervise robinson peers did also global searches, which cannot be a wanted effect
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4456 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) trailing and double slashs in paths are removed, they are not only ugly, but also caused infinite loops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4373 6c8d7289-2bf4-0310-a012-ef5d649a1542
- refactoring of plasmaParserDocument to use Dublin Core - compatible property names
- redesign of url handling in parser and condenser (less String-to-yacyURL conversion)
- more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542
This data structure replaces almost all files in the PLASMA directory
also the collection.index and the LURL-db will be created as Eco-DB, if it does not exist before
existing Flex-databases will be used as they are (the is no data lost)
If you want to force the creation of a Eco-collection.index, simply delete the old index.
The Eco file system will only be used if there is enough memory.
The collection.index RAM limit is 200MB, if you have less, a flex-Table is createt.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4340 6c8d7289-2bf4-0310-a012-ef5d649a1542
The blog still has a major problem: entries are displayed in random(?) order if there are several entries in the blog
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4339 6c8d7289-2bf4-0310-a012-ef5d649a1542
this did not work in 0.56 main (but should).
Therefore it will be necessery to provide a hand-made 'virtual pro' (just renaming) release 0.57
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4326 6c8d7289-2bf4-0310-a012-ef5d649a1542
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed yacy logo
- removed crawlOrder protocol (unused)
- removed file index in kelondroFlex (will not work, it takes too long to maintain)
- fixed remoted crawl for clusters (now denies remote crawls from peers outside cluster)
- 0.562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4317 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Anpassungen für die Auswahlbox-Option "selected"
- Einschränkung der Tagwolke aus Tag-Selektion
- Rundung von font-size auf zwei Nachkommastellen
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4315 6c8d7289-2bf4-0310-a012-ef5d649a1542
a new release type was added: 'embedded' which is the same as the current standard release was
this will not have any effect to the next release 0.56, which will still a pro-release on public download
the transition the the new release strategy must be done now to enable automatic update by the updated in future releases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4287 6c8d7289-2bf4-0310-a012-ef5d649a1542
- remove direct accesses to SimpleDateFormat fields in serverDate and use the static parse... methods instead
- remove nowDate() as a Date doesn't store timezone information and a new Date() is always faster
- default formatter methods use a GMT timezone by default now, this is important for interchangability as some date formats we use don't include a timezone offset.
- continued renaming and rearanging (formatter) methods. all should follow the general naming scheme formatWHAT(...)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4285 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed color mode (replace) for banner
- changed default color (fits to default skin) of banner in status
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4255 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) Logo background is transparent now. (Using pixel at (0,0) to determine which color is transparent. Too dirty?)
*) Logo is loaded through filesystem instead of HTTP now.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4247 6c8d7289-2bf4-0310-a012-ef5d649a1542
Which path do I have to set for IMAGE (htroot/env/grafics/yacy.gi) if I want to load it through the file system and not via HTTP?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4244 6c8d7289-2bf4-0310-a012-ef5d649a1542
- instead of pushing urls to other peers, the urls are actively pulled
by the peer that wants to do a remote crawl
- the remote crawl push process had been removed
- a process that adds urls from remote peers had been added
- the server-side interface for providing 'limit'-urls exists since 0.55 and works with this version
- the list-interface had been removed
- servlets using the list-interface had been removed (this implementation did not properly manage double-check)
- changes in configuration file to support new pull-process
- fixed a bug in crawl balancer (status was not saved/closed properly)
- the yacy/urls-protocol was extended to support different networks/clusters
- many interface-adoptions to new stack counters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4232 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the index administration now uses the same code base for url selection and collection
as the search interface. The index administration is therefore a good test environment for
ranking order control
- removed old postsorting-algorithms, will be replaced with new one
- fixed many bugs occurred before during ranking; especially the contraint filtering method
removed too many links
- fixed media search flags; had been attached to too many urls. The effect should be a better
pre-sorting before media load within snippet fetch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4223 6c8d7289-2bf4-0310-a012-ef5d649a1542
to do this, the index administration interface was extended.
It is now possible to select parts of a index.
See properties shown in interface after a word search for details.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4218 6c8d7289-2bf4-0310-a012-ef5d649a1542
- before absolute paths would be expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now you can put nearly every dynamically generated data with a configurable path to a location outside of yacys root dir without having to use symlinks (probably good for third party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path.
- exceptions (hardcoded):
DATA/LOG/yacy.logging
DATA/SETTINGS/httpProxy.conf
DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.
- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
- final flush only when tabletype = RAM
- prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100
- number of maximun entries in stacker is configurable in yacy.init (stacker.slots)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused
as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls
After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since
creation of these threads is not resource-consuming, for a detailed explanation see svn 4106
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- puNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML enities ('<' => '<') which was done with put(...) before and so was called too often, becauses it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.
* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).
TODO:
- make sure, number formats work as expected _everywhere_, report overseen stuff http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542