build.xml: Fixed the check for an existing private.key, added a check for a nonexistent release in target sign, and changed the include filenames for changed libs
Added a log4j.properties file with parameters for one console appender to eliminate the warning about an uninitialized log4j subsystem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6998 6c8d7289-2bf4-0310-a012-ef5d649a1542
de.lng: Updated German translation for additional String in ConfigUpdate_p.html
XHTML 1.0 Strict fixes for all the other .html files
yacy/ui/css/yacyui-portalsearch.css: added .hidden class that was removed from ConfigProperties_p.html
Switchboard.java: Added URL for thread Remote Crawl Job and set URL for Remote Crawl URL Loader to null to fix empty href=""
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6996 6c8d7289-2bf4-0310-a012-ef5d649a1542
DictionaryLoader_p.html: Filled <dt> elements to eliminate warnings
Moved CSS for portalsearch field from header to metas template because it belongs in the <head>er
yacyui-portalsearch.css: Added #yacylivesearch form { display: inline; } because XHTML 1.0 Strict does not allow <input> directly inside <form>, and the added <p> would otherwise provoke a line break
de.lng: Updated translations for added <dt> elements and deactivated statement in DictionaryLoader_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6994 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added type="text/javascript" to script resource
- removed unintentional "\" from <a> link
- changed the "name" attribute in the <form> element to "id" for XHTML 1.0 Strict conformance
(remaining warnings come from script elements writing end tags like </tr> that might confuse some validators)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6990 6c8d7289-2bf4-0310-a012-ef5d649a1542
monitoring: replaced the unused 'idletime' with uploaded bytes
added some kind of 'upload-throttling' at dht-out :-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6983 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added empty action attribute to form
- replaced name attributes with id (name is not a valid attribute on these elements in XHTML 1.0 Strict)
- changed the 'for' target of the labels (so now clicking on the labels also activates the checkboxes)
de.lng: Test with Subversion properties #2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6982 6c8d7289-2bf4-0310-a012-ef5d649a1542
Some file types are containers for several files. Until now, such containers were parsed in such a way that the resulting set of parsed content was merged into one single document before further processing. With that parser infrastructure it was not possible to parse document containers into their individual files. An example is an RSS file, where the RSS messages can be treated as individual documents with their own URL reference. Another example is a surrogate file, which was handled by a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. For a single document, the parser method returns a set containing one document.
To comply with the new interface, the zip and tar parsers were also completely redesigned. All parsers are now much simpler and cleaner in structure. The switchboard operations were extended to operate on sets of parsed files instead of single parsed files.
Additionally, parsing of jar manifest files was added.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
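The single-entry-point contract described in the parser redesign entry above could look roughly like the following Java sketch. All names here (ParsedDoc, SketchParser, PlainTextParser, ContainerParser) are hypothetical and are not YaCy's actual classes; the use of a List as the "set" of documents and the blank-line-separated container format are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a parsed document; not YaCy's actual document class.
final class ParsedDoc {
    final String url;
    final String text;
    ParsedDoc(String url, String text) { this.url = url; this.text = text; }
}

// One entry point only: every parser returns a collection of documents.
interface SketchParser {
    List<ParsedDoc> parse(String sourceUrl, String content);
}

// A parser for a single document returns a collection with exactly one entry.
final class PlainTextParser implements SketchParser {
    public List<ParsedDoc> parse(String sourceUrl, String content) {
        return Collections.singletonList(new ParsedDoc(sourceUrl, content));
    }
}

// A container parser returns one document per contained item, each with its own URL.
// Assumption: the "container" is plain text with items separated by blank lines.
final class ContainerParser implements SketchParser {
    public List<ParsedDoc> parse(String sourceUrl, String content) {
        List<ParsedDoc> docs = new ArrayList<>();
        int index = 0;
        for (String item : content.split("\n\n")) {
            if (item.trim().isEmpty()) continue; // skip empty segments
            docs.add(new ParsedDoc(sourceUrl + "#" + index++, item.trim()));
        }
        return docs;
    }
}
```

A caller such as the switchboard can then iterate over the returned collection without needing to know whether the source was a single file or a container.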
- a site-operation heuristic that loads all direct links from a portal page if the site-operator is used
- a direct crawl for search results from scroogle for the given search terms
The configuration page can be found directly beside the network configuration page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6951 6c8d7289-2bf4-0310-a012-ef5d649a1542
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
- cleaned up the code. The new Eclipse Helios reports new warnings for dead code; this change cleans up most of these warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542
Here a new concept called 'search heuristics' is introduced. In IT, a heuristic is a kind of 'shortcut' to good results; here, to good search results. In this case it is used to provide a very transparent way to compare what YaCy produces as a search result with what g**gle produces as a search result. Here is what you can do now:
- add the phrase 'heuristic:scroogle' to your search query, like 'oil spill heuristic:scroogle', and a call to scroogle is made to get anonymous search results from g**gle.
- these results are _not_ taken as meta-search results, but are used to instantly feed a crawling and indexing process. This happens very fast: 20 results from scroogle are taken, loaded simultaneously, parsed and indexed immediately, and the search result is fed from the parsed content alongside the normal p2p search.
- when new results from that heuristic (more to come) become part of the search results, it is verified whether they are redundant (they were part of the normal YaCy search result anyway) or completely new to YaCy.
- in the search results, results from heuristics that are new to YaCy are marked with 'H ++', and results from heuristics that had already been found by YaCy are marked with 'H ='. That means:
- you can now see YaCy and Scroogle search results in one result page, but you also see that you would not have 'missed' the g**gle results if you had used only YaCy.
- to make it short: YaCy now subsumes g**gle results. If you use only YaCy, you miss nothing.
To come: a configuration page that lets you configure the usage of heuristics and enable this feature by default.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6944 6c8d7289-2bf4-0310-a012-ef5d649a1542
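The query modifier and the 'H ++' / 'H =' marking from the search heuristics entry above can be illustrated with a small Java sketch. The class and method names (HeuristicSketch, heuristicOf, mark) are hypothetical, and the sketch assumes results are compared by their URL string; how YaCy actually detects redundancy is not stated in the entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HeuristicSketch {
    static final String PREFIX = "heuristic:";
    static final String NEW_TO_YACY = "H ++";  // found only through the heuristic
    static final String ALREADY_KNOWN = "H ="; // also part of the normal search result

    // Returns the heuristic name from a "heuristic:<name>" token, or null if none is present.
    static String heuristicOf(String query) {
        for (String token : query.trim().split("\\s+")) {
            if (token.startsWith(PREFIX) && token.length() > PREFIX.length()) {
                return token.substring(PREFIX.length());
            }
        }
        return null;
    }

    // Marks every heuristic result depending on whether the normal search found it too.
    // Assumption: two results are the same if their URL strings are equal.
    static Map<String, String> mark(Set<String> normalResults, List<String> heuristicResults) {
        Map<String, String> marks = new LinkedHashMap<>();
        for (String url : heuristicResults) {
            marks.put(url, normalResults.contains(url) ? ALREADY_KNOWN : NEW_TO_YACY);
        }
        return marks;
    }
}
```

For the query 'oil spill heuristic:scroogle', heuristicOf returns "scroogle"; a query without the modifier yields null, so the normal search path stays unchanged.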
Implemented a hint from dulcedo: "use site: - operator as crawl start point".
YaCy was already able to search using a site constraint. This function is now extended with an instant crawling feature.
When you now use the site operator, the landing page of the site and every page that is linked from this page are loaded, indexed and selected for the search result within that search request. When the remote server responds quickly enough, this process can yield search results during the normal search result preparation, within a few seconds.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6941 6c8d7289-2bf4-0310-a012-ef5d649a1542
Cache accesses shall no longer be made directly to the cache; all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, an enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods that did not use a cache strategy have been removed. This also affects the verify option of the yacysearch servlet. After this commit, 'verify=false' does not necessarily mean that no snippets are generated. Instead, all snippets that can be retrieved using the cache only are presented. The search hit is still not verified, because the snippet was generated from the cache. If cache-based generation of a snippet is not possible, verify=false means that the link is not rejected.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
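The verify behaviour described in the cache strategy entry above can be sketched in Java as follows. This is an illustration under stated assumptions, not the LoaderDispatcher implementation: the strategy names CACHE_ONLY and CACHE_THEN_NETWORK, the classes SketchLoader, SnippetResult and SnippetSketch, and the rule that verify=true rejects a link without a snippet are all hypothetical. Only the verify=false behaviour is taken from the entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy names; the entry only says that an enum from
// CrawlProfile.CacheStrategy controls the usage of the cache.
enum SketchCacheStrategy { CACHE_ONLY, CACHE_THEN_NETWORK }

// Hypothetical single entry point for all loading attempts.
final class SketchLoader {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> remote = new HashMap<>(); // simulated network

    void putCache(String url, String content) { cache.put(url, content); }
    void putRemote(String url, String content) { remote.put(url, content); }

    // Returns the content, or null if it is not available under the given strategy.
    String load(String url, SketchCacheStrategy strategy) {
        String cached = cache.get(url);
        if (cached != null || strategy == SketchCacheStrategy.CACHE_ONLY) return cached;
        String fresh = remote.get(url);
        if (fresh != null) cache.put(url, fresh); // fill the cache for later requests
        return fresh;
    }
}

final class SnippetResult {
    final boolean accepted; // false: the link is rejected from the result list
    final String snippet;   // null: no snippet could be generated
    SnippetResult(boolean accepted, String snippet) {
        this.accepted = accepted;
        this.snippet = snippet;
    }
}

final class SnippetSketch {
    static SnippetResult snippet(SketchLoader loader, String url, String word, boolean verify) {
        SketchCacheStrategy strategy = verify
                ? SketchCacheStrategy.CACHE_THEN_NETWORK
                : SketchCacheStrategy.CACHE_ONLY;
        String content = loader.load(url, strategy);
        int pos = content == null ? -1 : content.indexOf(word);
        if (pos < 0) {
            // verify=false: keep the unverified link, just without a snippet.
            // verify=true: reject the link (assumed behaviour).
            return new SnippetResult(!verify, null);
        }
        int from = Math.max(0, pos - 20);
        int to = Math.min(content.length(), pos + word.length() + 20);
        return new SnippetResult(true, content.substring(from, to));
    }
}
```

With verify=false a cached page still yields a snippet, while an uncached page is kept in the result list without one.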
Hint: the YaCy search can easily be integrated into the Firefox search window:
Just start a search, then open the pop-up menu inside the Firefox search input window and select "add search engine"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6935 6c8d7289-2bf4-0310-a012-ef5d649a1542
- a new news db will be created (news1024.db), the old one (news.db) can be deleted
- peers with too large a news payload are no longer ignored (they may have been invisible because their news payload was too large!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6917 6c8d7289-2bf4-0310-a012-ef5d649a1542
- now Xms is lower than Xmx (let's see what happens)
- removed default path for intranet crawl starts to avoid confusion as seen at LinuxTag
- added time-out to UPnP request (I have a new router which may need that)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6916 6c8d7289-2bf4-0310-a012-ef5d649a1542
Network.html: shortened some <br /> tags to <br/>
ConfigBasic.html: fixed the typo 'cann' for the German translation file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6914 6c8d7289-2bf4-0310-a012-ef5d649a1542
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api', which may be used externally by other applications without YaCy core elements because it has no dependencies on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added number of online peers at the last day and the last week
- changed design of statistic table
- network picture now shows exactly those peers that are counted in the statistic overview for one day
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6897 6c8d7289-2bf4-0310-a012-ef5d649a1542
- order locations by (primary) population and (secondary) longitude (reverse ordering, both)
- added population from GeoNames, OpenGeoDB does not have that information
- changed default viewpoint of map to (30,15); shows more land and Europe in the center
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6893 6c8d7289-2bf4-0310-a012-ef5d649a1542
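The location ordering from the entry above (population first, longitude second, both reversed) can be expressed with a chained comparator. The following Java sketch uses hypothetical names (GeoLocation, LocationOrder) and assumes that "reverse ordering" means descending order for both keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical location record; not YaCy's actual location class.
final class GeoLocation {
    final String name;
    final int population;
    final double longitude;
    GeoLocation(String name, int population, double longitude) {
        this.name = name;
        this.population = population;
        this.longitude = longitude;
    }
}

final class LocationOrder {
    // Primary key: population, descending. Secondary key: longitude, descending.
    static final Comparator<GeoLocation> BY_POPULATION_THEN_LONGITUDE =
            Comparator.comparingInt((GeoLocation l) -> l.population).reversed()
                    .thenComparing(Comparator.comparingDouble((GeoLocation l) -> l.longitude).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<GeoLocation> sorted(List<GeoLocation> locations) {
        List<GeoLocation> copy = new ArrayList<>(locations);
        copy.sort(BY_POPULATION_THEN_LONGITUDE);
        return copy;
    }
}
```

Locations with equal population are then distinguished by longitude, so the order stays deterministic when several places share the same population figure.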
- changed description text to 'title' entity (subject is a list of keywords and was very messy)
- added ViewFile in location pop-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6891 6c8d7289-2bf4-0310-a012-ef5d649a1542
Added alt attribute to page tabs in yacysearch.java for HTML validity
Added new German translations for geo search phrase in de.lng
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6890 6c8d7289-2bf4-0310-a012-ef5d649a1542
- used that to display two layers on map: cities and search result locations
- added many marker graphics for the display of the markers on the map
- some refactoring of the yacy news code plus bugfixes for latest move from Tree to Table data structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6889 6c8d7289-2bf4-0310-a012-ef5d649a1542
- cycle map is default because it looks best at 'world view'
- added control elements to map
- increased map size
- added deletion of search results for each time when a new search is done
- moved search box up and added yacy icon in such a way that the search page looks exactly the same as the standard search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6885 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added downloader option in DictionaryLoader
- added generalization (interfaces and overarching localization)
- more abstraction using the libraries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6879 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fix for initial generation of crawl profiles (one more reason to remove your crawl profiles)
- more String -> byte[] migration
- more logging for cache store/hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6874 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parsers
- fixed bug in metadata read procedure
- enhanced Dublin Core and RSS parsers to interpret more fields correctly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
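The condenser fix in the entry above concerns input whose last word is not followed by a termination symbol. The following Java sketch shows the general pattern only and is not the condenser's actual code: the class WordSplitterSketch is hypothetical, and it assumes that any character that is not a letter or digit terminates a word.

```java
import java.util.ArrayList;
import java.util.List;

final class WordSplitterSketch {
    // Splits text into words; any character that is not a letter or digit ends a word.
    static List<String> words(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        // Final flush: without it, a last word that is not followed by a
        // termination symbol would be dropped.
        if (current.length() > 0) words.add(current.toString());
        return words;
    }
}
```

Both "oil spill." and "oil spill" produce the same two words, because the remaining buffer is flushed after the loop.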
- relaxed dublin core parsing: the dc:reference tag may replace dc:identifier if this does not contain a valid url
- parsing of completeRecords number and presentation in the download list of oai import
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6850 6c8d7289-2bf4-0310-a012-ef5d649a1542
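The relaxed Dublin Core rule from the entry above (use dc:reference when dc:identifier holds no valid URL) can be sketched in Java like this. The class and method names are hypothetical, and "valid url" is assumed here to mean an absolute URI with a host, which may differ from the check YaCy performs.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DublinCoreUrlSketch {
    // Assumption: a value counts as a valid URL if it parses as an absolute URI with a host.
    static boolean isValidUrl(String value) {
        if (value == null) return false;
        try {
            URI uri = new URI(value.trim());
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Prefers dc:identifier; falls back to dc:reference; returns null if neither is a URL.
    static String documentUrl(String identifier, String reference) {
        if (isValidUrl(identifier)) return identifier.trim();
        if (isValidUrl(reference)) return reference.trim();
        return null;
    }
}
```

An identifier such as "oai:example.org:1234" has no host, so the reference value is used in that case.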
- now importing OAI-PMH server list from two sources
- simultaneous import from several servers (even > 2000)
- check buttons on OAI-PMH server list to select multiple servers for import start
- it is possible to select all servers at once for import
- imported XML data is gzipped after import from surrogate reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused an out-of-bounds exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542