yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	5fd990cc84	sorry, bad commit! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4527 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	275a226cc5	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	fbe335db73	consistent use of de.anomic.server.serverMemory to get information about memory statistics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4522 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1dce2f1079	more multithreading support: - replaced some synchronized classes by classes from util.concurrent - used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in the very basic kelondroRowCollection which supports sorting with a second thread in case that a double-core processing CPU is used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	a981cd5ab7	ignore folder tags...on request of daburna git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4436 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9d693ee635	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	2dc994515d	fixed a typo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4396 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	d59f9e0c17	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4394 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	7b400756c4	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4393 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	30bf4bdc48	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4392 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	b32b025d4b	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4391 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	29146ff855	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4390 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	6dc319fc32	UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4389 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	3afdcd0d59	fixed problem with utf-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4388 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	13668830b7	fixed problems with utf-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4387 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	34e5422675	adjusted code for bookmarksDB.getFolderList() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4386 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
apfelmaennchen	6f9f821481	added XBEL Export for YaCy Bookmarks. Tags are stored as <metadata owner="Mozilla" ShortcutURL="tag1,tag2"/> git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4381 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f7c5ccedc7	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4301 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	1cb6e431a6	Replace the ISO8601 aka W3C datetime parser by one that supports every representation allowed by this standard, see http://www.w3.org/TR/NOTE-datetime - useful especially for sitemaps parsing, where this date format is used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4286 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	3c30c2da75	more cleanup and API consistency changes, more to come... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4284 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	89b9b2b02a	redesigned remote crawl process: - instead of pushing urls to other peers, the urls are actively pulled by the peer that wants to do a remote crawl - the remote crawl push process had been removed - a process that adds urls from remote peers had been added - the server-side interface for providing 'limit'-urls exists since 0.55 and works with this version - the list-interface had been removed - servlets using the list-interface had been removed (this implementation did not properly manage double-check) - changes in configuration file to support new pull-process - fixed a bug in crawl balancer (status was not saved/closed properly) - the yacy/urls-protocol was extended to support different networks/clusters - many interface-adoptions to new stack counters git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4232 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	55c87b3b12	changed behavior of crawl stacker - final flush only when tabletype = RAM - prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100 - number of maximum entries in stacker is configurable in yacy.init (stacker.slots) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a31b9097a4	preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls After removal of robots.txt checking from stacker threads, the multi-threading of this process is void. Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since creation of these threads is not resource-consuming, for a detailed explanation see svn 4106 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	0e1738899f	* Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) have been removed, now done with simple put(...) (see above). - puNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;') which was done with put(...) before and so was called too often, because it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values. * added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456 * removed duplicate code (mostly related to the big changes above). TODO: - make sure, number formats work as expected _everywhere_, report overseen stuff http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 - probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting. - further improve the speed of page creation for the WatchCrawler. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	01e0669264	re-designed some parts of DHT position calculation (effect is the same as before) and replaced old fist hash computation by new method that tries to find a gap in the current dht to do this, it is necessary that the network bootstraping is done before the own hash is computed this made further redesigns in peer initialization order necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	5b0c1449e1	various fixes and cleanups for blacklist handling: 1. avoid adding duplicate file name entries in config properties for lists, 2. correctly merge all path masks from all list files for the same host masks, 3. rewrite helper methods standard java methods for Collection transformations, 4. merged various methods with identical functionality for different Collection implementations into one, 5. minor refactoring to improve code readability. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4779f314fe	first version of next-generation search interface: - snippets are not fetched by browser using ajax, they are now fetched internally - YaCy-internal threads control existence of snippets and sort out bad results - search results are prepared using SSI includes - the search result page is visible right after the search request, the results drop in when they are detected - no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers - added result page switching! after the first 10 results, the next page can be retrieved - number of remote results is updated online on the result page as they drop in - removed old snippet servlet (which had been also a security leak btw) - media search is broken now, will be redesigned and fixed in another step git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d9472b6a3a	* fixed problem with watch crawler * added new column to network table (remote crawl urls): the new value for provided URLs will be used for new remote crawl method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4061 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e332b844b2	- enhanced remote search: during waiting time for remote crawls some urls are fetched so the url cache can be filled with these urls - the url-prefetch is used to sort out some unresolved urls - the snippet-fetcher is triggered with the search event id. This is used to remove missing snippets from the search cache so they will not be displayed again git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4060 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b3c830271c	fix in xml header git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4057 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	947fc46904	refactoring of search process: - re-designed remote request result processing - re-designed local result accumulation, will be further enhanced with snippet fetcher - removed search process handling in switchboard - made snippet class static (there is no need for multiple snippet objects) - removed some redundant tasks in server-side search process, should be a little bit faster now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4043 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	62347b50f4	added security layer for ViewImage: - images may be requested by localhost and authorized users only, if the request is done using a clear-text URL - the image may be requested also using a code that can be a license to retrieve a URL for everyone - some servelets produce URL licenses for ViewImage, like image search results git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4027 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9ca46a8c69	indexing of local (intranet) urls enabled To do this, one must create a separate YaCy network that has a local URL domain A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	511dcbb172	fixed encoding bug made in SVN 3993 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3998 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	40b0547611	- documentation changes (removed old forum links) - different handling of link quotation - different handling of link normalization - enhanced html/unicode en/de-coding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a4e8ad95ab	enhancements to news and switchboard queue processing removed direct access and replaced by iteration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3961 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	36a37f758b	fix for oom exception during release download see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=101&hilit= git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3950 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	71ca9aa6d4	- fix for changed blacklist types git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3857 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	339153d40e	*) favicons that are specified in the document content via html link-tags are now detected and displayed on the search page (requested by allo). git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3845 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	051a65f7af	*) Snippet fetching: Snippet are now fetched synchronous if the query parameter "fetchSnippet=" is appended to the query string on the yacy search page. This is required for the RSS feed. See: http://www.yacy-forum.de/viewtopic.php?t=4051 *) Small changes in the XSLT-stylesheet that is used to generate a html page from the RSS feed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3787 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	5fc00871a9	getpageinfo/sitemap bugfix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3781 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	e7da3d2340	fixed sitemap url in getpageinfo added suggested tags/keywords in getpageinfo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3780 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
(no author)	92351c4dcb	*) SOAP: bookmarks list now indicates if a bookmark is private (requested by KoH) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3775 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a585b4d41b	added web structure image see http://localhost:8080/WatchWebStructure_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3747 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	33ad0c8246	added a web structure computation and logging: - all web page parsing operations will now increase a web structure file - the file is computed in memory and dumped at shutdown-time to PLASMASB/webStructure.map in readable form (not a database) - the file can be used externally to analyse the link structure of the crawled pages - the web structure can also be retrieved using a xml-interface at http://localhost:8080/xml/webstructure.xml - the short-term purpose is the computation of a link-graph image (before linuxtag!) - a long-term purpose could be a decentralized computation of the citation rank git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3746 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	601fc7d1c5	- added source to J7Zip-modifed.jar and it's license (changelog is still to come) - moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools - prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	7d9259e44d	*) Bugfix for umlaut problem See: http://www.yacy-forum.de/viewtopic.php?t=3932 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3674 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	0b5fc3c28c	*) moving date functions to serverDate class *) Sitemap-parser - logging added - parsing of modDate added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3667 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	6f46245a51	*) Bookmarks: Ajax icon is displayed while loading title *) First version of a sitemap parser added - currently only autodetection of sitemap files is supported *) DB-Import restructured - pause/resume should work again now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

132 Commits (ba622bb24032278d7f0c078acc978eef164c42b3)