yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	474659a71f	- modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order - added option to set minimum crawl delta for domains in balancer - added default values to crawl deltas in yacy.init - added configuration for these deltas in performance queues - enhanced performance setting computation (more time for indexing queue for a faster flush - remote crawling is now enabled during local crawling if indexer has space and time for more links - added database stub for new distributed file system - refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	69aac0d74c	modified the diskUsage class regarding the following two aspects: 1. The usage and dependency of the plasmaSwitchboad was used many times in the past but this was a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over. 2. the class is considered as a tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hadware should not support dynamic object allocation, but should be coded as collection of private static methods. Therefore all class objects had been transformed into static private objects. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	0c1dc703e4	- set staticIP at startUp - added setting for reduced menu (simpleMenu) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4959 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c998dc6556	- added security functions to flush url and search caches in case that memory is full git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	7feae906aa	- organize imports - removed potential null pointer accesses - removed unnecessary casts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
det	f597185026	Initial import of the resource observer framework git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4892 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e0e7f86f82	some bugfixes for the peer-ping process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4885 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	40d7f485f3	- fixed several NPE bugs - fixed loosing of own seed hash (hopefully) - fixed a bug with crawl start s beginning with (bookmark) files - added better IP recognition during hello process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4882 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2f381b8d7a	- fixed at least two causes for a NPE after a use case switch. A large refactoring was neccessary - added another crawl start option: automatic restriction to sub-path - removed crawlStartSimple and renamed crawl start expert to crawl start (without expert) - some changes to texts in crawl start - added some more deletions when an web index is deleted: delete also queues and robots cache git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4881 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2a604b7402	added superfast search result computation which can be obtained for local search when snippet fetching is disabled. An example search for the rss interface would be: http://localhost:8080/yacysearch.rss?query=yacy&Enter=Search&contentdom=text&count=10&resource=local&verify=false (just add "&verify=false") git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4878 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9bef20b537	- added cleanup for unused server loggings: they are removed after the client had not been seen since one hour - removed configBasic popup trigger when no password is set git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4875 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1a1841392c	small fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4859 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	faed00d75d	added use cases to basic configuration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4831 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4229cd275c	fixed several details about network switching, default password, random password and localhost authentification git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4830 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c1d721dd2d	fix for attacks on localhost-authorized peers from web pages with links to localhost addresses: checking of referer in access git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4828 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	3bd1db776a	implemented switch for admin authorization from localhost: - access is granted for localhost users to administration pages by default - the default setting can be changed in the BasicConfig.html page - if the BasicConfig page was accessed with post and no password was submitted, a random password is generated - a headless installation MUST give a password upon first call of the configuration page, otherwise they will not be able to access it again - if no password is given within 10 minutes after start-up, a random password is generated git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4804 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	cfe6790498	- added option to switch between yacy networks, especially between the two default networks (freeworld and intranet), from the ConfigNetwork online interface - to make this possible, a large refactoring and reorganisation of data structures was necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	78087da287	- changed seed file storage to clear text - fixed kill script - fixed saving of seed file (had been corrupted by latest changes) - some refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4799 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	5fde679acb	- fixed problem in performance configuration - extended rss fetch size for rssTerminal git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4798 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	239cc4428d	- better domain graph, faster when more links exist, looks better - new authorization rule: localhost is always authorized for administration. This solves many problems with ajax, and also fixed a problem in rssTerminal - fix bug in RSSFeed which prevented that entries had been recognized as individual, new entries - added reloading/updating of status image on status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4796 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	dd75b3cabc	- patch for bad profiles - time-out when deleting profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4793 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b32736762c	enhanced rssTerminal - 3 lines possible - distinguishing of private and public data, if not authorized only public data is shown - shows now more events, including local searches in clear text if user is logged in - simplyfied peer events - better recognition of 'real' new peers - presentation of peer pings from other peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4771 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	fbb712c669	refactoring: moved importer classes to crawler and plasma package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1689030ee8	refactoring: moved all crawler classes into their own package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d2ba1fd2ab	major step forward to network switching (target is easy switch to intranet or other networks .. and back) This change is inspired by the need to see a network connected to the index it creates in a indexing team. It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder. The remaining YACYDB is superfluous and can be deleted. The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy). The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT). No other functional change has been made. The next steps to enable network switcing are: - shift of crawler tables from PLASMADB into the network (crawls are also network-specific) - possibly shift of plasmaWordIndex code into yacy package (index management is network-specific) - servlet to switch networks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	d4bce6affd	refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	483e9a2066	- shifted tld recognition methods from yacyURL to serverDomains - changed isLocal Property in such a way that it is possible to see if a domain is in the internet (and not intranet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4751 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	32b5b057b9	- modified, simplified old kelondroHTCache object; I believe it should be replaced by something completely new - removed tree data type in kelondroHTCache - added new class kelondroHeap; may be the core for a storage object that will once replace the many-files strategy of kelondroHTCache - removed compatibility mode in indexRAMRI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4747 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e024e3b9cf	added new default profiles to distinguish snippet fetch for local and global search the difference is, that a local search will no not cause a re-indexing of loaded pages git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1995faef8d	- refactoring of Colage back-end: move to plasma package - renamed also the plasmaCrawlResults to have a consistent naming for url and image queues - added a double-check for the images - added additional queues for the images: all worse-quality images go there, so the queue can be used also if no sizes are given; no image is lost - added a cleanup for the stacks so they cannot flood the memory git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4722 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	5e3ce46339	- better logging when rejecting a url because it is not in declared domain - more XSS attack protection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4720 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	8313d58ae7	- integrated the collage into the Web Visualization menu - added a counter for the public and private queue on the page (testing..) - fixed wrong public/private categorization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4686 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	82bf9ac1c8	- added Collage servlet from datengrab and modified it: * all images are queued * private/public is respected * inserted into switchboard * added collageQueue class that stores all the queued images git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	959f448e5f	- disabled redirects in proxy (so client sees real path) - added connection stats (only connections currently in use) - remove "old" connections (closed or idle for some time) - synchronized shared parts of proxyHandler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4682 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	5d1fbb25e7	fix for bad deploy: - the name of downloaded release files is adopted if the httpc delivers uncompressed tar.gz files (the .gz is removed from the file name) - the deploy method is able to handle tar-file (not tar.gz-files) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4679 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2c1c3bb6eb	- some refactoring (sorry Daniel, hab in deinem Code rumgewütet) - fixed broken downloads (flush was missing) - different problem handling when download is corrupted - different default values in yacy.init git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	14404d31a8	- enhanced performance graph (more info) - added conditions for rarely used logging lines to prevent unnecessary CPU usage for non-printed info git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4667 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	117ae78001	speed enhancement for reading of eco-table indexes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4647 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	5c3c1fdf41	replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	764a40e37d	speed enhancements for crawler and url retrieval (affects also search speed) - concurrency for LURL-fetching: this can be done using a concurrent lookup into the separated url databases. Concurrency is possible because there is no IO during lookup. The more LURL-Tables are present, the better is the speedup. More CPUs will increase speed - because a large number of LURL-lookups are made during crawling (for double-check), the LURL-Lookup speed enhancements enhances also crawling speed - search speed also profits from LURL-lookup enhancement - changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing - removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4628 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	368593e449	enhanced the concurrency handling of indexing process (better queue size control, better data concept, better shutdown behavior) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4617 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0241d070bc	added concurrency to indexing process: - the methods {parsing, semantic analysis (condensing), structure analysis (web structure)} in the serialized indexing path had been made concurrent. - four BlockingQueues handle concurrency and hand-over of the indexing objects, the last object in the queue is stored into a blockingQueue of maximum size 1 to serialize the process for storage (which uses IO and therefore here should not be deserialized) - a concurrency of (CPUs + 1) is default. Single-CPU users will profil from the change because large files cannot block the indexing process any more. - removed the secondary indexing thread, which is superfluous now. Concurrency is default for all users. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4609 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bca87f1e38	- refactoring of serverThreads: renaming to distinguish busy-threads and blocking-threads - added blockingThreads which are threads that are not driven by pause times but by BlockingQueue lookup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4606 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	968c775025	- preparation of parsing/indexing queue for concurrent execution - remote crawl receipts are now transmitted concurrently in separate threads (makes remove crawls much faster!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4605 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9b0e20fb06	next refactoring step in document indexing to prepare concurrency environment for document parsing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4604 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7f9f639d20	- refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering - refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling - removed unused code parts from condenser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d6050b9ffb	- separated the LURL data storage and Crawl result stack for process supervision. this is another step to enable multiple, concurrent fulltext-indexes - another try to make the yacy-httpc more stable git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	8d6a13bc07	refactoring of parsing-condensing-indexing process: - separated parts - removed storagePeer function next step will be parallelization of processes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4600 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	541b817502	refactoring of switchboard queueing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7150b463ff	changed handling of default values and database paths: - the default files yacy.init and for the network definition is now moved to the path defaults - the httpProxy.conf is renamed to yacy.conf - the DATA/INDEX/PUBLIC is renamed to the actual network nickname, which should be freeworld or sciencenet more menu entries - added apfelmaennchens alternative search page to the menu - added the new thread dump page to the server log menu point as submenu modifications - modified the thread dump page: sorting by thread type git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4575 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9c989fe5f7	fixed deadlock git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4562 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b4ed937f1e	- modified zone navigation (does still not work correctly) - added dht switch in network definition - 0.574 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4550 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bfed9c2da6	- some refactoring in search process - separated sidebars in new search interface and placed them in their own files which can be put in into the search page like plug-ins git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4529 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	253a453413	removed possible synchronization deadlock git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4511 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f4c73d8c68	- fixed highslide usage - some enhancement to index management, better types git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4497 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2327451653	- changed order of database initialisation (index first) - removed mainly unused init-time for databases (was only used for tree tables, which are not used any more) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	61a81820e3	- refactoring of search tracker - added link to search history to repeat the search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4493 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bd63999801	- faster search: using different data structures that avoid multiplr calculations - no more table copy for error-eco table - optional table copy for lurl-entries - more abstractions (less single constant strings) - better logging (using host names instead of ips) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	acf771d5e1	- fixed bug with too much RAM in crawler queue - fixed dir bug - better calculation of TF for join - better waiting-on-result logic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4424 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a8a5df4a51	- more dublin core naming of page metadata - better presentation of result counters in search results git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	15397298dc	- refactoring of indexControlRWIs: moved statics to own class; better Dublin Core naming - fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=759&hilit=&p=4866#p4866 - some bugfixes in EcoTable according remove method - switched more tables to Eco: crawl Profiles, htcache, seeddb, newsdb git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4397 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	efd0b8371a	- added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser - refactoring of plasmaParserDocument to use Dublin Core - compatible property names - redesign of url handling in parser and condenser (less String-to-yacyURL conversion) - more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f4e9ff6ce9	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4343 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	45339c3db5	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4341 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	dc26d6262b	- removed write buffer from kelondroCache (was never used because buggy; will now be replaced by new EcoBuffer) - added new data structure 'eco' for an index file that should use only 50% of write-IO compared to kelondroFlex The new eco index is not used yet, but already successfully tested with the collectionIndex The main purpose is to replace the kelondroFlex at every point when enough RAM is available. Othervise, the kelondroFlex stays as option in case of low memory (which then can even use a file-index) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4337 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a5054c038d	- added large number of generics - redesign of ordering structures in kelondro (old did not work with strict generics) - 50% IO reduction during read access on kelondroFlex (ommiting of read on index table) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	03e7782269	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4dc438f7e7	moved to Java 1.5: - changed build script to use java 1.5 compiler - first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings) - added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet) please set your development environment to use java 1.5! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	21b8d1b918	small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f243e338cf	implemented online caution also for local and remote search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4252 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b46bcaa5d8	changed method of profiling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4248 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	aefb3f7765	added memory graph picture to PerformanceMemory_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4241 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9b0ae4b989	added referrer to remote crawl url list git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4236 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	89b9b2b02a	redesigned remote crawl process: - instead of pushing urls to other peers, the urls are actively pulled by the peer that wants to do a remote crawl - the remote crawl push process had been removed - a process that adds urls from remote peers had been added - the server-side interface for providing 'limit'-urls exists since 0.55 and works with this version - the list-interface had been removed - servlets using the list-interface had been removed (this implementation did not properly manage double-check) - changes in configuration file to support new pull-process - fixed a bug in crawl balancer (status was not saved/closed properly) - the yacy/urls-protocol was extended to support different networks/clusters - many interface-adoptions to new stack counters git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4232 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	af10f729df	fixed image search and favicon loading git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4225 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c527969185	- enhanced monitoring of ranking parameters for details, please try http://localhost:8080/IndexControlRWIs_p.html - fixed computation of ranking ordering in some cases git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6eaa5a0e64	enhanced local search speed. The ranking process is now 6 times faster that before. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	425e4ead66	Allow absolute paths in configuration settings. - before absolute paths would be expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now you can put nearly every dynamically generated data with a configurable path to a location outside of yacys root dir without having to use symlinks (probably good for third party distribution packaging). - abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path. - exceptions (hardcoded): DATA/LOG/yacy.logging DATA/SETTINGS/httpProxy.conf DATA/SETTINGS/user.db TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example. - add missing workPath to yacy.init (it was used in code, but there was no default in the file) - fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos. - replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
borg-0300	a5d28785b1	less OOM (works for me) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4194 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ccbfb15b6b	enhancement to crawl stacker enqueue order git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4192 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	55c87b3b12	changed behavior of crawl stacker - final flush only when tabletype = RAM - prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100 - number of maximun entries in stacker is configurable in yacy.init (stacker.slots) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a31b9097a4	preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls After removal of robots.txt checking from stacker threads, the multi-threading of this process is void. Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since creation of these threads is not resource-consuming, for a detailed explanation see svn 4106 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
fuchsi	508de558f7	sbStackCrawlThread is null during first cleanProfiles() run at startup. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4152 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
fuchsi	70614385ef	Attempt to fix the "lost profile handle" bug. It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4151 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	33fb2f756d	added emergency fail case in remote crawls in extreme situations this will cause that no remote crawls are send out any more this is bad, but it protects the case where failing remote crawls fill up the local queue too much, which is even worse git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4141 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
fuchsi	03c5b4ad68	more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org - RFC-822 date time had to include the time instead of date only - <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1 - <link> elements are mandatory for <channel> and <item> git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4131 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
fuchsi	7404f2c35c	Fix some of the issues with the RSS search interface, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=392 Note: the new DateFormatter822 in the plasmaSwitchboard is just a copy of the DateFormatter that always uses the US locale to allow formatting of a loocale independent date String. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4124 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	98abe0804d	another enhancement to crawl starts with link files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4123 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	01e0669264	re-designed some parts of DHT position calculation (effect is the same as before) and replaced old fist hash computation by new method that tries to find a gap in the current dht to do this, it is necessary that the network bootstraping is done before the own hash is computed this made further redesigns in peer initialization order necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	842308ea97	- redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, shows now also terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed here and there problems with indexing queues - enhances indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched. next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	3c74014004	automatic deletion of dead client connections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4110 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4275727d69	fix for peer ping problem (implemented a 3-time re-ping); cause for 'Connection reset' still unknown git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4095 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
fuchsi	5b0c1449e1	various fixes and cleanups for blacklist handling: 1. avoid adding duplicate file name entries in config properties for lists, 2. correctly merge all path masks from all list files for the same host masks, 3. rewrite helper methods standard java methods for Collection transformations, 4. merged various methods with identical functionality for different Collection implementations into one, 5. minor refactoring to improve code readability. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6c819a6fd9	added cache to favicon display added better synchronization for simultanous search requests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4076 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f81ef40cc4	no dht activity for small networks; this is not needed if the network is small git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4062 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a34d9b8609	* added a search history cache that maintains search results for 10 minutes it is necessary for the new search process that will do automatic re-searches a positive effect is, that when a re-search is done it can be monitored how many results had been contributed from other peers. The message for this contribution was moved from the end of the result page to the top. * enhanced re-search time when a global search was done an the local index has already a great number of results for this word * re-organised presearch computation; must be further enhanced git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb426565f0	added new yacy protocol for mass url-pull for better remote crawling distribution git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b5346141b3	made the plasmaHTCache static (there is only one internet, so we need only one cache) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	947fc46904	refactoring of search process: - re-designed remote request result processing - re-designed local result accumulation, will be further enhanced with snippet fetcher - removed search process handling in switchboad - made snippet class static (there is no need for multiple snippet objects) - removed some redundant tasks in server-side search process, should be a little bit faster now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4043 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

1 2 3 4 5 ...

613 Commits (449e69743623cc73a694449890ed55d322bb9c70)