yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	3e742d1e34	Init remote crawler on demand If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of queue and ideling thread. Deploy of the remoteCrawlJob deferred on activation of the option.	10 years ago
Michael Peter Christen	69eacdf4eb	applying precompiled CommonPattern.COMMA.split to all places where split(",") was used	10 years ago
reger	24f68a4eb7	refactor opensearch heuristic introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors, which provide the query() functionallity, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector. The manager enforces now a min 15s delay between calls to external systems. Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses a additional config file for fieldname translation. default heuristicopensearch.conf: - openbdb.com removed - seems not longer to deliver results - config via solrconnector to datacite.org added (large technical library archive)	10 years ago
Michael Peter Christen	6491270b3a	large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned, even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The ping-ing peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra'-IPs, used by cluster assignment had been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want an intranet-routing.	10 years ago
Michael Peter Christen	e64be5dcad	in case that the network is switched to any other than freeworld, RWIs are disabled. This is a temporary fix. There must be a better way to determine if RWIs are to be switched on or of.	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is not longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOrigianlQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adopted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanismn which will cause that a 'fresh' yacy will load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get faster a personalized search index.	12 years ago
Michael Peter Christen	bd769de604	since the solr index is now used for all pages that are indexed locally, there is no need for the RWI index if the index is not transfered to another peer. Therefore the creation of RWI index data is now suppressed if DHT is disabled. This applies for all intranet and portal mode configurations, but not for public robinson modes. A robinson may switch back to public mode and then transmit its data. That means if someone wants to switch never to DHT mode, it would be more appropriate to choose the portal mode.	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
orbiter	a55e77a115	added twitter search heuristic	12 years ago
cominch	ad62609ec7	added a possibility to define a custom network definition URL for remote management	12 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Roland 'Quix0r' Haeder	af5a597e47	Scroogle is not comming back, remove dead code Conflicts: source/net/yacy/search/Switchboard.java	13 years ago
stbrumm	d18095dc48	Patch fuer Issue 0000102 and fixes to Patch (private peer status is a property of a peer, not a status)	13 years ago
Michael Christen	ce289eb64f	remove heuristics if index receive is switched off	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	4bea3f9714	hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes). The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b77b8cac0c	- enhanced html parser: recognized much more details in the content - added more properties to solr index - refactoring - more constants in switchboard - fix for some NPEs - recognition of more images - removed synchronization in HandleMap (obviously not necessary?) - added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as beeing local. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	2861d0888a	) simplified code\n) fixed potential NumberFormatExceptions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	cb1f49d0f2	replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	a321c7673d	* adminAccountForLocalhost only for localhost * yacy crawls local domains also, if no password is set (the interface is already protected) * it's not required anymore, to set a password in intranet mode git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	4915d1781a	* use local backup-file, if remote network-definition is not availible * resolve single point of failure in networks, managed by central network-definitions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7363 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	e7552bd719	*) cleaning up the code a little bit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7343 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	670ba4d52b	- removed the remote crawl option from the network configuration submenu and - added a remote crawl menu item to the index create menu. This menu also shows a list of peers that provide remote crawl urls - set remote crawl option by default to off. This option may be important but it also confuses first-time users git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7158 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9d080f387e	change in handling of the all-visible home path for storage in YaCy: the home path can now be distinguished between - data home; the path where the DATA directory is created - application home; everything else This will make it possible to store application data on Mac releases within the ~/Library/YaCy directory; a place where Mac applications write their data. Similar techniques will be possible for debian and windows. To use the new data path, YaCy can be started with -start <data path> or -gui <data path> git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	945e0ba5a5	allow global search if res. observer disabled index transmission git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6658 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	308a973503	refactoring of tables data organisation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6644 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	24060885b6	- added Tables abstraction in data.Tables.java fix for http://forum.yacy-websuche.de/viewtopic.php?p=18910#p18910 http://forum.yacy-websuche.de/viewtopic.php?p=18894#p18894 http://forum.yacy-websuche.de/viewtopic.php?p=18814#p18814 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6631 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	8ce936bcdd	added an api recording function: it shall be possible to record all operations on YaCy in a database that should make it possible 1) to re-create a setting on fresh peers 2) to transmit a setting from one peer to another 3) to re-create crawl starts after a complete deletion of the index This functionality will also support 4) scheduled re-crawls (new implementation) To implement this, a new database structure has been crated that stores maps into blob heaps. to encode maps the b-encoding technique was used (this is the same encoding that torrent files use) - added a b-encoder - enhanced the b-decoder - added a b-encoded map heap data structure - added a table organisation based on b-encoded heaps - added a servlet to maintain such tables (see Tables_p.html) - integrated the servlet into the Advanced Settings menu - added an api recording based on the new tables git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6606 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	dd459281c8	applied code changes that are recommended by PMD git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	5841ee83d3	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6400 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ce8dc575ca	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	bea3b99aff	moved table and util classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
low012	5e4f267a36	*) added subversion properties and edited a few comments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6348 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1d8d51075c	refactoring: - removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here: http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html We stil have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages. - cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http. - because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5bb8074150	removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency. - The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well. - Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified. - Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed. - The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here. - Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	99bf0b8e41	refactoring of plasmaWordIndex: divided that class into three parts: - the peers object is now hosted by the plasmaSwitchboard - the crawler elements are now in a new class, crawler.CrawlerSwitchboard - the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d31e6f9c14	fix for http://forum.yacy-websuche.de/viewtopic.php?p=14457#p14457 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5899 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	14a1c33823	refactoring of wordIndex class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	94110df85a	moved logging partially to kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3f746be5d4	- consolidation and refactoring of many DHT target - computing methods - implemented vertical DHT acceptance ("my own DHT") to accept new targets - added new target computation for global search: addresses vertical targets also - enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries) - better performance value computations for PPM selection in network configuration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6e7d113eac	fix for wrong index initialization after network switch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5203 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	029e16b653	replaced some put(String, String) by putHTML(String, String) on serverObjects respond in htroot/ root didn't touch htroot/xml/ this should solve potential xss issues git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5184 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	536e77e8b7	modifications towards a single database operation to read/write http header and cached file at once: - removed distinction between header file types for http and ftp; ftp is simulated by using http properties - removed all old resourceInfo classes that handled this distinction - introduced a new distinction between http request and http response objects - unified new response objects with two other object types that had been introduced elsewhere - changed all servlet call methods to use the new http request header object type - divided static object keys for http header properties into request and response types - refactoring here and there (a large number of type changes and many methods merged/moved) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
danielr	17b7845eb5	* refactoring - moved constants from plasmaSwitchboard to own class (all 232 ;) - moved remoteProxy-Methods to httpRemoteProxyConfig, better names - removed some unnecessary code (else-statements) * formatting (correct indentation) * minor bugfixes (due to findbugs.sf.net) * hopefully fixed "missing quote" (announcing StringParts as UTF-8) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago

1 2

69 Commits (e8256bb3b1de7a943a81ddd987eeabf09b7fa315)