yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	97ba5ddbb7	configuration option for maxload limit for remote search	10 years ago
Michael Peter Christen	69eacdf4eb	applying precompiled CommonPattern.COMMA.split to all places where split(",") was used	10 years ago
Michael Peter Christen	3d717b749a	fix for urlmaskfilter	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only available remotely or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks) The option check-boxes are placed in ConfigPortal.html	10 years ago
Michael Peter Christen	8c3e5b7b6d	added experimental pdf splitting which enables YaCy to split pdfs during parsing into individual pages and add them all using different URLs. These constructed urls are generated from the source url with an appended page=<pagenumber> attribute to the url get/post properties. This will distinguish the different page entries. The search result list will then replace the post parameter with a url anchor # mark which causes that the original url is presented in the search result. These URLs can be opened directly on the correct page using pdf.js which is now built-in into firefox. That means: if you find a search hit on page 5 and click on the search result, firefox will open the pdf viewer and show page 5.	10 years ago
Michael Peter Christen	5516819354	preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then cached in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again.	10 years ago
Michael Peter Christen	28683530cd	fixes to usage of no-cache: use and recognize also the no-store directive	10 years ago
reger	7d863d6254	fix empty text facet entry (noticed on Author facet)	10 years ago
Michael Peter Christen	0a879c98e7	added new 'firstSeen' database table and necessary data structures which hold a date for each URL to record when a url was first seen. This is then used to overwrite the modification date for urls upon recrawl in case that the first-seen date is before the latest document date. This behaviour is necessary due to the common behaviour of content management systems which attach always the current date to all documents. Using the firstSeen database it is possible to approximate a real first document creation date in case that the crawler starts frequently for the same domain. As a result the search results ordered by date have a much better quality and the usage of YaCy as search agent for latest news has a better quality.	10 years ago
orbiter	72c2bc5189	fix for search in case where local peer has no local seed address in portal mode	10 years ago
Michael Peter Christen	167c5a51f0	IPv6 fix	10 years ago
orbiter	fa2ad101ec	enhanced graphics computation (avoiding long string parsing for colours)	10 years ago
orbiter	ef813cec91	added proper copyright notice to OSM tiles presented at the search result page	10 years ago
Michael Peter Christen	f818f84adb	more ipv6 fixes	10 years ago
Michael Peter Christen	afd5bd5f5f	slightly enhanced Network table computation by using a lazy initialized bitfield for peer flags	10 years ago
Michael Peter Christen	2c2b50e65d	refactoring (class name should start with uppercase letter)	10 years ago
Michael Peter Christen	bc275dca07	added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer.	10 years ago
Michael Peter Christen	e8392e2ff2	fix for local search	10 years ago
Michael Peter Christen	0bfc69b29b	more ipv6 bugfixes	10 years ago
Michael Peter Christen	883622306e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/peers/Protocol.java	10 years ago
Michael Peter Christen	97995a1dd9	fix for remote search process	10 years ago
Michael Peter Christen	92c5d97486	fix for bad node flag setting with IPv6	10 years ago
orbiter	c27bad9326	more ipv6 fixes	10 years ago
Michael Peter Christen	460858fb22	more ipv6 fixes	10 years ago
Marc Nause	1e6e69bc40	Finished implementation of UPNP: ) will try other ports if YaCy standard ports are not available ) distinguish between internal and external port (not sure if this works 100%) Still to add: property in config to enter own external port (in case of manually configured NAT)	10 years ago
Michael Peter Christen	e1bc768f9d	more IPv6 bugfixes	10 years ago
Michael Peter Christen	528f583d72	ipv6 fixes	10 years ago
Michael Peter Christen	247e626083	IPv6 host parsing bugfixes	10 years ago
Michael Peter Christen	fe917deb2d	when pinging other peers, be able to select the right IP option	10 years ago
Michael Peter Christen	65e6ae52fb	IPv6-enhanced Network monitoring page	10 years ago
Michael Peter Christen	6491270b3a	large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned, even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The ping-ing peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra'-IPs, used by cluster assignment had been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want an intranet-routing.	10 years ago
Michael Peter Christen	ad35d9294f	added a 'stats' table which records some peer statistics twice every hour. The table can be shown with http://localhost:8090/Tables_p.html?table=stats The entries have the following meaning: aM: activeLastMonth aW: activeLastWeek aD: activeLastDay aH: activeLastHour cC: countConnected (Active Senior) cD: countDisconnected (Passive Senior) cP: countPotential (Junior) cR: count of the RWI entries cI: size of the index (number of documents) The entry keys are abbreviated to reduce the space in the table as the name is written again for every row. This is the beginning of a 'yacystats' micro-alternative as a built-in function in YaCy. Graphics may follow after some time if enough test data is available.	10 years ago
Michael Peter Christen	f1032fb8fe	more enhancements to image search in case that a restriction to a single domain is done	10 years ago
Michael Peter Christen	475125f9d7	hack to get more results when doing a remote site search	10 years ago
Michael Peter Christen	81f9b34da7	increased ability to search for all images on a single server within the p2p remote search	10 years ago
Michael Peter Christen	2c26013c50	better contentdom abstraction	10 years ago
reger	1fdcc2d67b	change seedfile upload ip check to allow intranet ip in intranet mode - this allows to setup a principal peer in intranet environment	10 years ago
reger	e31b0e6d67	- update javadoc Seed.getIP - default mySeed.ip to hostip in SeedDB.initMySeed() if Intranetmode this allows to become senior status in intranet hosted search network with few peers, otherwise peer would stay junior because of default init with loopback ip as public (dna) ip.	10 years ago
reger	350c6b8250	in IntranetMode allow intranet hosted seedlist with Network_Domain "any" - so far intranet seedlist hosts are always denied but need to be allowed in intranet mode	10 years ago
reger	3dde94422f	center searchevent lines on network graph (PerformanceSearch_p.html)	10 years ago
Michael Peter Christen	6344718f8b	reducing the concurrent query stack size and reduced concurrency of postprocessing to avoid OOM situations	10 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	10 years ago
reger	5f5fb4ecdc	remove unused static (RSS)search from protocol	10 years ago
reger	7c1706d83a	use CRLF in generated bat command scripts for windows - for easier viewing with standard viewers	10 years ago
orbiter	dab9a0786a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	51bf5c85b0	Renamed the transmission cloud to buffer in dispatcher since the name 'cloud' was a bad idea. Changed also the accumulation process for peer targets so that every dht chunk is not assigned the set of redundant targets but they are assigned to redundant targets individually. This enhances the granularity of the target accumulation and should enhance the efficiency of the process. Finally the dht protocol client was enriched with the ability to remove the 'accept remote index' flag from peers or remove peers completely if they do not answer at all.	11 years ago
reger	665e12f88e	move startup time from old serverCore to switchboard (most used here) to make servercore eventually obsolete.	11 years ago
Michael Peter Christen	e09218129c	remove check for local solr. This check was made during a time when Solr was optional and another alternative metadata store was available. Since that store is now removed, Solr is always available (internally or externally)	11 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
Michael Peter Christen	3dc5fb0050	fix for operator precedence bug (cast binds stronger than bitwise AND) in peer hash hashing. This should not change anything if java casts long to int by masking with 0xFFFFFFFFL but you never know. The important thing is, that the hashCode() should not return numbers that have the same order as the hash code order because hashing of seeds is used to remove the order in some places.	11 years ago
Michael Peter Christen	6634b5b737	debug code for index distribution testing	11 years ago
orbiter	7705e36703	fix for latest generic warning fix	11 years ago
orbiter	97983ba89f	fixed generics warnings for generic array instantiation that appeared after migration to Java 7	11 years ago
orbiter	88f4af90da	removed warnings	11 years ago
Michael Peter Christen	a1ac4c3b76	automatically clear graphics cache	11 years ago
Michael Peter Christen	4e734815e8	enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part.	11 years ago
orbiter	8e04030596	in case of short memory, do not cut down robinson peers to 1, just reduce by 50%	11 years ago
reger	c193a02023	defer creation of new ArrayList after possible early return (to skip not used object allocation)	11 years ago
reger	727dfb5875	refactor URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex	11 years ago
reger	46016fa153	autoupdate fails to download latest release (1.71) due to default release blacklist - removed the default version blacklist regex from init (for future versions) !!! left existing update blacklist setting untouched !!! (existing installation wanting autoupdate for 1.71 need to change blacklist in ConfigUpdate_p.html) - moved old blacklist patch to migration.java	11 years ago
orbiter	de95e5e524	reduced search activity corona strength in network image	11 years ago
reger	227c42bc96	eliminate obsolete URIMetaDataRow class by joining it with/into URIMetaDataNode.	11 years ago
Michael Peter Christen	5b83887da8	npe fix	11 years ago
reger	2953ebe701	fix: port in local target address & button style	11 years ago
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	11 years ago
reger	a373fb717d	remove more unused from legacy server.http - triggerOnlineAction not used - useTemplateCache not used	11 years ago
reger	dd5bf0b71b	cleanup old reference to HTTPDemon.setAlternativeResolver optimize .yacyh check in AbstractRemoteHandler	11 years ago
orbiter	d68e5ad0c4	NPE fix for Thread name (just committed yesterday, sorry)	11 years ago
Michael Peter Christen	6ed9c0164e	attaching names to all Threads to get a better view in profiling tools like VisualVM	11 years ago
Michael Peter Christen	7640834b37	removed double concurrency to put Solr documents into the index. The writings to the solr index are also buffered in ConcurrentUpdateSolrConnector	11 years ago
Michael Peter Christen	1b5e3d523a	better control over close-state of remote solr connections	11 years ago
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	11 years ago
Michael Peter Christen	0dda979801	adopted network image drawing to increased number of peers	11 years ago
Michael Peter Christen	d9858e1b8a	removed warnings and superfluous logging	11 years ago
Michael Peter Christen	d2b8f2b477	enhancements for staticIP and ipv6 handling	11 years ago
orbiter	0002abd583	fix for OOM during remote search and too high load protection	11 years ago
sixcooler	5a917e13c6	use less ram on dht-URL transfer by not using a URIMetadataNode[]	11 years ago
sixcooler	4d77ca52c9	workaround to let dht-out run on small systems like a Pi	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which covers the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	11 years ago
orbiter	fd4abc0565	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	d5b8e473c8	added load limit for DHT transfer: RWI acceptance only if local load is not too high	11 years ago
reger	2614fa7aeb	Skip remote Solr search if last try showed error As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is remembered in the seed.dna of the remote peer. This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status. Might also be set to "not available" on time-out of last try.	11 years ago
orbiter	a07e9b3582	concurrency-solid version of transmission limitation	11 years ago
orbiter	60ead31273	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	52bf7d1ac8	reduce load during dht transfer	11 years ago
Michael Peter Christen	0bf3cab8c7	- better 'extra'-peer selection - logging of health status for 'extra'-peer selection - concurrency for remote peer IO and interrupting the threads if time-out occurs	11 years ago
Michael Peter Christen	ba44eb1160	when scaling the number of remote peers, also consider the machine load and the number of cores	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
Michael Peter Christen	47a82e471c	less blocking in SeedDB which caused deadlocks in peer ping	11 years ago
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	11 years ago
orbiter	3cb6c7861f	fixed shutdown authentication problem	11 years ago
Michael Peter Christen	1c56befb93	fixed mess with test on localhost (which means local hosts for some cases)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	11 years ago
Michael Peter Christen	79771c60c0	IPv6 fixes	11 years ago
Michael Peter Christen	9a27bf6e82	removed filter computation in Protocol class for remote searches because that is already done in the QueryParams class	11 years ago

383 Commits (4eeb448eb3d0b0fda80375aae866a5a6c914e30f)