yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Daleth Darko	3ced06c731	Various javadoc fixes	3 years ago
luccioman	fa4399d5d2	Small perf improvement : initialize threads names early when possible Initializing Thread names using the Thread constructor parameter is faster as it already sets a thread name even if no customized one is given, while an additional call to the Thread.setName() function internally do synchronized access, eventually runs access check on the security manager and performs a native call. Profiling a running YaCy server revealed that the total processing time spent on Thread.setName() for a typical p2p search was in the range of seconds.	7 years ago
luccioman	3e8dd90211	Use https rather than http in links and queries to openstreetmap.org	7 years ago
luccioman	1de86cf1bf	Fixed JPEG snapshot resizing when running on OpenJDK. Resizing JPEG snapshot images through /api/snapshot.jpg failed when running on OpenJDK, but rendered successfully with a Oracle JDK. Details in mantis 772 ( http://mantis.tokeek.de/view.php?id=772 ). Removing any alpha component (useless in snapshot images) from the rendered resized image solves the issue.	7 years ago
luccioman	fe75f326d8	Fixed ProfilingGraph calculation integer overflows and added test class. Complementary to fix proposed in PR #128 by @otteresk.	7 years ago
luccioman	e048e74072	Added an optional parameter to webstructure.xml api. This new "documentStructure" parameter can be set to false to only get hosts accumulated references on a resource and thus prevent scraping the specified URL and getting citations references. Also set WebStructureGraph constants as final and updated the Javadoc with example api call URLs.	8 years ago
luccioman	5c8958bcea	Updated Javadoc and Junit tests for the WebStructureGraph class.	8 years ago
luccioman	d9766ca981	Fixed WatchWebStructure_p.html render to include https URLs. As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721) WatchWebStructure_p.html failed to include in its structure view https and other protocols and ports than default http.	8 years ago
luccioman	ed3dd5e31a	Fixed webstructure.xml API used with a domain name 'about' parameter. As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720), when requesting this API with a domain name instead of a complete URL only HTTP references on default port were listed.	8 years ago
luccioman	0da1e6ba16	Factored code re-implementing DigestURL.hosthash() method. This ensure consistent implementation of the url host hash generation and easier usage finding in source code. Also added a unit test for this function.	8 years ago
luccioman	86adfef30f	Added automated unit tests and perfs test for WebStructureGraph class. Fixed references count when multiple links target the same domain name in one document.	8 years ago
luccioman	9cea7cbb10	Detailed some Javadoc related to /api/webstructure.xml usage.	8 years ago
reger	3c7220bc7b	Refacture rwi reference word position and word distance calculation used for rwi ranking. Main changes: - introduce a posintext() to access the stored value. This reduces also mem alloc of position array for WordReferenceRow (index access) - use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null - adjust assignments and the min() max() and distance() calculation accordingly	8 years ago
luccioman	f0639d810c	Customized name for Threads still using the default "Thread-n" pattern. This makes threads monitoring easier to read.	8 years ago
luc	e40ae0943b	- No max dimensions specified : render raw image data when source and target image format are the same. - Corrected scaling condition.	9 years ago
luc	bc6c79fc12	Corrected scaling function for non RGB images.	9 years ago
luc	fc3294382e	Updated javadocs for warning on target encoding format potential errors.	9 years ago
luc	aa70ff4ff6	Corrected images alpha channel rendering	9 years ago
Michael Peter Christen	eec78e1b0c	added intensity option to graphics	10 years ago
Michael Peter Christen	fed26f33a8	enhanced timezone managament for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user, doing a search, is detected. The time zone of the search request is done automatically using the browsers time zone offset which is delivered to the search request automatically and invisible to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required the change of the parser java api. A lot of other changes had been made which corrects the wrong handling of dates in YaCy which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	10 years ago
Michael Peter Christen	5516819354	preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then caches in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again.	10 years ago
orbiter	fa2ad101ec	enhanced graphics computation (avoiding long string parsing for colours)	10 years ago
orbiter	ef813cec91	added proper copyright notice to OSM tiles presented at the search result page	10 years ago
Michael Peter Christen	bc275dca07	added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer.	10 years ago
reger	3dde94422f	center searchevent lines on network graph (PerformanceSearch_p.html)	10 years ago
Michael Peter Christen	8c52f0651b	refactoring of AccessTracker events & timeline fix	11 years ago
Michael Peter Christen	74206a10c7	refactoring	11 years ago
Michael Peter Christen	a1ac4c3b76	automatically clear graphics cache	11 years ago
orbiter	de95e5e524	reduced search activity corona strength in network image	11 years ago
Michael Peter Christen	6ed9c0164e	attaching names to all Threads to get a better view in profiling tools like VisualVM	11 years ago
Michael Peter Christen	0dda979801	adopted network image drawing to increased number of peers	11 years ago
Michael Peter Christen	d9858e1b8a	removed warnings and superfluous logging	11 years ago
Michael Peter Christen	f1b5db2c45	- performance graph does not show peer ping in memory monitor any more - after a forced GC, the PerformanceMemory view switches to automatic update by default	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this caused that large portions of the parser code had to be adopted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adopted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Marc Nause	8fb1b1e290	*) simplified banner creation code	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
Michael Peter Christen	ed1d5bace6	draw the names of other peers which receive/send dht into the network graphic	12 years ago
Michael Peter Christen	b528448332	enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page.	12 years ago
Michael Peter Christen	bb4bf3d8fd	infinity timeout bug protection patch	12 years ago
Michael Peter Christen	fc2095ac67	some extensions to raster plotter to transform a RGB picture to an indexed color scheme. This is needed for gif animations	12 years ago
Michael Peter Christen	c1a2175fbc	added transparency to gif image animation and the integration to the YaCy httpd for on-the-fly generated gifs (including animated gifs)	12 years ago
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	12 years ago
orbiter	d74472f562	corrected result counter	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr field in the core 'webgraph'. The default schema uses only some of them and the resting search index has now the following properties: - webgraph size will have about 40 times as much entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also avaiable for p2p operation but client access is not yet implemented.	12 years ago
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	12 years ago

92 Commits (4add1f6bc73469596b28a83fc13a9e1365f1e9d8)