yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	06d0e2aeb9	result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method. - Above brought up that parser start url parameter, declared as AnchorURL uses only methods of parent object DigestURL (changed parameter declaration accordingly).	9 years ago
luc	37e28e0dd3	- Keep aspect ratio of images rendered directly by browser such as gif and svg. - Corrected quadratic rendering of landscape images with height smaller than maxHeight	9 years ago
luc	e2d00585e2	Display full size preview using ViewImage Servlet.	9 years ago
reger	5744342fec	handle image preview for url w empty file extension fix of commit `688f7b2a5c`	9 years ago
reger	688f7b2a5c	allow/display svg images in image results previews svg is not supported by awt but by most browser. Image content is delivered as received (without size adjustment)	9 years ago
reger	7c1da173e0	fix missing license in image search see http://mantis.tokeek.de/view.php?id=522	9 years ago
Michael Peter Christen	9c12555be5	added link to Snapshots in search results if the snapshot exists and option is set in ConfigSearchPage_p (this is a stub: we also need a visualization of pdf files!)	10 years ago
reger	000dde9511	Eliminate duplication of values for search ResultEntry by instantiation from URIMetadataNode, by eliminating differentiation of ResultEntry/URIMetadataNode. - moved remaining ResultEntry functionality to URIMetadataNode - for 1:1 functionality added a function makeResultEntry() - removed ResultEntry - refactored related code Main difference is after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated. Main difference left is, that	10 years ago
reger	3d53da8236	refactor ResultEntry to be based on MetadataNode/SolrDocument to share/reuse common access routines	10 years ago
reger	609c52e987	refactor getBookmark to consistently check existence by != null (w/o throwing exception on not found)	10 years ago
reger	4c907bec89	show "Augmented Browsing" link in search result only if urlproxy allowed and option switched on in layout (AugmentedBrowsing_p.html, ConfigSearchPage_p.html) as user only gets a error page if the option is not enabled	10 years ago
Michael Peter Christen	fd4e2c809a	Show dates in the content of a document in the search result: - if an eventDate is given in the search result, replace the document date with the event date and prefix it with the string "on ". - the document date is omitted if a date from the content is shown Added also the date as fields in the json and rss result sets.	10 years ago
reger	1196ff01c8	revert: formatting fix eats also up highlighting need other solution for snippets with unwanted html code	10 years ago
reger	61f42a7928	fix formatting issue in search result display if description contains html code noticed e.g. for id=NmNdJ9uApLaQ http://hswong3i.net/blog/hswong3i/virtualmin-drupal-7-x-ubuntu-12-04-howto	10 years ago
reger	11b21308c0	fix: malformed filename in image search fix for http://mantis.tokeek.de/view.php?id=533	10 years ago
reger	4eb89d7f15	revert clickservlet (default was indeed a mistakenly)	10 years ago
reger	ebe5faeb01	added url to bookmark icon link url is anyway needed, saves index lookup and works w/o commited url. Removed unused order parameter	10 years ago
reger	d44d8996d0	Added a “don't store remote search results” option This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules). Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index. To be able to improve the local index a Click-Servlet option was added additionally. If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks) The option check-boxes are placed in ConfigPortal.html	10 years ago
reger	0dfeee154a	adjustments for Bookmark icon to act on BookmarkDB, it acts on YMarks but YMark interface seems not maintained, for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB. The YMark related code is (for now) left untouched so both tables are updated.	10 years ago
Michael Peter Christen	ecb6a59e9e	do not translate gif images into png images for thumbnails. Instead, stream the original to the search result thumb viewer. This has two reasons: - animated gifs cause 100% cpu and deadlocks in the jvm gif parser; a known bug which is obviously not yet fixed - animated gifs now appear in the search result also as animation	10 years ago
reger	7e4e9f7e32	improve yacysearchitem, prevent allocation of String (modifyURL) if feature not used	10 years ago
Michael Peter Christen	5516819354	preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then cached in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again.	10 years ago
Michael Peter Christen	3c71e1c872	show vocabularies in search result (in case of debugging)	10 years ago
Michael Peter Christen	b0bfafa581	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	1735dbc9d9	enhanced image search: bugfixes and performance enhancements	10 years ago
reger	1d5d0b82a6	- skip html template specific servlet post variables (show_xxx) for feeds, - add <updated> (in required format) to atom feed	10 years ago
reger	19e35a9126	add type attribute to atom feed <link> tag (for /yacysearch.atom)	10 years ago
Michael Peter Christen	c115f3869c	enhanced snippet computation and test method in ViewFile	10 years ago
reger	c798a9d1bb	fix unresolved pattern in yacysearch.rss title and rss xml error due to html & encoding in url entries	11 years ago
orbiter	ce1dbfeb0f	fix appearance of image search thumbnails.	11 years ago
Michael Peter Christen	f0db501630	better handling of ranking parameters and new default values for date navigation which is done using ranking in solr.	11 years ago
Michael Peter Christen	cbdfef7ce1	changed protocol facet to show also all other counts if one facet is selected	11 years ago
Michael Peter Christen	d1091e79f8	- added stealth button to navigation menu - more fixes to progress bar	11 years ago
reger	365f77ea8c	make internal page links relative to ease any future development for context aware servlets note also http://bugs.yacy.net/view.php?id=106	11 years ago
Michael Peter Christen	7e71dcc417	removed interaction fragments	11 years ago
reger	bd1685c94a	fix not needed getFileExtension().toLower (double) add missing .getFileExtension	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
reger	eaf596a257	adding proxy status to (private) status box (show also transparent and url proxy status) show search result via url proxy only if status=on	11 years ago
Michael Peter Christen	da380343c2	perform greedy learning heuristic only if load < 1.0	11 years ago
Michael Peter Christen	81926c055d	fixed bug with image search in yacyinteractive	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adapted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Michael Peter Christen	56cdcfa2fa	fixed greedy learning mode - global is not a search attribute in searchitems	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago


226 Commits (58824dfa6cdf9ff09c22ba01b7969f6e25bd4658)