yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	7db0534d8a	Added a zim parser to the surrogate import option. You can now import zim files into YaCy by simply moving them to the DATA/SURROGATE/IN folder. They will be fetched and after parsing moved to DATA/SURROGATE/OUT. There are exceptions where the parser is not able to identify the original URL of the documents in the zim file. In that case the file is simply ignored. This commit also carries an important fix to the pdf parser and an increase of the maximum parsing speed to 60000 PPM which should make it possible to index up to 1000 files in one second.	1 year ago
Michael Peter Christen	d8f26cb6a7	larger link structure image	2 years ago
Michael Peter Christen	5acd98f4da	introduction of tag-to-indexing relation TagValency	2 years ago
luccioman	5a8d9abd8a	Upgraded d3js dependency from 3.4.4 to 5.7.0	6 years ago
luccioman	534f09e92b	Added and updated hint messages about remote crawler status To help identify why remote crawl results may not be received.	6 years ago
luccioman	cced94298a	Added a new crawler document filter type using Solr syntax. This makes it possible to set up much more advanced document crawl filters, by filtering on one or more document indexed fields before inserting in the index.	7 years ago
luccioman	b154d3eb87	Added descriptive titles to Crawler_p.html speed settings. As reported by bubul (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5924) , LF and MH acronyms meaning were not detailed. Also added label tags for improved accessibility on these input fields.	8 years ago
luccioman	84b81c1af0	Switched more URLs to relative ones when possible. This permits an easier and more flexible reverse proxy configuration. Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
reger	3cc2af8f92	reduce the mix of absolute and relative internal html page links (prefer relative for same pg or neighbors) to ease proxied access e.g. http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	37df2e19fd	Removed xmlns attribute which no more makes sense in HTML5 pages.	8 years ago
luccioman	0065c9b9ea	Crawl monitoring : refresh running crawls table Fix mantis 690 ( http://mantis.tokeek.de/view.php?id=690 ). Tested on : - MS Windows 10 : Edge, Firefox 49, Chrome 53 - Debian Jessie : Firefox ESR 45	8 years ago
luccioman	e1e632ad84	Switched to the short HTML Doctype This page was already no more XHTML 1.0 as it makes use of the HTML5 <progress> element. Applied current HTML standard recommended Doctype declaration (see https://www.w3.org/TR/html/syntax.html#the-doctype ).	8 years ago
luccioman	4d8611e5e7	Tables accessibility : added missing <thead> sections.	8 years ago
luccioman	abe489a0b5	Removed unnecessary ARIA "form" role on native HTML form elements. This fixes warnings reported by W3C Nu Html Checker (https://validator.w3.org/nu/).	8 years ago
reger	579303a04e	add additional links to crawl queue pages	10 years ago
reger	5cb05c3013	adjust table column width to not line wrap crawler traffic line	10 years ago
reger	0260d3d800	Allow to hide linkstructure graphic in crawl monitor using/setting the config param DECORATION_GRAFICS_LINKSTRUCTURE	10 years ago
Ryszard Goń	3cdbd5f5c6	Fix for progress table background not resizing when the post-processing started/ended.	10 years ago
Ryszard Goń	3144313974	Postprocessing progress bar fix (Make it work as [probably] actually intended)	10 years ago
Michael Peter Christen	bbadccbd8d	better buttons	11 years ago
orbiter	469e0a62f1	added new button to terminate all crawls	11 years ago
Michael Peter Christen	8443255e18	better link structure limit calibration	11 years ago
Michael Peter Christen	a6bb9be97e	- added d3.js for visualizations using embedded svg - added a servlet api/linkstructure.json which generates a link graph information in json - added a javascript link graph renderer hypertree.js using d3 and the new servlet linkstructure.json - embedded the new link graph in the crawler monitor and the host browser	11 years ago
Michael Peter Christen	7a49f72480	fix for crawler column width	11 years ago
Michael Peter Christen	656e2ce62a	replacing direct html table cellspacing with css set-up for cellspacing	11 years ago
Michael Peter Christen	92655c7fd9	- added bootstrap css framework - adapted all YaCy administration pages to new framework - created new search page layout (working, but still work in progress) - old skin files are fully applicable! (and looking good) - target is a new style based on bootstrap examples, see /test.html - icons in YaCy may be replaced by glyphicons (to be done)	11 years ago
orbiter	e9abb25b03	tried javascript hack to make statistic divs equal height	11 years ago
Michael Peter Christen	1245cfeb43	small change to crawler monitor to fit in larger translations	11 years ago
orbiter	1960aafd6c	better height for statistic windows	11 years ago
orbiter	b0e3e2100d	better width for Progress table	11 years ago
malykhin.dmitry	29a7598991	update russian lang-file and small improvements to the web interface	11 years ago
reger	365f77ea8c	make internal page links relative to ease any future development for context aware servlets note also http://bugs.yacy.net/view.php?id=106	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	11 years ago
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	11 years ago
orbiter	9c681cc00d	added segment sizes, postprocessing status and cpu load to crawler monitor	11 years ago
orbiter	2c3b024196	if the crawl was paused (automatically), show the reason for pausing in the Crawler_p servlet.	12 years ago
Frank	7763f2554f	add the new PPMbar in Crawler_p for a better style and better use.	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resulting search index has now the following properties: - webgraph size will have about 40 times as many entries as default index - the complete index size will increase and may be about the double size of current amount. As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph. To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
reger	7761b60325	fix: Broken Link on Crawler_p.html - issue 218 http://bugs.yacy.net/view.php?id=218 - reduced Solr logging (/select)	12 years ago
Michael Peter Christen	eca68fa197	added debug code to crawler monitor	12 years ago
Michael Peter Christen	71ed8e5e07	bugfixes for crawler	12 years ago
Michael Peter Christen	906e51214a	the web structure image shows the pivot dot in a different color	12 years ago
Michael Peter Christen	9eaede50e7	enhanced web structure images	12 years ago
Michael Peter Christen	ae6feb5610	showing the web structure graph as animation in the crawl monitor	12 years ago
Michael Peter Christen	a13e5153ac	- added the possibility to have not one but a list of crawl start urls - the list of urls is entered in the expert crawl start in a textfield; the one-line input field was replaced with a text box - start urls can also be given in one single line where the urls are separated by a '|'-character - as an effect, the crawl profile cannot carry a single start url for identification because it is possible to have more. Therefore the url was removed from the crawl profile - this affects all servlets which display a crawl profile: removed the url field from all these servlets - to work consistently with several start urls and the other crawl starts which computed crawl start url lists from sitelists or sitemaps, the crawl start servlet was restructured completely - new rules for must-match patterns were created to make it possible that site crawl starts also work with several crawl starts at once	12 years ago
sixcooler	bea002dc15	correct table in new look of Crawler_p	13 years ago
Michael Peter Christen	638390930d	another patch to fix the Crawler_p layout	13 years ago
Michael Peter Christen	c846e9ca14	redesign of the crawler monitor page: show crawled pages instead of queue of urls that shall be crawled	13 years ago
Michael Peter Christen	16b21f7a5b	Added more steering in Crawler_p.html interface	13 years ago

56 Commits (66cf7d4ca5fa9626068a5bff3211f91761a1eb15)