yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	1737af37cf	Set request originator to own peer in warc importer in addition to change in `039162fbf0`	8 years ago
reger	039162fbf0	Change warc importer to use defaultsurrogate-crawl profile. As reported by LA_FORGE (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5990) and analysed by @luccioman (see comment `510f11d374`), using another crawl profile without setting the originator creates a conflict.	8 years ago
Michael Peter Christen	3b1d640a3c	enhanced debugging	8 years ago
Michael Peter Christen	7de7879f13	added a cache to prevent too many seed enumerations	8 years ago
luccioman	bd7411a53a	Enable p2p and cluster communication when "Protection of all pages" is on. As reported by paul89 on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5958 ), when setting the "Protection of all pages" to "On" in the "ConfigAccounts_p.html" page, the peer became completely unreachable by others, which is not the purpose of this feature. But the restriction still makes sense as a security enforcement and is maintained in private "Robinson mode", where any peer-to-peer or cluster communication would be rejected anyway.	8 years ago
luccioman	45346c1be8	Added missing accessibility attributes on search results progress bar.	8 years ago
luccioman	91a06bc669	Annotated search result information separators for screen readers.	8 years ago
luccioman	31ad043bb9	Added user interface feedback on results feeding termination status. Added as an additional icon with title in the search progress bar, to inform about background search feeder threads terminated or still running. While giving a bit more information to users about the p2p search process, this can help choosing whether or not wait a little bit more time before going to the next page, in order to get results from various sources sorted as best as possible (see #91 for a discussion about sorting accuracy and network latency). Other related modifications included : - regular updates to statistics in the progress bar until the background feeders are completely terminated. - removed some uses of unsecure and discouraged JavaScript elements	8 years ago
sgaebel	ff6392215e	added closing of lst-Tag in solr-Export	8 years ago
luccioman	d90b001e1b	Improved previous merge "Show ranking in HTML UI". - added the new setting as configurable in the "Debug/Analysis" settings page. Debug/analysis is its main purpose for now as there is currently no nice and "understandable" ranking score info servlet (see forum discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) - render in the "Search Page Layout" page preview when enabled - added constants	8 years ago
luccioman	efe1232d90	Merge branch 'html-show-ranking' of https://github.com/JeremyRand/yacy_search_server Conflicts: defaults/yacy.init	8 years ago
luccioman	0f0f42b509	Added some JavaDoc	8 years ago
reger	077d062be3	Adjust mergeDocuments to keep youngest last-modified date of document collection	8 years ago
luccioman	654801523e	Fixed StringIndexOutOfBoundsException case. Revealed by commit `c77e43a` : the exception was then thrown when indexing pages containing mailto: scheme URL links with the Solr Webgraph core enabled. Fixed the error case and restored filtering on mailto links in Document.resortLinks() as these URLs still should not appear in Document.hyperlinks.	8 years ago
luccioman	b297f5bdbe	Updated Debian package post install script admin password encoding, to fit the now default HTTP authentication method set to Digest in commit `f7fce1b`. Also fixed the unauthenticated access from localhost setting when first installing the Debian package and leaving the prompted password field empty.	8 years ago
luccioman	7623d7728f	Fixed Debian install message misspelling.	8 years ago
luccioman	522a268305	Improved new blacklist entries URL scheme detection.	8 years ago
luccioman	532981b363	Updated putHTML() JavaDoc	8 years ago
luccioman	58d23047dd	Handle '?' and '+' chars as valid wild cards when adding to blacklist. An entry such as "domain.com/[a-z]+" is a valid regular expression and does not need additional "../.*" wildcards.	8 years ago
luccioman	4564541b3b	Fixed blacklist Regex containing '+' characters rendering. As reported on YaCy forum by shni (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5970) when a blacklist entry contained both '?' and '+' characters, the '+' chars were wrongly decoded and rendered as spaces.	8 years ago
luccioman	0612a8f4f2	Fixed the previously added link to scheduled dump operations.	8 years ago
luccioman	a87281b498	Added MediaWiki dump import scheduling feature. Checking the last modified date by default to prevent unnecessary long running operations.	8 years ago
luccioman	10c03c6c64	Improved MediaWiki dump import monitoring. When import thread is terminated : - now stop refreshing and stay on the monitoring page to give the user feedback after a long running import - added link to the next monitoring step : results from surrogates reader - added link to new import. On the new import page, added a link to the last import report, if any.	8 years ago
luccioman	edd7ccac40	Added some JavaDoc	8 years ago
luccioman	79fdf14b0a	Fixed regression introduced by commit `9ad4d16`. On MediaWiki dump imports, the SurrogateReader was trying to unread too many bytes, then failing with the following exception : "java.io.IOException: Push back buffer is full".	8 years ago
Michael Peter Christen	7678fd67e3	copied fix from yacy_grid_parser for wrong array type	8 years ago
Michael Peter Christen	200b100fb8	added patch to rewrite altered yacy grid schema into yacy schema This generates the stub and protocol parts of an url for inboundlinks, outboundlinks and images	8 years ago
reger	9ad4d16829	Add a responsHeader to the solr index export with a format identifier and export parameter (in accordance with response xml format) for easier format detection on import.	8 years ago
luccioman	9697209ef6	Fixed Index Export feature for compatibility with old indexed documents. This is a fix for mantis 682 (http://mantis.tokeek.de/view.php?id=682) and issue #116	8 years ago
luccioman	88c062639b	Added some JavaDoc	8 years ago
luccioman	8d288f5dba	Crawl results page : apply table lines number limit. Take into account the already existing default limit value (especially useful after a long crawl or surrogates import), or a custom one from parameter "count". Added a "Show all" link for convenience.	8 years ago
luccioman	31fff2c986	Extended WikiCode template inclusion syntax support. Wiki templates are not rendered but syntax support is improved, which greatly enhances snippets rendering on search results coming from a MediaWiki dump import. Tested on various dumps from Wikimedia at https://dumps.wikimedia.org/backup-index.html See also Wikipedia transclusion documentation at https://en.wikipedia.org/wiki/Wikipedia:Transclusion	8 years ago
Michael Peter Christen	973d74712f	added yacy grid flatjson surrogate parser	8 years ago
luccioman	b1da92648e	Fixed surrogates import monitoring page (/CrawlResults.html?process=7) This page was always empty, as described in mantis 740 (http://mantis.tokeek.de/view.php?id=740)	8 years ago
luccioman	527d494c1a	Fixed "Unchecked conversion" compilation warnings.	8 years ago
reger	2b03e40134	upd to jwat-1.0.5	8 years ago
reger	7a7da698d4	fix unit test MultiProtocolURL(file) assertion for Windows path with drive letter.	8 years ago
reger	c77e43a391	Take out mailto collect in internal parsed document. As earlier plans to make use of mailto as separate webgraph entity didn't materialize (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5726&p=32493&hilit=mailto#p32493), free the unused handling and resources.	8 years ago
Michael Peter Christen	335868edba	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	8 years ago
reger	bec34d3546	Add url input field as source for WarcImporter allowing to import warc from url without prior download.	8 years ago
reger	d3df8a46c4	fix unresolved_pattern on missing post parameter api/message.html	8 years ago
luccioman	f66438442e	Extended Mediawiki dump import to remote URLs. When using a public HTTP URL in /IndexImportMediawiki_p.html, the remote file now is directly streamed and processed, allowing import of several GB dumps even with a low memory remote peer, and without need to manually download the dump file first.	8 years ago
luccioman	e5c3b16748	Improved http client close time on stream processing errors.	8 years ago
luccioman	23775e76e2	Fixed endless loop case in wikicode processing. Detected when importing recent MediaWiki dumps containing some pages with script content in plain text format (see Scribunto extension https://www.mediawiki.org/wiki/Extension:Scribunto ). Further improvement : modify the MediawikiImporter to prevent processing revisions whose <model> is not wikitext.	8 years ago
luccioman	0bc868a819	Improved support for non ASCII chars in local file system URLs. Creating a MultiProtocolURL instance from a File object and then retrieving a File with getFSFile() was inconsistent with file paths containing space or non ASCII chars.	8 years ago
luccioman	7edddd7b0d	Improved error reports on various wiki dump prerequisites failure cases. Also added some JavaDoc.	8 years ago
luccioman	dfe8d4139b	Used a text input for wiki dump import file selection. Using an HTML "file" input was confusing (as reported by promocore on YaCy forum : http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5965) , and it only worked with MS IE/Edge on a local YaCy peer : - for security reasons some current major browsers such as Firefox or Chrome do not allow to send full file path information when using a file form input - the local file system selection popup doesn't make sense when you want to import a dump on a remote YaCy server	8 years ago
reger	3a71430030	Adjust ConfigSearchPage_p to activated hosts navigator as plugin	8 years ago
reger	7b80189bda	Activate hosts navigator plugin. This includes rwi results in the navigator count. This might be tangentially related to http://mantis.tokeek.de/view.php?id=736 as the example includes a local index search, while rwi results are not counted.	8 years ago
reger	05a1b14b4a	add missing text from ConfigRobotsTxt_p to master.lng and link to Translation Editor to Translation News page.	8 years ago
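
The "Push back buffer is full" failure quoted in commit `79fdf14b0a` can be reproduced with the standard `java.io.PushbackInputStream` alone. The following is a minimal sketch, not YaCy's SurrogateReader code: the class name `PushbackDemo`, the method `unreadTooMany` and the buffer sizes are assumptions chosen for illustration. It reads more bytes than the pushback buffer can hold and then tries to unread all of them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical demo class, not part of YaCy.
public class PushbackDemo {

    /**
     * Reads 4 bytes from a stream whose pushback buffer holds only 2 bytes,
     * then tries to unread all 4. Returns the exception message, or
     * "no error" if the unread unexpectedly succeeds.
     */
    static String unreadTooMany() {
        byte[] data = {1, 2, 3, 4, 5, 6};
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), 2)) {
            byte[] head = new byte[4];
            int n = in.read(head);   // consumes 4 bytes from the source
            in.unread(head, 0, n);   // 4 bytes do not fit in a 2-byte buffer
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "Push back buffer is full"
        System.out.println(unreadTooMany());
    }
}
```

A reader that peeks at the start of a stream to detect its format therefore has to size the pushback buffer for the largest number of bytes it may unread, or unread fewer bytes.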

13187 Commits (1737af37cfdc2e078ea20f05ae9d8d8ffb9efbcb)