yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	28b451a0b3	Made Cache compression level and lock timeout user configurable	8 years ago
luccioman	a7394b479b	Limit the synchronization blocking time on some Cache operations. Using a Reentrant lock instead of the intrinsic synchronization lock permits limiting the blocking time to acquire a lock. Useful on a very busy Cache concurrently accessed by many threads : when the time to acquire a lock is too high, getting/storing content on the cache becomes inefficient, and it is then better to fall back to loading remote resources. Illustrated by the CacheTest stress test and some traces reported in mantis 751 ( http://mantis.tokeek.de/view.php?id=751 )	8 years ago
luccioman	73ab4a7b3a	Prevent log pollution from unwanted Solr warnings. Many non-blocking "java.nio.file.NoSuchFileException" traces with warning log level can be logged by Solr, especially when heavily crawling. This issue has been known since Solr 5.x and is still unresolved in Solr 6.x ( https://issues.apache.org/jira/browse/SOLR-9120 ). Consequently raised the default log level of the related internal Solr class to "SEVERE". See also mantis 727 ( http://mantis.tokeek.de/view.php?id=727 )	8 years ago
Michael Peter Christen	c94a8c76bd	re-added solr synchronization hack	8 years ago
Michael Peter Christen	6fe735945d	migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8. Also: now Version 1.921	8 years ago
luccioman	ce89492319	Ensure system resource release by closing document stream.	8 years ago
luccioman	8399275142	Properly close file output streams even in exception scenarios.	8 years ago
luccioman	4e4dc6c4e5	Removed unnecessary finalize implementation. On such private classes with limited scope but with frequent instance creations and removals within the application lifecycle, implementing the finalize method is particularly unwanted as it decreases garbage collector performance. What's more, the Object.finalize() method is deprecated as of JDK 9 and will eventually disappear from future releases (see https://bugs.openjdk.java.net/browse/JDK-8177970)	8 years ago
reger	632354e2ff	Tokenize result entry keywords and add some styling for display	8 years ago
reger	c42d17f607	upd to commons-compress-1.14.jar	8 years ago
luccioman	a04feac064	Ensure file input streams are properly closed on both success and failure. Also add, when possible, a warning level log message on input stream closing error instead of failing silently. This could help in understanding some IO exceptions such as "too many files open".	8 years ago
luccioman	d98c04853d	Ensure proper closing of file input streams.	8 years ago
luccioman	c53c58fa85	Ensure closing of the ChunkIterator stream in every possible use case. Also trace any close failures in the logs instead of failing silently. This should help prevent holding too many unreleased system file handles, as in the case reported by eros on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5988&sid=b00e7486c1bf7e48a0d63eb328ccca02 )	8 years ago
luccioman	29e52bda39	Merge branch 'master' of https://github.com/yacy/yacy_search_server	8 years ago
luccioman	a9cb083fa1	Improved consistency between loader openInputStream and load functions	8 years ago
reger	a814f3d885	Introduce keyword query parameter. This enables the keyword navigator to filter on keywords. Added search page output and layout config for keywords, allowing e.g. in Intranet use to display the keywords. No styling or links applied to the keyword text (but is desirable, possibly in combination with bootstrap-tagsinput, for future/intranet).	8 years ago
luccioman	cbccf97361	Added JavaDoc to the getpageinfo_p API servlet.	8 years ago
luccioman	c226ded799	Fix unescape of URLs having some '%' chars but not percent-encoded	8 years ago
luccioman	bd88fd303e	Deprecated duplicated and internally unused getpageinfo servlet. Redirections set for the transition of any eventual external uses: - /api/getpageinfo.xml to /api/getpageinfo_p.xml - /api/getpageinfo.json to /api/getpageinfo_p.json	8 years ago
luccioman	306a82dd71	Fixed scraper NullPointerException cases on malformed URLs.	8 years ago
luccioman	aa55d71cf5	Fixed a NullPointerException case on Digest authentication. Could occur when upgrading from a Debian package configured with Basic authentication (as in release 1.92.9000) to a more recent one with Digest authentication, without having re-encoded the admin password (for example with dpkg-reconfigure). As reported by eros on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5988#p33686).	8 years ago
reger	b65a04087b	upd to pdfbox-2.0.6.jar	8 years ago
luccioman	02ec0ed13c	Quoted param value in Solr query to avoid unwanted traces in logs. When the Webgraph Solr core is enabled, crawling and removing from the index a URL whose hash starts with the '-' character (example URL : https://cs.wikipedia.org/ whose hash is "-2-HuTEndn4x") produced a full ParseException stack trace in YaCy logs. This was not blocking because the Solr query parser is able to escape the query itself and run it successfully, but it needlessly filled YaCy logs.	8 years ago
luccioman	1be4d32f99	Restored search page default behavior for Tab, Page Up and Down keys. Replaced by shortcuts defined by the HTML "accesskey" attribute, which has the advantage of being advertised by screen readers when focusing the corresponding buttons, contrary to custom JavaScript key handlers. Now with Firefox : - "Alt + Shift + n" for next page - "Alt + Shift + p" for previous page. Following ARIA recommendation : "keyboard shortcuts enhance, not replace, standard keyboard access." ( see https://www.w3.org/TR/wai-aria-practices/#kbd_shortcuts_behavior_design) Fix for mantis 711 (http://mantis.tokeek.de/view.php?id=711)	8 years ago
reger	1737af37cf	Set request originator to own peer in warc importer in addition to change in `039162fbf0`	8 years ago
reger	039162fbf0	Change warc importer to use defaultsurrogate-crawl profile. As reported by LA_FORGE http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5990 and analysed by @luccioman (see comment `510f11d374`), using another crawlprofile without setting the originator creates a conflict.	8 years ago
Michael Peter Christen	3b1d640a3c	enhanced debugging	8 years ago
Michael Peter Christen	7de7879f13	added a cache to prevent too many seed enumerations	8 years ago
luccioman	bd7411a53a	Enable p2p and cluster communication when "Protection of all pages" is on. As reported by paul89 on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5958 ), when setting the "Protection of all pages" to "On" in the "ConfigAccounts_p.html" page, the peer became completely unreachable by others, which is not the purpose of this feature. But the restriction still makes sense as a security enforcement and is maintained in private "Robinson mode", where any peer-to-peer or cluster communication would be rejected anyway.	8 years ago
luccioman	45346c1be8	Added missing accessibility attributes on search results progress bar.	8 years ago
luccioman	91a06bc669	Annotated search result information separators for screen readers.	8 years ago
luccioman	31ad043bb9	Added user interface feedback on results feeding termination status. Added as an additional icon with title in the search progress bar, to inform about background search feeder threads terminated or still running. While giving a bit more information to users about the p2p search process, this can help in choosing whether or not to wait a little longer before going to the next page, in order to get results from various sources sorted as well as possible (see #91 for a discussion about sorting accuracy and network latency). Other related modifications included : - regular updates to statistics in the progress bar until the background feeders are completely terminated. - removed some uses of insecure and discouraged JavaScript elements	8 years ago
sgaebel	ff6392215e	added closing of lst-Tag in solr-Export	8 years ago
luccioman	d90b001e1b	Improved previous merge "Show ranking in HTML UI". - added the new setting as configurable in the "Debug/Analysis" settings page. Debug/analysis is its main purpose for now as there is currently no nice and "understandable" ranking score info servlet (see forum discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) - render in the "Search Page Layout" page preview when enabled - added constants	8 years ago
luccioman	efe1232d90	Merge branch 'html-show-ranking' of https://github.com/JeremyRand/yacy_search_server Conflicts: defaults/yacy.init	8 years ago
luccioman	0f0f42b509	Added some JavaDoc	8 years ago
reger	077d062be3	Adjust mergeDocuments to keep youngest last-modified date of document collection	8 years ago
luccioman	654801523e	Fixed StringIndexOutOfBoundsException case. Revealed by commit `c77e43a` : the exception was then thrown when indexing pages containing mailto: scheme URL links with the Solr Webgraph core enabled. Fixed the error case and restored filtering on mailto links in Document.resortLinks() as these URLs still should not appear in Document.hyperlinks.	8 years ago
luccioman	b297f5bdbe	Updated Debian package post install script admin password encoding, to fit the default HTTP authentication method now set to Digest in commit `f7fce1b`. Also fixed the unauthenticated access from localhost setting when first installing the Debian package and leaving the prompted password field empty.	8 years ago
luccioman	7623d7728f	Fixed Debian install message misspelling.	8 years ago
luccioman	522a268305	Improved new blacklist entries URL scheme detection.	8 years ago
luccioman	532981b363	Updated putHTML() JavaDoc	8 years ago
luccioman	58d23047dd	Handle '?' and '+' chars as valid wild cards when adding to blacklist. An entry such as "domain.com/[a-z]+" is a valid regular expression and does not need additional "../.*" wildcards.	8 years ago
luccioman	4564541b3b	Fixed blacklist Regex containing '+' characters rendering. As reported on YaCy forum by shni (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5970) when a blacklist entry contained both '?' and '+' characters, the '+' chars were wrongly decoded and rendered as spaces.	8 years ago
luccioman	0612a8f4f2	Fixed the previously added link to scheduled dump operations.	8 years ago
luccioman	a87281b498	Added MediaWiki dump import scheduling feature. Checking the last modified date by default to prevent unnecessary long running operations.	8 years ago
luccioman	10c03c6c64	Improved MediaWiki dump import monitoring. When the import thread is terminated : - now stop refreshing and stay on the monitoring page to give the user feedback after a long running import - added link to the next monitoring step : results from surrogates reader - added link to new import. On the new import page, added a link to the last import report, if any.	8 years ago
luccioman	edd7ccac40	Added some JavaDoc	8 years ago
luccioman	79fdf14b0a	Fixed regression introduced by commit `9ad4d16`. On MediaWiki dump imports, the SurrogateReader was trying to unread too many bytes, then failing with the following exception : "java.io.IOException: Push back buffer is full".	8 years ago
Michael Peter Christen	7678fd67e3	copied fix from yacy_grid_parser for wrong array type	8 years ago
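
Commit a7394b479b above describes replacing the intrinsic synchronization lock with a reentrant lock so that the time spent waiting to acquire it can be bounded. A minimal sketch of that pattern follows. The class TimedLockCache, its in-memory map, and its method names are hypothetical and used for illustration only; they are not taken from the YaCy sources, and the real Cache class may differ substantially.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical cache illustrating a bounded lock wait; not YaCy's actual Cache class. */
class TimedLockCache {
    private final Map<String, byte[]> store = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final long lockTimeoutMs; // assumed to come from user configuration

    TimedLockCache(long lockTimeoutMs) {
        this.lockTimeoutMs = lockTimeoutMs;
    }

    /** Returns the cached content, or null when absent or when the lock could not be acquired in time. */
    byte[] get(String url) {
        if (!acquire()) {
            return null; // caller can fall back to loading the remote resource
        }
        try {
            return store.get(url);
        } finally {
            lock.unlock();
        }
    }

    /** Stores content; returns false when the lock could not be acquired in time. */
    boolean put(String url, byte[] content) {
        if (!acquire()) {
            return false; // skip caching rather than block a busy thread
        }
        try {
            store.put(url, content);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Waits at most lockTimeoutMs for the lock, unlike a synchronized block which waits indefinitely. */
    private boolean acquire() {
        try {
            return lock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

With this shape, a caller treats a null from get() the same way as a cache miss, which matches the fallback to remote loading that the commit message describes for a very busy cache.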


13261 Commits (5216c681a9971b921aea8c041106df8b2102055b)
