yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	2a87b08cea	Removed temporary html parser test code	8 years ago
luccioman	1b3c169a9c	URL Viewer : decode raw text using the eventual response charset. When provided, or decode as UTF-8 as previously done.	8 years ago
luccioman	90a7c1affa	HTML parser : removed unnecessary remaining recursive processing Recursive processing was removed in commit `67beef657f`, but one remained for anchors content(likely omitted from refactoring). It is no more necessary : other links such as images embedded in anchors are currently correctly detected by the parser. More annoying : that remaining recursive processing could lead to almost endless processing when encountering some (invalid) HTML structures involving nested anchors, as detected and reported by lucipher on YaCy forum ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6005 ).	8 years ago
reger	e6e20dab52	upd to Jetty 9.4.6.v20170531 Modify loginservice to the changes in Jetty, partially based on pull request #101 https://github.com/yacy/yacy_search_server/pull/101 by @automenta	8 years ago
luccioman	e4c730b99f	Updated PerformanceQueues_p.xml API with last related servlet changes	8 years ago
luccioman	dcc56318bb	Made remote search max system load limits configurable from UI. As reported by davide on YaCy forums ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ) when the system is on high load, unless reading carefully YaCy configuration file, it could be difficult to understand why remote search results are not fetched.	8 years ago
reger	ddd13b776d	Add keyword constraint to rwi query result filter To discard rwi results not matching query keyword: parameter	8 years ago
luccioman	e82eaee4b6	Apply consistent behavior on HTTP resource size exceeding limit. On content size known from HTTP headers, terminates connection faster and improves error reports quality by reporting relevant message "Content to download exceed maximum value..." rather than previously "no response (NULL) for url...".	8 years ago
luccioman	0b75e92ac2	Do not wrap unnecessarily loader IOExceptions in IOExceptions	8 years ago
luccioman	433bdb7c0d	Respect maxFileSize limit also when streaming HTTP and when relevant. Constraint applied consistently with HTTP content full load in byte array.	8 years ago
luccioman	4b72b29ea2	Added an informative title on the crawl start robots.txt status icon	8 years ago
luccioman	d08f31c3a8	Crawl start Ajax request : properly handle eventual XML parsing errors Otherwise on a malformed getpageinfo_p XML response (from the browser point of view), JavaScript errors were thrown and the ajax status steering wheel remained displayed indefinitely.	8 years ago
luccioman	9b1bb2545e	Refactored plain-text URLs detection implementation. For faster processing (measured about 2 times faster on many real-world examples) and more advanced detection (previous algorithm detected only URLs separated from the rest of the text by a space character).	8 years ago
luccioman	8da3174867	Ensure lower case conversion consistency with any default locale. Especially for Turkish speaking users using "tr" as their system default locale : strings for technical stuff (URLs, tag names, constants...) must not be lower cased with the default locale, as 'I' doesn't become 'i' like in other locales such as "en", but becomes 'ı'.	8 years ago
luccioman	286f3018bd	Made mime type and extension normalization locale independent. Previously, upper cased mime type was incorrectly normalized when the default locale is Turkish.	8 years ago
luccioman	319231a458	Added a generic XML parser, able to parse elements text and URLs. This parser adds support for any XML based format other than already supported XML vocabularies such XHTML, RSS/Atom feeds... It will eventually be used as a fallback if one of these specific parsers fail, before falling back to the existing genericParser which extracts not that much useful information except URL tokens.	8 years ago
reger	aeeb8a7dd5	upd to jwat-warc-1.0.6.jar	8 years ago
reger	f0ba828627	remove unused Solr optional extra handler lib solr-dataimporthandler-6.6.0.jar	8 years ago
reger	1773b61b3e	upd to jsoup-1.10.3.jar	8 years ago
luccioman	64cec2790d	Improved character encoding detection from Content-Type header Also updated some related JavaDocs	8 years ago
luccioman	1acb7005d0	Added a basic JUnit test with test gz files for the gzip parser	8 years ago
luccioman	1e2fb76720	Properly close test files in htmlParser unit test	8 years ago
luccioman	c41b31dcb3	Cleaned up memory usage page HTML - fixed validation errors - removed deprecated attributes - improved accessibility with richer table semantics (headers and caption elements) and language declaration	8 years ago
luccioman	0487336ec3	Prevent integer overflow in table statistics and use strong typing	8 years ago
luccioman	0f80c978d6	Limit the number of initially previewed links in crawl start pages. This prevents rendering a big and inconvenient scrollbar on resources containing many links. If really needed, preview of all links is still available with a "Show all links" button. Doesn't affect the number of links used once the crawl is effectively started, as the list is then loaded again server-side.	8 years ago
luccioman	d2a4a27f52	Improved stream-oriented parsing entering conditions.	8 years ago
luccioman	32288a8999	Merge branch 'master' of https://github.com/yacy/yacy_search_server	8 years ago
luccioman	e9b4b29f90	Limit scope of some local JavaScript variables.	8 years ago
Michael Peter Christen	369b8e0e0b	added json(p) endpoint for crawl start	8 years ago
reger	83ba45ebae	make nsis build script require java 8	8 years ago
reger	cf70081cfc	update nsi installer java autodl bundleid to use jre-8u131	8 years ago
reger	9220ccbec7	remove reference to velocityresponsewriter in solrconfig.xml it is no longer part of solr-core api http://lucene.apache.org/solr/6_6_0/index.html	8 years ago
reger	4be4bfbba6	remove sample path setting in solrconfig.xml not valid in Yacy resulting in startup stop exception after fresh switch to 1.921	8 years ago
reger	510859bcce	update maven pom setting to YaCy version 1.921 java 1.8 and solr 6.6	8 years ago
luccioman	f6e8d71718	Prevent high CPU load at startup, caused by the Solr suggester build. Reported by Collision on mantis 758 ( http://mantis.tokeek.de/view.php?id=758 ). Introduced by the new YaCy Solr configuration for Solr 6.6.0 (see commit `6fe735945d`), including now Suggester configuration.	8 years ago
luccioman	9dd790087d	Added HT Cache basic statistics (hit rate)	8 years ago
luccioman	5fdd5d16b1	Use volatile to ensure concurrent threads use up to date property value	8 years ago
luccioman	28b451a0b3	Made Cache compression level and lock timeout user configurable	8 years ago
luccioman	a7394b479b	Limit the synchronization blocking time on some Cache operations. Using a Reentrant lock instead of the intrinsic synchronization lock permits limiting the blocking time to acquire a lock. Useful on a very busy Cache concurrently accessed by many threads : when the time to acquire a lock is too high, getting/storing content on the cache becomes inefficient, and it is then better to fall back to loading remote resources. Illustrated by the CacheTest stress test and some traces reported in mantis 751 ( http://mantis.tokeek.de/view.php?id=751 )	8 years ago
luccioman	73ab4a7b3a	Prevent log pollution from unwanted Solr warnings. Many non-blocking "java.nio.file.NoSuchFileException" traces with warning log level can be logged by Solr, especially when heavily crawling. This issue is known from Solr 5.x but still unresolved with Solr 6.x ( https://issues.apache.org/jira/browse/SOLR-9120 ) Consequently upgraded to "SEVERE" the default log level of the related internal Solr class. See also mantis 727 ( http://mantis.tokeek.de/view.php?id=727 )	8 years ago
Michael Peter Christen	c94a8c76bd	re-added solr synchronization hack	8 years ago
Michael Peter Christen	6fe735945d	migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8 Also: now Version 1.921	8 years ago
luccioman	ce89492319	Ensure system resource release by closing document stream.	8 years ago
luccioman	8399275142	Properly close file output streams even on exceptions scenarios.	8 years ago
luccioman	4e4dc6c4e5	Removed unnecessary finalize implementation. On such private classes with limited scope but with frequent instance creations and removals within the application lifecycle, implementing the finalize method is particularly unwanted as it decreases the garbage collector performance. What's more the Object.finalize() method is now deprecated in the JDK 9 and will eventually disappear from future releases (see https://bugs.openjdk.java.net/browse/JDK-8177970)	8 years ago
reger	632354e2ff	Tokenize result entry keywords and add some styling for display	8 years ago
reger	c42d17f607	upd to commons-compress-1.14.jar	8 years ago
luccioman	a04feac064	Ensure file input streams proper closing in both success and failures Also add when possible a warning level log message on input stream closing error instead of failing silently. This could help understanding some IO exceptions such as "too many files open".	8 years ago
luccioman	d98c04853d	Ensure proper closing of file input streams.	8 years ago
luccioman	c53c58fa85	Ensure closing ChunkIterator stream in every possible use case. Also trace in logs the eventual close failures instead of failing silently. This should help prevent holding too many unreleased system file handlers, as in the case reported by eros on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5988&sid=b00e7486c1bf7e48a0d63eb328ccca02 )	8 years ago
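
Note on commits 8da3174867 and 286f3018bd: the locale issue they describe can be reproduced with a few lines of Java. The sketch below is only an illustration, not the code from those commits; the class name LocaleSafeLowerCase and the helper lowerTechnical are hypothetical, and the use of Locale.ROOT is an assumption about one reasonable way to get locale-independent lower casing.

```java
import java.util.Locale;

public class LocaleSafeLowerCase {

    /**
     * Lower-cases a technical string (URL, tag name, mime type...) with
     * Locale.ROOT, so the result does not depend on the JVM default locale.
     * Hypothetical helper, not taken from the YaCy code base.
     */
    public static String lowerTechnical(final String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        final String tag = "TITLE";

        // With Turkish rules, 'I' is lower-cased to dotless 'ı' (U+0131),
        // so the result no longer equals the expected ASCII tag name.
        System.out.println(tag.toLowerCase(turkish).equals("title"));

        // With Locale.ROOT the conversion is the same on every system.
        System.out.println(lowerTechnical(tag).equals("title"));
    }
}
```

Run with any default locale, the first comparison is false and the second is true, which is the inconsistency the two commits set out to remove.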


13248 Commits (2a87b08cea67f8f2ae46e318c1c3945e8520ec53)
