yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	366ceae35a	Fixed missing transitive dependency to commons-collections4-4.1. Dependency required by poi-3.16. Dependency was not provided in YaCy but already defined on previous poi versions. This only became problematic since upgrade from poi-3.15 to poi-3.16 (commit `dedc6552d3`). Indeed in this new poi release, a poi component used in some YaCy parsers code paths now explicitly needs a class from the commons-collections4 library: org.apache.poi.hpsf.Section now uses org.apache.commons.collections4.bidimap.TreeBidiMap. Impacted YaCy parsers: xlsParser, pptParser, docParser. Issue detected by the following JUnit tests failing: ParserTest.testpptParsers(), ParserTest.testdocParsers(), xlsParserTest.testParse()	7 years ago
luccioman	bf72cbffa3	Updated Debian package configuration to match new Java 1.8 target. Following migration from Java 1.7 to Java 1.8 in commit `6fe735945d`	7 years ago
reger	119b65389d	upd to icu4j-59_1.jar	7 years ago
reger	4979439e87	Skip public post of jre version. Added to determine switch to java8 `596b5dfa59`	7 years ago
reger	e918ec199e	Replace deprecated ConcurrentHashSet with recommended Java8 ConcurrentHashMap.newKeySet() in postprocessDocuments()	7 years ago
reger	fb71994342	Harmonizing use of xml reader / sax parser in XMLBlacklistImporter eliminating the need for lib/xercesImpl.jar	7 years ago
reger	275d65fffe	Patch last_modified date with internal FirstSeenTime() if no date provided to make sure updated documents are indexed with their last-modified date as provided in current crawl. (Always patching moddate with firstseen might bear the risk of missing actual updates.)	7 years ago
reger	d1b23afed6	Remove obsolete Protocol parameter ttl (time to live), not interpreted in target yacy/query.html. Also Protocol.querySeed() not used and parameter not interpreted in target servlet yacy/query.html	7 years ago
reger	dedc6552d3	upd to poi-3.16.jar	7 years ago
reger	15d78b1064	Replace deprecated getIP with getIPs in Protocol transferURL() and getProfile(). Remember used ip for error handling and departInterface	7 years ago
reger	ed36b47bec	Replace one more deprecated peerDeparture in Protocol.transferIndex() by moving/using interfaceDeparture() in transferRWI()	7 years ago
reger	37f44941fb	upd to pdfbox-2.0.7.jar	7 years ago
reger	41616de0b8	Add SolrConfig ClassicIndexSchemaFactory to prevent Solr startup warning. This overrides Solr default to use managed schema. As we don't use programmatic schema changes this directs Solr to use schema.xml, eliminating the warning.	7 years ago
luccioman	0ee8c030c4	Log an error when Solr folder migration fails for some reason.	7 years ago
reger	44d455dfed	upd to jwat-warc-1.1.0.jar	7 years ago
reger	588c6e96fb	upd version for typeahead.jquery.js in jslicense.html	7 years ago
luccioman	5a646540cc	Support parsing gzip files from servers with redundant headers. Some web servers provide both 'Content-Encoding : "gzip"' and 'Content-Type : "application/x-gzip"' HTTP headers on their ".gz" files. It was annoying to fail on such resources, which are not so uncommon even though non-conforming (see RFC 7231 section 3.1.2.2 for "Content-Encoding" header specification https://tools.ietf.org/html/rfc7231#section-3.1.2.2)	7 years ago
luccioman	11a7f923d4	Distinguish response parsing failures from unexpected exceptions.	7 years ago
luccioman	8100c033a2	URL Viewer : apply crawler size limits when adding to local index. This allows parsing and previewing large files, while preventing unwanted OutOfMemory errors which are likely to occur when adding to the Solr Index resources larger than configured crawler limits.	7 years ago
luccioman	eda7b0aeb6	Merge branch 'master' of https://github.com/yacy/yacy_search_server	7 years ago
reger	3005be7349	Clean up unmaintained and unused AugmentParser trail.	7 years ago
reger	e5cff062b5	Clean up redundant but obsolete jquery.rdfquery-core-1.0.js script lib	7 years ago
luccioman	cb4f1358e1	Added gzip parser support for max content bytes limit	7 years ago
luccioman	5216c681a9	Added HTML parser support for maximum content bytes parsing limit	7 years ago
luccioman	4aafebc014	Merge pull request #122 from Scarfmonster/patch-1. I also reproduced the issue, and the fix is working fine. Thanks @Scarfmonster	7 years ago
luccioman	651fad6da5	Added RSS parser support for maximum content bytes parsing limit	7 years ago
luccioman	452a17a8d5	Finer control on bounded input streams with custom stream implementation	7 years ago
luccioman	f8f1959ebb	Added parsing within bounds implementation to the generic parser.	7 years ago
luccioman	e0f400a0bd	Support trying multiple parsers even when streaming on large resources.	7 years ago
luccioman	1e84956721	Support loading local files with a per request specified maximum size. Consistently with the HTTP loader implementation.	7 years ago
luccioman	f369679d1c	Fixed read/copy on input streams reading sometimes less than expected.	7 years ago
reger	23bda133d2	Fix css conflict of YMarks.html to make it viewable. yacy-ymarks.css sidebar conflicts with Bootstrap's sidebar (different overlay settings). Simply renamed it to ymark-sidebar.	7 years ago
reger	af32d291c2	upd to commons-fileupload-1.3.3.jar	7 years ago
reger	a21789d4e7	Fix unresolved pattern in api/share.html by initializing some display variables	7 years ago
luccioman	bf55f1d6e5	Started support of partial parsing on large streamed resources. This enables the getpageinfo_p API to return something in a reasonable amount of time on resources in the megabytes size range and above. Support added first with the generic XML parser; for other formats regular crawler limits apply as usual.	7 years ago
luccioman	2a87b08cea	Removed temporary html parser test code	7 years ago
luccioman	1b3c169a9c	URL Viewer : decode raw text using the response charset when provided, or decode as UTF-8 as previously done.	7 years ago
luccioman	90a7c1affa	HTML parser : removed unnecessary remaining recursive processing. Recursive processing was removed in commit `67beef657f`, but one remained for anchors content (likely omitted from refactoring). It is no longer necessary : other links such as images embedded in anchors are currently correctly detected by the parser. More annoying : that remaining recursive processing could lead to almost endless processing when encountering some (invalid) HTML structures involving nested anchors, as detected and reported by lucipher on YaCy forum ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6005 ).	7 years ago
reger	e6e20dab52	upd to Jetty 9.4.6.v20170531. Modify loginservice to the changes in Jetty, partially based on pull request #101 https://github.com/yacy/yacy_search_server/pull/101 by @automenta	7 years ago
luccioman	e4c730b99f	Updated PerformanceQueues_p.xml API with last related servlet changes	7 years ago
luccioman	dcc56318bb	Made remote search max system load limits configurable from UI. As reported by davide on YaCy forums ( http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ) when the system is on high load, unless one carefully reads the YaCy configuration file, it could be difficult to understand why remote search results are not fetched.	7 years ago
reger	ddd13b776d	Add keyword constraint to rwi query result filter. To discard rwi results not matching query keyword: parameter	7 years ago
luccioman	e82eaee4b6	Apply consistent behavior on HTTP resource size exceeding limit. On content size known from HTTP headers, terminates connection faster and improves error reports quality by reporting relevant message "Content to download exceed maximum value..." rather than previously "no response (NULL) for url...".	7 years ago
luccioman	0b75e92ac2	Do not wrap unnecessarily loader IOExceptions in IOExceptions	7 years ago
luccioman	433bdb7c0d	Respect maxFileSize limit also when streaming HTTP and when relevant. Constraint applied consistently with HTTP content full load in byte array.	7 years ago
luccioman	4b72b29ea2	Added an informative title on the crawl start robots.txt status icon	7 years ago
luccioman	d08f31c3a8	Crawl start Ajax request : properly handle possible XML parsing errors. Otherwise on a malformed getpageinfo_p XML response (from the browser point of view), JavaScript errors were thrown and the ajax status steering wheel remained displayed indefinitely.	7 years ago
luccioman	9b1bb2545e	Refactored plain-text URLs detection implementation. For faster processing (measured about 2 times faster on many real-world examples) and more advanced detection (previous algorithm detected only URLs separated from the rest of the text by a space character).	7 years ago
luccioman	8da3174867	Ensure lower case conversion consistency with any default locale. Especially for Turkish speaking users using "tr" as their system default locale : strings for technical stuff (URLs, tag names, constants...) must not be lower cased with the default locale, as 'I' doesn't become 'i' like in other locales such as "en", but becomes 'ı'.	7 years ago
luccioman	286f3018bd	Made mime type and extension normalization locale independent. Previously, upper cased mime type was incorrectly normalized when the default locale is Turkish.	7 years ago
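
Note on commits 8da3174867 and 286f3018bd: the locale problem they describe can be reproduced with plain JDK calls. The following is a minimal sketch, not YaCy's actual code; the class name LocaleSafeLowerCase and the method normalize are hypothetical and only serve the illustration. It assumes the fix amounts to lower casing technical strings with a fixed, locale-neutral setting (here Locale.ROOT) instead of the system default locale.

```java
import java.util.Locale;

public class LocaleSafeLowerCase {

    /**
     * Hypothetical helper: lower-cases a technical string (URL scheme,
     * tag name, mime type...) independently of the system default locale.
     */
    public static String normalize(final String technicalString) {
        return technicalString.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        final String tag = "TITLE";

        // With Turkish casing rules, 'I' is mapped to dotless 'ı' (U+0131)
        System.out.println(tag.toLowerCase(turkish)); // prints "tıtle"

        // With the locale-neutral rules, 'I' is mapped to 'i'
        System.out.println(normalize(tag)); // prints "title"
    }
}
```

Calling toLowerCase() without an argument uses the JVM default locale, so the first result is what a user with "tr" as system default locale would get for any comparison against a constant such as "title".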

13334 Commits (86d41f024250e69bd21414d308969472cb1891a5)