yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	eea2d71851	prevent creation of auth schema factories every time a servlet is called	4 years ago
Michael Peter Christen	fcc9386ed3	enhanced the (already fast!) png exporter	4 years ago
Michael Peter Christen	4e9b425f98	missing fix for latest commit	4 years ago
Michael Peter Christen	3213d9db37	updated jetty from 9.4.17 to 9.4.35 and fixed a bug in ServerSideIncludes that appeared only in that recent version of jetty	4 years ago
Michael Peter Christen	787fec0658	reduced complexity - removed concurrency in sort	4 years ago
Michael Peter Christen	cef5fde343	adding message to UI to make port change transparent	4 years ago
Michael Peter Christen	52228cb6be	added a gc to cleanup process (once every 10 minutes)	4 years ago
Michael Peter Christen	22841ffbf1	creating a threaddump during every cleanup process to be able to find out what a peer did (not) last time before a crash	4 years ago
Michael Peter Christen	36e616271b	do better documentation on how to set a default password	4 years ago
Michael Peter Christen	df2bf9ef28	try to fix maven build error	4 years ago
Michael Peter Christen	264bab6700	trying to fight the UI unavailability: this patch addresses a possible issue with too many open connections to remote peers	4 years ago
Michael Peter Christen	7947baeb49	removed all remaining deprecation warnings	4 years ago
Michael Peter Christen	c0f6d6e11d	removed one deprecation warning for jetty library initializing ssl server port	4 years ago
Michael Peter Christen	133440a7a6	some debug lines	4 years ago
sgaebel	3431f91db9	removes unused 'unused' tokens	4 years ago
sgaebel	fc03c4b4fe	removes some warning and unused objects	4 years ago
sgaebel	4a495df63a	removes some deprecation-warnings	4 years ago
sgaebel	dd9d4b1188	replace org.junit.Assert.assertThat by org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid deprecation-warning	4 years ago
sgaebel	df9ea0a42a	removes some warnings: unused imports, params	4 years ago
sgaebel	9bc2297161	fixes deleting during recrawl	4 years ago
sgaebel	80785b785e	adds deleting during recrawl	4 years ago
Michael Peter Christen	e0ad8ca9da	replaced json library from JSON.org with libandroid-json-java This fixes https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	ea8df27e95	modified org.json.* library to fit into the YaCy environment as drop-in replacement. Also made some fixes and enhancements to the library.	5 years ago
Michael Peter Christen	60dc1241a3	added org.json.* library from https://android.googlesource.com/platform/libcore/+/refs/heads/master/json/src/main/java/org/json as a preparation step for https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	053e54a2c7	grant CORS for json files	5 years ago
Michael Christen	cfa27d2fd5	fixed links	5 years ago
Michael Christen	cb20aa7e54	removed donation message in search result column	5 years ago
Michael Christen	25227676ae	removed some warnings	5 years ago
luccioman	6b45cd5799	New optional crawl filter on the URL a doc must match to crawl its links For finer control over which parsed documents can trigger an addition of their links to the crawl stack, complementary to the existing crawl depth parameter.	6 years ago
luccioman	d16bc99835	Added "Show Metadata" links to the ViewFile.html links mode To conveniently follow parsed links in the file viewer	6 years ago
luccioman	a5771b1f14	Made SNI extension user configurable without the need for server restart TLS Server Name Indication (SNI) extension activation can now be configured with the new Settings_p.html?page=httpClient administration page. SNI extension is also now enabled by default, as in 2019 the unrecognized_name(112) alert is more properly handled by major web servers TLS implementations, following the RFC 6066 standard. Related YaCy issues : #153 #189 and #272 JDK 1.7 bug : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374 Apache httpd issue : https://bz.apache.org/bugzilla/show_bug.cgi?id=56241 RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3	6 years ago
luccioman	e90405b6f0	Support parsing audio URLs without file extension Added also a Junit for the audio tag parser	6 years ago
luccioman	a8316c79da	Allow JS resorting of search results by unauthenticated users Access rate limitations to this search mode by unauthenticated users are set low by default to prevent unwanted server overload but can be customized through the SearchAccessRate_p.html configuration page Fixes #291	6 years ago
luccioman	0ab2b49c31	Made /yacysearch access rate limitations user configurable With a new admin page at /SearchAccessRate_p.html in menu Network Access > Local Search > Access Rate Limitations	6 years ago
luccioman	5b7e41202a	Added Solr GSA writer support for responses from remote instances	6 years ago
luccioman	4d8a948455	Properly close PDF snapshots loaded with pdfbox library	6 years ago
luccioman	74e6d6e984	Added Solr GrepHTML writer support for responses from remote instances	6 years ago
luccioman	5e6501974d	Added Solr snapshots writer support for responses from remote instances	6 years ago
luccioman	384c37102c	Improve accuracy of total results count on latest pages in Stealth mode Previously, when mixing results from local RWI and local Solr (Stealth mode), total local Solr count could be ignored on last result pages, when the page offset was higher than local Solr count but lower than total RWI count.	6 years ago
luccioman	5e9a08355a	Improved logging for federated search - Do not use spaces in logger identifier name so the log level can be configured in yacy.logging - Hold the logger instance to avoid the logging system to look for it from its name at each appended log message	6 years ago
luccioman	9782a98a9c	Added the possibility to customize facets sort type and direction Previously search navigators/facets elements were sorted only by counts. Now from the ConfigSearchPage_p.html admin page, sort direction (ascending/descending) and type (on counts or labels) can be customized independently for each navigator.	6 years ago
sgaebel	c2398fd890	remove warnings: 'Statement unnecessarily nested within else clause'	6 years ago
sgaebel	811d40a6c4	taking care of closing inputstreams, HTTPClient	6 years ago
sgaebel	8d2e7262d9	Recrawl: - set the chunksize to 100 to meet the max of the embedded solr - re-enable sorting (the case where we switched it off should be away) - enable recrawling on remote-solr	6 years ago
sgaebel	8f58c1dcfa	extend the SolrServlet to be usable as remote solr (incl. update) this feature needs to be enabled by uncommenting the url-pattern	6 years ago
luccioman	7223a2fdb1	Removed usage of now deprecated Jetty function	6 years ago
luccioman	440d9f2fa0	Exclude peers with empty or disabled RWI from remote RWI search	6 years ago
luccioman	08ea0b0397	Added a configurable timeout to wkhtmltopdf calls for pdf snapshots Necessary to prevent blocking the indexing workflow when some wkhtmltopdf renderings fail without terminating	6 years ago
luccioman	3fb449b3b6	Properly resolve relative URLs against document URL in html base tags Fixes issue #256	6 years ago
luccioman	73a6e45524	Extended detection of external tools used for Snapshots generation This enables detecting wkhtmltopdf and Imagemagick convert executables when they are on the system Path in addition to common installation paths.	6 years ago
luccioman	7dc1f60619	Fixed detection of absolute data folder path on MS Windows	6 years ago
luccioman	595e144797	Trace a message on incomplete proper server finish when killing process	6 years ago
luccioman	9daeea823b	Fixed concurrency issue on cache used for circles rendering Without synchronization lock, concurrent rendering of images including circles could lead to glitches as reported in issue #248	6 years ago
Michael Peter Christen	c347e7d3f8	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	6 years ago
Michael Peter Christen	848e9304d9	evil bots may crawl harder	6 years ago
luccioman	a997133260	Fixed gzip decompression regression on index transfer APIs Processing of gzip encoded incoming requests (on /yacy/transferRWI.html and /yacy/transferURL.html) was no more working since upgrade to Jetty 9.4.12 (see commit `51f4be1`). To prevent any conflicting behavior with Jetty internals, use now the GzipHandler provided by Jetty to decompress incoming gzip encoded requests rather than the previously used custom GZIPRequestWrapper. Fixes issue #249	6 years ago
luccioman	e85f231bdf	Fixed termination of Host browser and link structure Solr query threads On some conditions (especially when reaching timeout), concurrent Solr query tasks used by the /HostBrowser.html and /api/linkstructure.json never terminated, thus leaking resources, as reported by @Vort in issue #246	6 years ago
luccioman	fcf6b16db4	Added new crawler attribute for finer control over Media Type detection New "Media Type detection" section in the advanced crawl start page allows choosing between : - not loading URLs with unknown or unsupported file extension without checking the actual Media Type (relying on the Content-Type header for now). This was the old default behavior, faster, but not really accurate. - always cross check URL file extension against the actual Media Type. This lets properly parse URLs ending with an apparently odd file extension, but which have actually a supported Media Type such as text/html. Sample URLs with misleading file extensions added as documentation in the crawl start page. fixes issue #244	6 years ago
luccioman	a83a56473e	Added support for PDF snapshots generation when running on MS Windows	6 years ago
luccioman	8852c97cee	Added basic styling for cleaner rendering of missing image snapshots For the output of the Solr snapshots writer	6 years ago
luccioman	746e0e788d	Render a relevant HTTP status code on snapshot image rendering error Instead of a null response body which is not very helpful.	6 years ago
luccioman	50b6edfcf5	Updated Solr snapshots writer for a cleaner html head	6 years ago
luccioman	f366f43d6b	Made snapshots size customizable in Solr snapshots response writer	6 years ago
luccioman	7a62fc0e66	Fixed concurrency issue in custom classloader used for template classes As reported in issue #241, the problem is only critical (random but complete crash of the JVM) when upgrading to JDK11.	6 years ago
luccioman	61c337f29a	Decode blacklist entries for easier edition of non ascii chars Not using the JDK URLDecoder.decode() function, as it strips '+' characters when they occur after '?' (both characters having regular expression semantics when used in blacklist path patterns)	6 years ago
luccioman	ed93221fa1	Improved normalization of blacklist path patterns having non ascii chars Normalize blacklist path patterns using percent-encoding, at pattern edition in web interface and at loading from configuration files. Fixes issue #237	6 years ago
luccioman	2a73b63d9e	Use a constant default target file name for seed SCP upload method To make seed upload (in /Settings_p.html?page=seed page) with SCP easier when the user specify a remote target directory path. See report by @vikulin in issue #227	6 years ago
luccioman	b5eabb626f	Removed some dead code	6 years ago
luccioman	db7ad76366	Improved support for Java logs file pattern options - support of "%h" and "%t" pattern components - more proper initialization of file handler when the data folder is not the default one, notably to prevent a non blocking but ugly error stack trace reported by the log manager at startup with that kind of setup	6 years ago
luccioman	7adbd1f87d	Fixed raw IPV6 addresses snapshots read/write on FAT32 and NTFS fs Fixes issue #225	6 years ago
luccioman	9b1c87033b	Fixed logs folder checking and creation Previously, if YaCy log folder was for example at `/home/user/yacy/DATA/LOG`, because of improper truncation of log path, an unnecessary directory creation was attempted at `/home/us`.	6 years ago
luccioman	c29588dd6a	Made possible to provide an absolute data root path for start script Previously, only a path relative to the user home folder could be provided	6 years ago
luccioman	d03c098b54	Removed deprecated warning comments about imports and Debian installer Deprecated by commit `be5d3a1066` , as classpath is now defined in yacycore.jar Manifest file.	6 years ago
luccioman	5b60b4225f	Fixed encoding of '+' character on search pages links As revealed by issue #216	6 years ago
luccioman	54fbe166ba	Updated pdf cache clear steps consistently with current pdfbox version - Removed calls to no more existing clearResources functions (on PDFont class and its children) since upgrade to pdfbox 2.n.n - Removed hacky usage of protected internal ClassLoader function. This removes the warnings displayed when running with JDK9 or JDK10 : [java] WARNING: Illegal reflective access by net.yacy.document.parser.pdfParser$ResourceCleaner (file:<path>) to method java.lang.ClassLoader.findLoadedClass(java.lang.String) [java] WARNING: Please consider reporting this to the maintainers of net.yacy.document.parser.pdfParser$ResourceCleaner [java] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations [java] WARNING: All illegal access operations will be denied in a future release Crawling thousands of pdf documents from various sources after modifications applied, revealed no new memory leak related to pdfbox (measurements done with JVisualVM).	6 years ago
luccioman	685122363d	Added a parser for XZ compressed archives. As suggested by LA_FORGE on mantis 781 (http://mantis.tokeek.de/view.php?id=781)	6 years ago
luccioman	4ee14ff3c5	Fixed NullPointerException case on malformed crawl queue folder name	6 years ago
luccioman	21ad9435ec	Fixed crawl queue folder naming for IPv6 hosts on MS Windows filesystems As reported by @vikulin in issue #187, crawling websites using a raw IPv6 address as host name in their URL failed when running on Microsoft Windows platforms (FAT32 or NTFS filesystems) when YaCy crawler created the crawl queue folder, as the ':' character which is part of an IPV6 address is forbidden on these filesystems.	6 years ago
luccioman	8a29551c54	Upgraded the OpenGeoDB dump URL The status of the library in the DictionaryLoader_p.html page now also advertises the user that an upgrade can be applied when an older dump is already loaded. Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter chat.	6 years ago
luccioman	373edf9eac	Adjusted yjson Solr writer to support responses from an external Solr Worked previously only with responses from YaCy embedded Solr, now able to render the response when YaCy is configured to use an external Solr index.	6 years ago
luccioman	87bd17b1cf	Simplified a little bit the RSS OpenSearch Solr writer	6 years ago
luccioman	dc49ca9c27	Fixed a NPE case on the Solr OpenSearch response writer Occurred when omitHeader parameter is set to true	6 years ago
luccioman	f4267ed247	Made Solr OpenSearch RSS writer compatible with external Solr index Worked previously only with responses from YaCy embedded Solr, now able to render the response when YaCy is configured to use an external Solr index.	6 years ago
luccioman	b1410f593a	Fixed stylesheet relative URLs rendering in Solr html writer Relative URLs to CSS stylesheets were not properly rendered when using the Solr html response writer and the "/solr/collection1/select" entry point instead of "/solr/select".	6 years ago
luccioman	89c59814da	Improved rendering of the Solr api relative url in the html writer In order to have a consistent relative url when using either /solr/select or /solr/collection1/select entry point.	6 years ago
luccioman	bf4f320b16	Optionally render the response header when using the Solr html writer With params rendered as html input fields for conveniently modifying params values and refreshing results.	6 years ago
luccioman	313204ae2c	Override qf and df Solr params with defaults only when they are not set	6 years ago
luccioman	bdafb14336	Removed redundant synchronization lock on network switch function Was useless as done in an already synchronized block, and the lock object was assigned a new value in that same block, and nowhere else a lock is requested on that same object.	6 years ago
luccioman	d5f44ea216	Removed unnecessary synchronization lock from serverSwitch constructor Lock was useless here as it was set on an object instance attribute while the object itself is not yet constructed and no other threads can access it.	6 years ago
luccioman	dcad393fe5	Fixed exceeding max size of failreason_s Solr field on large link list When using the 'From Link-List of URL' as a crawl start, with lists in the order of one or more thousands of links, the failreason_s Solr field maximum size (32kb) was exceeded by the string representation of the URL must-match filter when a crawl URL was rejected because not matching.	6 years ago
luccioman	f467601561	Properly lock solrInstances for reboot and restoration of embedded Solr Putting a synchronization lock directly on the solrInstances property was ineffective as it is assigned a new (unlocked) instance in these operations.	6 years ago
luccioman	9630f81306	Fixed small unnecessary lines of code	6 years ago
luccioman	876bcd2f54	Fixed useless comparison between int parameter and Long.MAX_VALUE	6 years ago
luccioman	c726154a59	Fixed removal of URLs from the delegatedURL remote crawl stack URLs were removed from the stack using their hash as a bytes array, whereas the hash is stored in the stack as String instance.	6 years ago
luccioman	2bdd71de60	Added server side columns sorting on the Process Scheduler table For easier usage of large tables in the Table_API_p.html page.	6 years ago
luccioman	bb51555830	Removed remaining unsafe accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	f895745e1c	Removed more unsafe concurrent accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	e97580dfc7	Fixed unsafe concurrent access to generic SimpleDateFormat instances SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	8811700e2e	Upgraded Jetty dependency from 9.4.9 to 9.4.11	6 years ago
luccioman	d53c33e4ef	Fixed potential infinite loop case (does not occur in current code base)	6 years ago
luccioman	a15ac8e0ca	Made CrawlProfile loading tolerant to malformed json string attribute	7 years ago
luccioman	a715bb7876	Fixed rendering of solr mustNoMatch value on CrawlProfileEditor_p.xml	7 years ago
luccioman	0b302c5004	Do not block whole server startup on persisted crawl profile load error	7 years ago
luccioman	4d9aa4ed1e	Fixed default crawl profile solr mustnotmatch query from previous commit	7 years ago
luccioman	cced94298a	Added a new crawler document filter type using Solr syntax This makes it possible to set up much more advanced document crawl filters, by filtering on one or more document indexed fields before inserting in the index.	7 years ago
Michael Christen	e0dc632020	removed transformer it was not used any more	7 years ago
luccioman	9bc7b6c39d	Allow editing of scheduled next execution dates for finer control Can be useful more especially when scheduling many API calls over a long period of time to precisely adjust each scheduled date/time.	7 years ago
luccioman	40e8c7b89b	Use the heavy ConcurrentUpdateSolrClient only when necessary Prefer the lightweight HttpSolrClient when no updates are performed on the remote Solr instance, as recommended by Solr documentation itself.	7 years ago
luccioman	bd4cfeda3f	Add a max acceptable limit to the size of Solr responses on p2p search Following activation of gzip compression on responses, to ensure uncompressed content can fit on available memory.	7 years ago
luccioman	de4ea95687	Consistently allow gzip compression of remote Solr responses Was already enabled when requesting remote Solr with https or with authentication (as an external Solr index)	7 years ago
luccioman	cea8187161	Reuse expired connections evictors threads provided by apache and solr	7 years ago
luccioman	b5dc1f376f	Made outgoing pools max total connections user configurable For a finer control over the maximum simultaneously active outgoing connections.	7 years ago
luccioman	387d646c0e	Added gzip compression of responses returned to user-agents accepting it Enabled as default, but can be disabled using the "Server Access Settings" admin page.	7 years ago
luccioman	a7a4ba3287	Apply remote solr configured timeout on getting connection from pool	7 years ago
luccioman	ee6670fb8f	Use a common pooled http connection manager for remote solr instances For a better control on the maximum simultaneous outgoing http connections, as already done for any other http connections (crawls, rwi search, p2p protocol) using the net.yacy.cora.protocol.http.HTTPClient	7 years ago
luccioman	d28f9ba0f6	Removed use of deprecated ConcurrentUpdateSolrClient constructor	7 years ago
luccioman	8a749aa5ad	Trace level log message for monitoring remote solr response times	7 years ago
luccioman	35826a3091	Added a search page customization setting to display or not favicons If not interested in displaying this on your search results and notably on a peer with limited resources this can help saving some CPU and outgoing network connections.	7 years ago
luccioman	0082b5ab2a	Added missing default Solr http client connection timeout initialization Consistently with the custom Solr http client used for https connections to remote Solr peers or to YaCy external Solr storage. This prevents remote Solr request threads from waiting for a connection to a remote peer longer than the configured timeout.	7 years ago
luccioman	fa4399d5d2	Small perf improvement : initialize threads names early when possible Initializing Thread names using the Thread constructor parameter is faster as it already sets a thread name even if no customized one is given, while an additional call to the Thread.setName() function internally does synchronized access, eventually runs access check on the security manager and performs a native call. Profiling a running YaCy server revealed that the total processing time spent on Thread.setName() for a typical p2p search was in the range of seconds.	7 years ago
luccioman	84d82bfdd7	Adjusted suggestions timeout management * less CPU usage using the Solr 'allowedTime' parameter * increase chances to get some results even when a first operation step goes in time out by letting some time for final snippets results processing	7 years ago
luccioman	65854bcb22	Fixed NullPointerException when omitHeader=true on external Solr server	7 years ago
luccioman	c4d984cec8	Fixed Solr response header duplication when requesting external Solr	7 years ago
luccioman	124cc24aa3	Properly handle embedded Solr partial results Solr can provide partial results for example when a processing time limit (specified with the parameter `timeAllowed`) is exceeded. Before this fix, getting partial results from an embedded Solr index resulted in a ClassCastException : "org.apache.solr.common.SolrDocumentList cannot be cast to org.apache.solr.response.ResultContext".	7 years ago
luccioman	3ce44cf250	Fixed largest snippet get : don't reject ones starting with a space char	7 years ago
luccioman	f511e16d50	Prevent duplication of Solr query highlight fields parameters That was caused by concurrent modifications (with addHighlightField() function) to the same SolrQuery instance when requesting Solr on remote peers in p2p search.	7 years ago
luccioman	e357ade47d	Reduced memory footprint of text snippet extraction By not parsing and storing at first all sentences of a document, but only on the fly the ones necessary to compute the snippet.	7 years ago
luccioman	e115e57cc7	Reduced text snippet extraction processing time. By not generating MD5 hashes on all words of indexed texts, processing time is reduced by 30 to 50% on indexed documents with more than 1Mbytes of plain text.	7 years ago
sgaebel	4b79851e12	corrected icons_sizes_sxt to SolrType.string	7 years ago
luccioman	3b89c232db	Easier tracking of longest text snippets initializations When text snippets statistics are enabled and FINE log level is enabled on the TextSnippetStatistics class.	7 years ago
luccioman	3c4344cb12	Fixed text snippet max init time statistic rendering	7 years ago
reger	a8234b7ea7	Make sure for image resource url enabled index image pixel size fields are filled if at least one of the image size fields is enabled in index (images_height_val, images_width_val, images_pixel_val). Previously all fields were required to be enabled (hint: default setting is height + width enabled)	7 years ago
luccioman	e67df103b5	Removed more remaining uses of deprecated Seed.getIP() function.	7 years ago
luccioman	addd18c993	Removed some remaining uses of deprecated Seed.getIP()	7 years ago
luccioman	c35d0568b6	Support for preferred https in peers communication on more operations	7 years ago
luccioman	e914d17aca	Updated call to function deprecated since commons-codec version 1.11	7 years ago
luccioman	a3ec7a7a5f	Added analysis optional setting to compute statistics on text snippets Thus producing some basic stats on processing times for snippets generation and counts on snippets per source type.	7 years ago
luccioman	1889d484de	Added Solr HTML writer support for responses from remote instances	7 years ago
luccioman	2af3bf79c7	Improve rendering of remote Solr admin URLs - properly handle IPv6 loopback address replacement - replace loopback address or host only when accessing peer remotely - replace loopback part with the peer hostname as requested rather than with its seed public IP as this works better for Intranet mode and when peer is behind a reverse proxy.	7 years ago
luccioman	bb74de7d59	Removed unnecessary "/admin" suffix from remote Solr instance admin URL For quite a long time now, the Solr /admin URL suffix indeed redirects to the Solr base context (see https://issues.apache.org/jira/browse/SOLR-3337)	7 years ago
luccioman	0d34034f17	Ensure an embedded Solr is available for Solr dump/restore operations Otherwise, these operations triggered NullPointerException when only an external Solr index is attached.	7 years ago
luccioman	d92b191942	Ensure no remote Solr is attached before "Shut Down and Re-Start Solr" Otherwise once this operation is applied, the remote Solr(s) instances are disconnected and the embedded Solr is connected even if disabled by setting "core.service.fulltext". Also use constants for related default setting values.	7 years ago
luccioman	26d8ad591c	Adjusted Solr select servlet output when using an external Solr only - Use the EnhancedXMLResponseWriter only when requested output is "exml" - Use the Standard Solr writers when possible, for example for json, xml or javabin output formats - Return an error when the requested format cannot be rendered with an external Solr server only Important : this modification is necessary for peers using exclusively an external Solr server to be reachable as robinson targets in p2p search, as the binary format ("javabin") is the default Solr exchange format for peers. Before this, when a peer requested a remote one attached only to an external Solr (no embedded one), it ended with "Invalid type" error, as the remote peer answered with xml although binary format was requested.	7 years ago
luccioman	69690c13a0	Optionally allow external Solr server with self-signed certificate This is necessary when you want to attach to a dedicated external Solr server protected with basic http authentication and requested over https but having only a self-signed certificate.	7 years ago
luccioman	b882f85900	Fixed NPE case in Solr select servlet on external Solr only setup Regression introduced with commit `0d7625ecfb`	7 years ago
luccioman	2fd4d05e2f	Added a shared Java constant for setting key server.servlets.called	7 years ago
luccioman	ba9cd14516	Removed hard-coded patch for Solr 5.0 on ranking boost function The current default boost function (`recip(ms(NOW,last_modified),3.16e-11,1,1)`) for the Date ranking profile is indeed working fine. What can trigger the error `unexpected docvalues type NUMERIC for field 'last_modified'` is the previous default boost function (quite old now) or any custom one using the Solr `ord` or `rord` functions on the last_modified field. Then the problem was that the migration code in the Switchboard supposed to detect the old date boost function was incorrect (one trailing right parenthesis in excess), so the deprecated function remained. This fixes issue #169.	7 years ago
luccioman	fb3032c530	Added a crawl filtering possibility on documents Media Type (MIME)	7 years ago
luccioman	e45afedee4	Added support for enclosures (media links) to the RSS loader	7 years ago
luccioman	aaefd5219c	Reduce log verbosity of RSS loader on feed items with no link	7 years ago
luccioman	cf62b571bd	Added RSS reader support for `enclosure` feed item sub element. Enclosure element (see http://www.rssboard.org/rss-specification#ltenclosuregtSubelementOfLtitemgt ) can be seen for example in podcasts feeds.	7 years ago
luccioman	e5f5de0fc7	Added some JavaDoc to the RSSMessage class.	7 years ago
luccioman	0d7625ecfb	Handle Solr fields restrict and alias in YaCy html and exml writers Thus allowing for example to read more easily the local Solr index full metadata in HTML by restricting if desired to some fields of interest. See Solr documentation about the 'fl' (Field List) parameter at https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefl_FieldList_Parameter	7 years ago
luccioman	3da2739bbd	Parse and index more common audio metadata text tag fields.	7 years ago
luccioman	846aba00fa	Added parsing of URLs eventually present in audio metadata tags	7 years ago
Michael Peter Christen	187075b878	added nav filter	7 years ago
luccioman	bcbd0ae1a4	Enabled partial parsing of audio resources.	7 years ago
luccioman	fda0189613	Updated audio file extensions with ones recently added to audioTagParser	7 years ago
luccioman	978e2be95b	Let a chance for other parsers on audioTagParser error As done in all other parsers, eventually falling back in the end to the genericParser which creates a minimal index entry.	7 years ago
luccioman	9e5846a26e	Small fix on svg parser error message	7 years ago
luccioman	11611dbdcf	Reuse existing File copy function to handle audio parser tmp files	7 years ago
luccioman	f77f8f40f9	Factored audio parser tag processing	7 years ago
luccioman	9a7a353d0e	Removed some unnecessary intermediate list creation on array copy.	7 years ago
luccioman	fb6457f5bc	Fixed NPE case when on audio resource parsed with null tag	7 years ago
luccioman	c3ff50c17a	Updated the list of audio file formats supported by the audioTagParser Follows upgrade to Jaudiotagger dependency to version 2.2.5.	7 years ago
luccioman	1b90479a76	Added missing vocabulary navigator increment on results from RWI	7 years ago
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	17c7a85f18	Make StreamResponse usable in Java try-with-resources statements	7 years ago
luccioman	b67742336e	Provide user interface messages on vocabulary creation read/write errors	7 years ago
luccioman	3e8dd90211	Use https rather than http in links and queries to openstreetmap.org	7 years ago
luccioman	3a973dbb23	Removed unused import	7 years ago
luccioman	e9527cd0e5	Reuse the same Pattern instance when matching multiple key/values	7 years ago
luccioman	dbf4c1cd76	Improved blacklist entries editing operations : - Fixes issue #160 : handle properly syntax exceptions with a user friendly message - Fixes loss of information on multiple blacklist entries editions - Fixes loss of entries when moving entries from one list to another	7 years ago
reger	87077b8fb6	Adjust and move Language Navigator to be member of the navigator plugin list.	7 years ago
luccioman	eb20589e29	Fixed issue #158 : completed div CSS class ignore in crawl	7 years ago
luccioman	0cdee4e26a	Fixed loss of "meanCount" search param when using facets or page buttons. Then on new search queries, no suggestions at all could be displayed.	7 years ago
luccioman	117a859879	Do not clear all search modifiers when unselecting one modifier. Previously, when clicking a selected facet in the search results page to unselect it, any other selected modifiers/facets were also removed.	7 years ago
luccioman	33593c22e9	Fixed loss of other modifiers on keywords/tags search navigation links	7 years ago
luccioman	a9dc0874c0	Remove old query terms from search results suggestions links. Especially when old terms were misspelled, suggestions links then provided most of the time empty results.	7 years ago
luccioman	9412881230	Added basic support for autotagging microdata annotated item types. With the appropriate vocabulary settings in Vocabulary_p.html page, this can produce Vocabulary search facets displaying item types referenced in html documents by microdata annotation. Tested notably, but not limited to, vocabulary classes/types defined by Schema.org and Dublin Core.	7 years ago
luccioman	5a14d34a7d	Refactoring : documented and extracted autotagging processing functions.	7 years ago
luccioman	58b9834729	Added HTML microdata typed items parsing capability. This adds the possibility for the HTML parser to gather typed items URLs annotated in HTML tags with itemscope and itemtype attributes (see microdata specification https://www.w3.org/TR/microdata/ ), notably Types from the schema.org vocabulary, but also Types/Classes from any other vocabulary, such as the common ones listed in the RDFa core context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).	7 years ago
luccioman	80fb1026d0	Create recrawl requests with the relevant crawl profile. Recrawl default profile was previously effectively used for crawl stacker acceptance check, but request entries were indeed still created with the "snippetGlobalText" profile.	7 years ago
luccioman	539925a275	Added a utility to generate/update XLIFF master file from lng files.	7 years ago
luccioman	fa6d030b0b	Moved dbtest to the test source folder.	7 years ago
luccioman	6cd3847d0a	Fixed NullPointerException case on Table init with relative file path. Can occur for example when running dbtest with relative test table file name (without explicit parent folder).	7 years ago
luccioman	28883d8a71	Shutdown daemon threads at the end of dbtest	7 years ago
luccioman	929e0d6eae	Replaced improper ByteBuffer.equals() implementation by Arrays.equals() Renamed also ByteBuffer.equals() to startsWith() as this is the appropriate function implementation semantics.	7 years ago
luccioman	46b5249c20	Removed time condition on HostBalancer initialization in JUnit test. Its initialization in main application usage remains asynchronous.	7 years ago
luccioman	8b572b7337	Commit Solr index before simulating or starting recrawl job. This ensures up-to-date simulation query results, and recrawl processing.	7 years ago
luccioman	733cacdbb8	Revised the RDFaParser main launcher for minimal proper operation. This parser is still not enabled in the main text parsers list. More would have to be done to make it functional.	7 years ago
luccioman	7baa99f26f	Fixed stored URL in web cache when redirection(s) occur. Associate cached content to the last redirection location, instead of the first URL of a redirection(s) chain : - for proper base URL processing in parsers (fixes mantis 636 - http://mantis.tokeek.de/view.php?id=636) - to prevent duplicated content in Solr index when recrawling a redirected URL	7 years ago
luccioman	9ddf92d143	Removed unnecessary reflection usage for workflow tasks. This improves code readability and maintainability (call hierarchies are easier to read) and possibly performance.	7 years ago
luccioman	897d3d30cc	Added new recrawl job profile to the list of default crawl profiles	7 years ago
luccioman	9624516bf8	Refresh recrawl job profile threshold date like other default profiles	7 years ago
luccioman	b712a0671e	Added a specific default crawl profile for the recrawl job. - with only light constraint on known indexed documents load date, as it can already be controlled by the selection query, and the goal of the job is indeed to recrawl selected documents now - using the iffresh cache strategy	7 years ago
luccioman	adf3fa493d	Added comments about crawl profiles recrawl cycles	7 years ago
luccioman	3638e16c2e	More comprehensive log on rejected recrawls caused by date constraint	7 years ago
luccioman	d47afe6fab	Use a constant for crawler reject reason prefix with specific processing	7 years ago
luccioman	4e03335625	Added more details to the recrawl job report	7 years ago

8901 Commits (6bd5f49c412be8edaadc5bc707dcd2a521207da5)