yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	21ad9435ec	Fixed crawl queue folder naming for IPv6 hosts on MS Windows filesystems As reported by @vikulin in issue #187, crawling websites using a raw IPv6 address as host name in their URL failed on Microsoft Windows platforms (FAT32 or NTFS filesystems) when the YaCy crawler created the crawl queue folder, as the ':' character, which is part of an IPv6 address, is forbidden on these filesystems.	7 years ago
luccioman	8a29551c54	Upgraded the OpenGeoDB dump URL The status of the library in the DictionaryLoader_p.html page now also informs the user that an upgrade can be applied when an older dump is already loaded. Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter chat.	7 years ago
luccioman	373edf9eac	Adjusted yjson Solr writer to support responses from an external Solr Worked previously only with responses from YaCy embedded Solr, now able to render the response when YaCy is configured to use an external Solr index.	7 years ago
luccioman	87bd17b1cf	Simplified the RSS OpenSearch Solr writer a little bit	7 years ago
luccioman	dc49ca9c27	Fixed a NPE case on the Solr OpenSearch response writer Occurred when omitHeader parameter is set to true	7 years ago
luccioman	f4267ed247	Made Solr OpenSearch RSS writer compatible with external Solr index Worked previously only with responses from YaCy embedded Solr, now able to render the response when YaCy is configured to use an external Solr index.	7 years ago
luccioman	b1410f593a	Fixed stylesheet relative URLs rendering in Solr html writer Relative URLs to CSS stylesheets were not properly rendered when using the Solr html response writer and the "/solr/collection1/select" entry point instead of "/solr/select".	7 years ago
luccioman	89c59814da	Improved rendering of the Solr api relative url in the html writer In order to have a consistent relative url when using either /solr/select or /solr/collection1/select entry point.	7 years ago
luccioman	bf4f320b16	Optionally render the response header when using the Solr html writer With params rendered as html input fields for conveniently modifying params values and refreshing results.	7 years ago
luccioman	313204ae2c	Override qf and df Solr params with defaults only when they are not set	7 years ago
luccioman	bdafb14336	Removed redundant synchronization lock on network switch function Was useless as done in an already synchronized block, and the lock object was assigned a new value in that same block, and nowhere else a lock is requested on that same object.	7 years ago
luccioman	d5f44ea216	Removed unnecessary synchronization lock from serverSwitch constructor Lock was useless here as it was set on an object instance attribute while the object itself is not yet constructed and no other threads can access it.	7 years ago
luccioman	dcad393fe5	Fixed exceeding max size of failreason_s Solr field on large link list When using the 'From Link-List of URL' as a crawl start, with lists in the order of one or more thousands of links, the failreason_s Solr field maximum size (32kb) was exceeded by the string representation of the URL must-match filter when a crawl URL was rejected because not matching.	7 years ago
luccioman	f467601561	Properly lock solrInstances for reboot and restoration of embedded Solr Putting a synchronization lock directly on the solrInstances property was ineffective as it is assigned a new (unlocked) instance in these operations.	7 years ago
luccioman	9630f81306	Fixed small unnecessary lines of code	7 years ago
luccioman	876bcd2f54	Fixed useless comparison between int parameter and Long.MAX_VALUE	7 years ago
luccioman	c726154a59	Fixed removal of URLs from the delegatedURL remote crawl stack URLs were removed from the stack using their hash as a bytes array, whereas the hash is stored in the stack as String instance.	7 years ago
luccioman	2bdd71de60	Added server side columns sorting on the Process Scheduler table For easier usage of large tables in the Table_API_p.html page.	7 years ago
luccioman	bb51555830	Removed remaining unsafe accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	7 years ago
luccioman	f895745e1c	Removed more unsafe concurrent accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	7 years ago
luccioman	e97580dfc7	Fixed unsafe concurrent access to generic SimpleDateFormat instances SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	7 years ago
luccioman	8811700e2e	Upgraded Jetty dependency from 9.4.9 to 9.4.11	7 years ago
luccioman	d53c33e4ef	Fixed potential infinite loop case (does not occur in current code base)	7 years ago
luccioman	a15ac8e0ca	Made CrawlProfile loading tolerant to malformed json string attribute	7 years ago
luccioman	a715bb7876	Fixed rendering of solr mustNoMatch value on CrawlProfileEditor_p.xml	7 years ago
luccioman	0b302c5004	Do not block whole server startup on persisted crawl profile load error	7 years ago
luccioman	4d9aa4ed1e	Fixed default crawl profile solr mustnotmatch query from previous commit	7 years ago
luccioman	cced94298a	Added a new crawler document filter type using Solr syntax This makes it possible to set up much more advanced document crawl filters, by filtering on one or more document indexed fields before inserting in the index.	7 years ago
Michael Christen	e0dc632020	removed transformer it was not used any more	7 years ago
luccioman	9bc7b6c39d	Allow editing of scheduled next execution dates for finer control Can be especially useful when scheduling many API calls over a long period of time to precisely adjust each scheduled date/time.	7 years ago
luccioman	40e8c7b89b	Use the heavy ConcurrentUpdateSolrClient only when necessary Prefer the lightweight HttpSolrClient when no updates are performed on the remote Solr instance, as recommended by Solr documentation itself.	7 years ago
luccioman	bd4cfeda3f	Add a max acceptable limit to the size of Solr responses on p2p search Following activation of gzip compression on responses, to ensure uncompressed content can fit on available memory.	7 years ago
luccioman	de4ea95687	Consistently allow gzip compression of remote Solr responses Was already enabled when requesting remote Solr with https or with authentication (as an external Solr index)	7 years ago
luccioman	cea8187161	Reuse expired connections evictors threads provided by apache and solr	7 years ago
luccioman	b5dc1f376f	Made outgoing pools max total connections user configurable For a finer control over the maximum simultaneously active outgoing connections.	7 years ago
luccioman	387d646c0e	Added gzip compression of responses returned to user-agents accepting it Enabled as default, but can be disabled using the "Server Access Settings" admin page.	7 years ago
luccioman	a7a4ba3287	Apply remote solr configured timeout on getting connection from pool	7 years ago
luccioman	ee6670fb8f	Use a common pooled http connection manager for remote solr instances For a better control on the maximum simultaneous outgoing http connections, as already done for any other http connections (crawls, rwi search, p2p protocol) using the net.yacy.cora.protocol.http.HTTPClient	7 years ago
luccioman	d28f9ba0f6	Removed use of deprecated ConcurrentUpdateSolrClient constructor	7 years ago
luccioman	8a749aa5ad	Trace level log message for monitoring remote solr response times	7 years ago
luccioman	35826a3091	Added a search page customization setting to display or not favicons If not interested in displaying this on your search results and notably on a peer with limited resources this can help saving some CPU and outgoing network connections.	7 years ago
luccioman	0082b5ab2a	Added missing default Solr http client connection timeout initialization Consistently with the custom Solr http client used for https connections to remote Solr peers or to YaCy external Solr storage. This prevents remote Solr request threads from waiting longer than the configured timeout to establish a connection to a remote peer.	7 years ago
luccioman	fa4399d5d2	Small perf improvement : initialize threads names early when possible Initializing Thread names using the Thread constructor parameter is faster as it already sets a thread name even if no customized one is given, while an additional call to the Thread.setName() function internally does synchronized access, may run an access check on the security manager and performs a native call. Profiling a running YaCy server revealed that the total processing time spent on Thread.setName() for a typical p2p search was in the range of seconds.	7 years ago
luccioman	84d82bfdd7	Adjusted suggestions timeout management * less CPU usage using the Solr 'timeAllowed' parameter * increase chances to get some results even when a first operation step times out by leaving some time for final snippets results processing	7 years ago
luccioman	65854bcb22	Fixed NullPointerException when omitHeader=true on external Solr server	7 years ago
luccioman	c4d984cec8	Fixed Solr response header duplication when requesting external Solr	7 years ago
luccioman	124cc24aa3	Properly handle embedded Solr partial results Solr can provide partial results for example when a processing time limit (specified with the parameter `timeAllowed`) is exceeded. Before this fix, getting partial results from an embedded Solr index resulted in a ClassCastException : "org.apache.solr.common.SolrDocumentList cannot be cast to org.apache.solr.response.ResultContext".	7 years ago
luccioman	3ce44cf250	Fixed largest snippet get : don't reject ones starting with a space char	7 years ago
luccioman	f511e16d50	Prevent duplication of Solr query highlight fields parameters That was caused by concurrent modifications (with addHighlightField() function) to the same SolrQuery instance when requesting Solr on remote peers in p2p search.	7 years ago
luccioman	e357ade47d	Reduced memory footprint of text snippet extraction By not parsing and storing at first all sentences of a document, but only on the fly the ones necessary to compute the snippet.	7 years ago
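
The three SimpleDateFormat commits above (bb51555830, f895745e1c, e97580dfc7) rely on the fact that DateTimeFormatter is immutable and thread-safe, so a single instance can be shared by all threads without locking. The following is a minimal sketch of that pattern, not code taken from YaCy: the class name `DateFormatSketch`, the method names `formatUtc` and `parseUtc`, and the date pattern are all assumptions chosen for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of the YaCy code base.
public class DateFormatSketch {

    // One shared instance: DateTimeFormatter is immutable and thread-safe,
    // unlike SimpleDateFormat, which holds an unsynchronized Calendar.
    private static final DateTimeFormatter UTC_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds as a UTC timestamp string. */
    public static String formatUtc(final long epochMillis) {
        return UTC_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Parses a UTC timestamp string back to epoch milliseconds. */
    public static long parseUtc(final String text) {
        return Instant.from(UTC_FORMAT.parse(text)).toEpochMilli();
    }

    public static void main(final String[] args) throws InterruptedException {
        // Several threads use the same formatter concurrently, with no lock.
        final ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final long millis = i * 1000L;
            pool.execute(() -> System.out.println(formatUtc(millis)));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With SimpleDateFormat, the same sharing would need either a synchronized block around each call or one instance per thread.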

4240 Commits (21ad9435ec6bccaea87e91f4d6111d47b4458a26)