yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	84d82bfdd7	Adjusted suggestions timeout management * less CPU usage using the Solr 'timeAllowed' parameter * increase chances to get some results even when a first operation step times out by leaving some time for final snippets results processing	7 years ago
luccioman	65854bcb22	Fixed NullPointerException when omitHeader=true on external Solr server	7 years ago
luccioman	c4d984cec8	Fixed Solr response header duplication when requesting external Solr	7 years ago
luccioman	124cc24aa3	Properly handle embedded Solr partial results Solr can provide partial results for example when a processing time limit (specified with the parameter `timeAllowed`) is exceeded. Before this fix, getting partial results from an embedded Solr index resulted in a ClassCastException : "org.apache.solr.common.SolrDocumentList cannot be cast to org.apache.solr.response.ResultContext".	7 years ago
luccioman	3ce44cf250	Fixed largest snippet get : don't reject ones starting with a space char	7 years ago
luccioman	f511e16d50	Prevent duplication of Solr query highlight fields parameters That was caused by concurrent modifications (with addHighlightField() function) to the same SolrQuery instance when requesting Solr on remote peers in p2p search.	7 years ago
luccioman	e357ade47d	Reduced memory footprint of text snippet extraction By not parsing and storing at first all sentences of a document, but only on the fly the ones necessary to compute the snippet.	7 years ago
luccioman	e115e57cc7	Reduced text snippet extraction processing time. By not generating MD5 hashes on all words of indexed texts, processing time is reduced by 30 to 50% on indexed documents with more than 1Mbytes of plain text.	7 years ago
sgaebel	4b79851e12	corrected icons_sizes_sxt to SolrType.string	7 years ago
luccioman	3b89c232db	Easier tracking of longest text snippets initializations When text snippets statistics are enabled and FINE log level is enabled on the TextSnippetStatistics class.	7 years ago
luccioman	3c4344cb12	Fixed text snippet max init time statistic rendering	7 years ago
reger	a8234b7ea7	Make sure for image resource url enabled index image pixel size fields are filled if at least one of the image size fields is enabled in index (images_height_val, images_width_val, images_pixel_val). Previously all fields were required to be enabled (hint: default setting is height + width enabled)	7 years ago
luccioman	e67df103b5	Removed more remaining uses of deprecated Seed.getIP() function.	7 years ago
luccioman	addd18c993	Removed some remaining uses of deprecated Seed.getIP()	7 years ago
luccioman	c35d0568b6	Support for preferred https in peers communication on more operations	7 years ago
luccioman	e914d17aca	Updated call to function deprecated since commons-codec version 1.11	7 years ago
luccioman	a3ec7a7a5f	Added analysis optional setting to compute statistics on text snippets Thus producing some basic stats on processing times for snippets generation and counts on snippets per source type.	7 years ago
luccioman	1889d484de	Added Solr HTML writer support for responses from remote instances	7 years ago
luccioman	2af3bf79c7	Improve rendering of remote Solr admin URLs - properly handle IPv6 loopback address replacement - replace loopback address or host only when accessing peer remotely - replace loopback part with the peer hostname as requested rather than with its seed public IP as this works better for Intranet mode and when peer is behind a reverse proxy.	7 years ago
luccioman	bb74de7d59	Removed unnecessary "/admin" suffix from remote Solr instance admin URL For quite a long time now, the Solr /admin URL suffix indeed redirects to the Solr base context (see https://issues.apache.org/jira/browse/SOLR-3337)	7 years ago
luccioman	0d34034f17	Ensure an embedded Solr is available for Solr dump/restore operations Otherwise, these operations triggered NullPointerException when only an external Solr index is attached.	7 years ago
luccioman	d92b191942	Ensure no remote Solr is attached before "Shut Down and Re-Start Solr" Otherwise once this operation is applied, the remote Solr instance(s) are disconnected and the embedded Solr is connected even if disabled by setting "core.service.fulltext". Also use constants for related default setting values.	7 years ago
luccioman	26d8ad591c	Adjusted Solr select servlet output when using an external Solr only - Use the EnhancedXMLResponseWriter only when requested output is "exml" - Use the Standard Solr writers when possible, for example for json, xml or javabin output formats - Return an error when the requested format cannot be rendered with an external Solr server only Important : this modification is necessary for peers using exclusively an external Solr server to be reachable as robinson targets in p2p search, as the binary format ("javabin") is the default Solr exchange format for peers. Before this, when a peer requested a remote one attached only to an external Solr (no embedded one), it ended with "Invalid type" error, as the remote peer answered with xml although binary format was requested.	7 years ago
luccioman	69690c13a0	Optionally allow external Solr server with self-signed certificate This is necessary when you want to attach to a dedicated external Solr server protected with basic http authentication and requested over https but having only a self-signed certificate.	7 years ago
luccioman	b882f85900	Fixed NPE case in Solr select servlet on external Solr only setup Regression introduced with commit `0d7625ecfb`	7 years ago
luccioman	2fd4d05e2f	Added a shared Java constant for setting key server.servlets.called	7 years ago
luccioman	ba9cd14516	Removed hard-coded patch for Solr 5.0 on ranking boost function The current default boost function (`recip(ms(NOW,last_modified),3.16e-11,1,1)`) for the Date ranking profile is indeed working fine. What can trigger the error `unexpected docvalues type NUMERIC for field 'last_modified'` is the previous default boost function (quite old now) or any custom one using the Solr `ord` or `rord` functions on the last_modified field. Then the problem was that the migration code in the Switchboard supposed to detect the old date boost function was incorrect (one trailing right parenthesis in excess), so the deprecated function remained. This fixes issue #169.	7 years ago
luccioman	fb3032c530	Added a crawl filtering possibility on documents Media Type (MIME)	7 years ago
luccioman	e45afedee4	Added support for enclosures (media links) to the RSS loader	7 years ago
luccioman	aaefd5219c	Reduce log verbosity of RSS loader on feed items with no link	7 years ago
luccioman	cf62b571bd	Added RSS reader support for `enclosure` feed item sub element. Enclosure element (see http://www.rssboard.org/rss-specification#ltenclosuregtSubelementOfLtitemgt ) can be seen for example in podcasts feeds.	7 years ago
luccioman	e5f5de0fc7	Added some JavaDoc to the RSSMessage class.	7 years ago
luccioman	0d7625ecfb	Handle Solr fields restrict and alias in YaCy html and exml writers Thus allowing for example to read more easily the local Solr index full metadata in HTML by restricting if desired to some fields of interest. See Solr documentation about the 'fl' (Field List) parameter at https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefl_FieldList_Parameter	7 years ago
luccioman	3da2739bbd	Parse and index more common audio metadata text tag fields.	7 years ago
luccioman	846aba00fa	Added parsing of URLs possibly present in audio metadata tags	7 years ago
Michael Peter Christen	187075b878	added nav filter	7 years ago
luccioman	bcbd0ae1a4	Enabled partial parsing of audio resources.	7 years ago
luccioman	fda0189613	Updated audio file extensions with ones recently added to audioTagParser	7 years ago
luccioman	978e2be95b	Let a chance for other parsers on audioTagParser error As done in all other parsers, eventually falling back in the end to the genericParser which creates a minimal index entry.	7 years ago
luccioman	9e5846a26e	Small fix on svg parser error message	7 years ago
luccioman	11611dbdcf	Reuse existing File copy function to handle audio parser tmp files	7 years ago
luccioman	f77f8f40f9	Factored audio parser tag processing	7 years ago
luccioman	9a7a353d0e	Removed some unnecessary intermediate list creation on array copy.	7 years ago
luccioman	fb6457f5bc	Fixed NPE case on audio resource parsed with null tag	7 years ago
luccioman	c3ff50c17a	Updated the list of audio file formats supported by the audioTagParser Follows upgrade to Jaudiotagger dependency to version 2.2.5.	7 years ago
luccioman	1b90479a76	Added missing vocabulary navigator increment on results from RWI	7 years ago
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	17c7a85f18	Make StreamResponse usable in Java try-with-resources statements	7 years ago
luccioman	b67742336e	Provide user interface messages on vocabulary creation read/write errors	7 years ago
luccioman	3e8dd90211	Use https rather than http in links and queries to openstreetmap.org	7 years ago
luccioman	3a973dbb23	Removed unused import	7 years ago
luccioman	e9527cd0e5	Reuse the same Pattern instance when matching multiple key/values	7 years ago
luccioman	dbf4c1cd76	Improved blacklist entries editing operations : - Fixes issue #160 : handle properly syntax exceptions with a user friendly message - Fixes loss of information on multiple blacklist entries editions - Fixes loss of entries when moving entries from one list to another	7 years ago
reger	87077b8fb6	Adjust and move Language Navigator to be member of the navigator plugin list.	7 years ago
luccioman	eb20589e29	Fixed issue #158 : completed div CSS class ignore in crawl	7 years ago
luccioman	0cdee4e26a	Fixed loss of "meanCount" search param when using facets or page buttons Then on new search queries, no suggestions at all could be displayed.	7 years ago
luccioman	117a859879	Do not clear all search modifiers when unselecting one modifier. Previously, when clicking a selected facet in the search results page to unselect it, all other selected modifiers/facets were also removed.	7 years ago
luccioman	33593c22e9	Fixed loss of other modifiers on keywords/tags search navigation links	7 years ago
luccioman	a9dc0874c0	Remove old query terms from search results suggestions links. Especially when old terms were misspelled, suggestions links then provided most of the time empty results.	7 years ago
luccioman	9412881230	Added basic support for autotagging microdata annotated item types. With the appropriate vocabulary settings in Vocabulary_p.html page, this can produce Vocabulary search facets displaying item types referenced in html documents by microdata annotation. Tested notably, but not limited to, vocabulary classes/types defined by Schema.org and Dublin Core.	7 years ago
luccioman	5a14d34a7d	Refactoring : documented and extracted autotagging processing functions.	7 years ago
luccioman	58b9834729	Added HTML microdata typed items parsing capability. This adds the possibility for the HTML parser to gather typed items URLs annotated in HTML tags with itemscope and itemtype attributes (see microdata specification https://www.w3.org/TR/microdata/ ), notably Types from the schema.org vocabulary, but also Types/Classes from any other vocabulary, such as the common ones listed in the RDFa core context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).	7 years ago
luccioman	80fb1026d0	Create recrawl requests with the relevant crawl profile. Recrawl default profile was previously effectively used for crawl stacker acceptance check, but request entries were indeed still created with the "snippetGlobalText" profile.	7 years ago
luccioman	539925a275	Added a utility to generate/update XLIFF master file from lng files.	7 years ago
luccioman	fa6d030b0b	Moved dbtest to the test source folder.	7 years ago
luccioman	6cd3847d0a	Fixed NullPointerException case on Table init with relative file path. Can occur for example when running dbtest with relative test table file name (without explicit parent folder).	7 years ago
luccioman	28883d8a71	Shutdown daemon threads at the end of dbtest	7 years ago
luccioman	929e0d6eae	Replaced improper ByteBuffer.equals() implementation by Arrays.equals() Renamed also ByteBuffer.equals() to startsWith() as this is the appropriate function implementation semantics.	7 years ago
luccioman	46b5249c20	Removed time condition on HostBalancer initialization in JUnit test. Its initialization in main application usage remains asynchronous.	7 years ago
luccioman	8b572b7337	Commit Solr index before simulating or starting recrawl job. This ensures up-to-date simulation query results, and recrawl processing.	7 years ago
luccioman	733cacdbb8	Revised the RDFaParser main launcher for minimal proper operation. This parser is still not enabled in the main text parsers list. More would have to be done to make it functional.	7 years ago
luccioman	7baa99f26f	Fixed stored URL in web cache when redirection(s) occurs. Associate cached content to the last redirection location, instead of the first URL of a redirection(s) chain : - for proper base URL processing in parsers (fixes mantis 636 - http://mantis.tokeek.de/view.php?id=636) - to prevent duplicated content in Solr index when recrawling a redirected URL	7 years ago
luccioman	9ddf92d143	Removed unnecessary reflection usage for workflow tasks. This improves code readability and maintainability (call hierarchies are easier to read) and possibly performance.	7 years ago
luccioman	897d3d30cc	Added new recrawl job profile to the list of default crawl profiles	7 years ago
luccioman	9624516bf8	Refresh recrawl job profile threshold date like other default profiles	7 years ago
luccioman	b712a0671e	Added a specific default crawl profile for the recrawl job. - with only light constraint on known indexed documents load date, as it can already be controlled by the selection query, and the goal of the job is indeed to recrawl selected documents now - using the iffresh cache strategy	7 years ago
luccioman	adf3fa493d	Added comments about crawl profiles recrawl cycles	7 years ago
luccioman	3638e16c2e	More comprehensive log on rejected recrawls caused by date constraint	7 years ago
luccioman	d47afe6fab	Use a constant for crawler reject reason prefix with specific processing	7 years ago
luccioman	4e03335625	Added more details to the recrawl job report	7 years ago
luccioman	6425963cee	Fixed internal tables exact value match iterator	7 years ago
luccioman	0c9e0b3566	Record recrawl calls to make them schedulable	7 years ago
luccioman	433e241e4f	Added a report info box about the last terminated recrawl job, if any. For easier monitoring of recrawls.	7 years ago
luccioman	b2af25b14f	Added a stop condition to the Recrawl busy thread	7 years ago
luccioman	421728d25a	Made possible to customize selection query before launching a recrawl	7 years ago
luccioman	36e9b1c5b3	Fixed SegmentTest test case time-dependent occasional failures As highlighted by latest automated Travis builds.	7 years ago
luccioman	8a4ea1c11e	Added UI switch to control content domain constraint per search request	7 years ago
reger	f8071ac8ae	Make TokenizedStringNavigator (used for keyword search facet) active check case insensitive. As keywords are compared lower case, make sure user input keyword:Key or keyword:key will be shown as active in facet entry key.	7 years ago
luccioman	e6907fdab3	Added optional search parameter/setting to control content domain filter Thus allowing to choose at configuration or per search request, whether extending or not results beyond strict content domain filter (image, video, audio or application). Related graphical controls to be added to user interface.	7 years ago
luccioman	f52217c939	Enable full size images preview for users with extended search rights	7 years ago
luccioman	09c4ee56a7	Added optional https support for remote crawl and profile operations	7 years ago
luccioman	5db1c9155a	Do locale-independent case conversion on hosts, schemes, and file exts. Required for proper operation when the default system locale is Turkish, as dotless and dotted i characters have specific case conversion rules in this language.	7 years ago
luccioman	1c4803e40a	Enable optional https support for /yacy/transferURL API calls. Also updated some Javadoc and consistently use Switchboard instance as a constructor parameter where relevant.	7 years ago
luccioman	c6e1befbca	Restored peer URL host name stripping removed from previous commit. Still useful for peers with IPv6 addresses.	7 years ago
luccioman	17e004599d	Started implementing optional https preference for protocol operations Introduced through the new configurable setting network.unit.protocol.https.preferred, defaulting to false for now. Lets you choose to prefer https when available on remote peers to perform YaCy protocol operations including notably hello or transferRWI. Not yet implemented for every YaCy protocol operation.	7 years ago
Michael Peter Christen	b907819cb4	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	7 years ago
Michael Peter Christen	25573bd5ab	added a crawl filter based on <div> tag class names When a crawl is started, a new field to exclude content from scraping is available. The field can be identified with the class name of div tags. All text contained in such a div tag where the configured class name(s) match are not indexed, while the remaining page is indexed.	7 years ago
luccioman	d95b288f19	Removed use of deprecated Jetty IPAccessHandler for client filtering. Upgraded to InetAccessHandler. Added InetPathAccessHandler extension to InetAccessHandler to maintain path patterns capability previously available in IPAccessHandler but lost in InetAccessHandler. Filtering on IPv6 addresses is now supported. Support for deprecated pattern formats such as "192.168." and "192.168.1.1/path" has been removed, but startup automated migration should convert any such patterns present in serverClient.	7 years ago
reger	cc7a93e6b6	remove deprecated jetty continuation class from urlproxyservlet (was a long time carry over, while not supporting async requests)	7 years ago
Michael Peter Christen	607b39b427	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Conflicts: htroot/yacysearchitem.java	7 years ago

8681 Commits (a3361d5ee87e622e629be2546bb035a5f96610f6)