yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	fcf6b16db4	Added new crawler attribute for finer control over Media Type detection New "Media Type detection" section in the advanced crawl start page allow to choose between : - not loading URLs with unknown or unsupported file extension without checking the actual Media Type (relying Content-Type header for now). This was the old default behavior, faster, but not really accurate. - always cross check URL file extension against the actual Media Type. This lets properly parse URLs ending with an apparently odd file extension, but which have actually a supported Media Type such as text/html. Sample URLs with misleading file extensions added as documentation in the crawl start page. fixes issue #244	6 years ago
luccioman	88d0ed676c	Render http status instead of null responses on snapshot api errors	6 years ago
luccioman	92e10d7d1c	Added a crawl start hint message on availability or not of wkhtmltopdf As this tool is required to produce pdf snapshots	6 years ago
luccioman	8852c97cee	Added basic styling for cleaner rendering of missing image snapshots For the output of the Solr snapshots writer	6 years ago
luccioman	746e0e788d	Render a relevant HTTP status code on snapshot image rendering error Instead of a null response body which is not very helpful.	6 years ago
luccioman	753bda1409	Fixed remaining blacklist entries improper decoding of '+' character In the blacklist cleaner and import/export administration pages.	6 years ago
luccioman	61c337f29a	Decode blacklist entries for easier edition of non ascii chars Not using the JDK URLDecoder.decode() function, as it strips '+' characters when they occur after '?' (both characters having regular expression semantics when used in blacklist path patterns)	6 years ago
luccioman	ed93221fa1	Improved normalization of blacklist path patterns having non ascii chars Normalize blacklist path patterns using percent-encoding, at pattern edition in web interface and at loading from configuration files. Fixes issue #237	6 years ago
luccioman	d23578efc3	Merge pull request #240 from ivanhercaz/fixEnglishBookmarksPage Fix English Bookmarks.html	6 years ago
ivanhercaz	41684ba559	adding Spanish to the interface language list	6 years ago
ivanhercaz	1dafc85d33	typo fix in Bookmarks.html	6 years ago
luccioman	3d14fb51c5	Removed now unused Java import in addition to modification from PR #239	6 years ago
otter	8820d8d7c7	replace current date by FailDate	6 years ago
luccioman	c409ec089c	Hide password values from visible HTML in the Advanced Config page Fixes issue #228	6 years ago
luccioman	75b9cd53cc	Use accessible labels in the Server Access Settings page	6 years ago
luccioman	4ed055bcdf	Enforced access controls to System settings pages	6 years ago
luccioman	de6820d257	Updated html input field type for seed upload with file method - To meet current browsers security rules, which prevent selecting a full file path with an html input field of type 'file' - As it does not make sense to select a local file path when the administered YaCy server is remote (not on the same computer as the browser)	6 years ago
luccioman	10548229af	Fixed rendering of the YMarks.html page Also to clarify which pages still depends on old JQuery and JQuery UI dependencies.	6 years ago
luccioman	bdd6ec3fff	Fetch result pages one by one when scrolling in portal search widget To prevent unnecessary load and items retrieval errors on backend	6 years ago
luccioman	b46dc4fc94	Fixed portal search widget results favicon url	6 years ago
luccioman	fa96637a84	Configured local peer as default portal search widget backend Rather than relying on a peer eventually deployed on search.yacy.net	6 years ago
luccioman	44efb2f868	Removed implicit global JavaScript variables from portal search widget	6 years ago
luccioman	79643c40bf	Limit search API calls rate when typing in the search portal widget	6 years ago
luccioman	39dd29a484	Replaced RWI ranking JQuery sliders with standard HTML range inputs Considering that the sliders usage on that page is very basic, using standard HTML5 inputs of type "range" has here the following advantages : - better keyboard accessibility - remove not very necessary additional jquery dependencies Today browsers support for range inputs is good, and even on old unsupporting browsers such as IE < 10 they nicely fall back to text inputs.	6 years ago
luccioman	1b866c6076	Added possibility to hide or show image results with rendering errors When searching images, thumbnails that could not be rendered (because of a load error such as HTTP 404, networking issue or an internal error on the rendering servlet) are now hidden as default. But can be revealed with a button if desired. Fix for issue #217	6 years ago
luccioman	5b60b4225f	Fixed encoding of '+' character on search pages links As revealed by issue #216	6 years ago
luccioman	b726b2b532	Removed unnecessary '+' character URL decoding from search query Manually replacing '+' character or "%20" by a space character in the search query parameter was necessary in YaCy a long time ago to properly decode application/x-www-form-urlencoded format (commit `9842fab6e4` in 2010). Since the introduction of Jetty as the embedded HTTP server (commit `4b77733e59` in 2013), this is no more necessary as Jetty internals already do this for us in org.eclipse.jetty.util.UrlEncoded.decodeUtf8To(). So we can remove now this duplicated decoding as it prevents a proper use of the '+' character in search requests, as reported in issue #216.	6 years ago
luccioman	0efc6c89ef	Fixed rendering of crawl queues page for URLs with raw IPV6 addresses	6 years ago
luccioman	0e976e9030	Added a link to MediaWiki dumps summary in import page for convenience	6 years ago
luccioman	ecd4535eb6	Prevent entering empty OpenSearch URLs in ConfigHeuristics_p.html In order to early prevent adding invalid configuration entries to the heuristicopensearch.conf file, as revealed the issue #209.	6 years ago
luccioman	1ca9cb6bd9	Fixed a NullPointerException case, reported in issue #209	6 years ago
luccioman	8a29551c54	Upgraded the OpenGeoDB dump URL The status of the library in the DictionaryLoader_p.html page now also advertises the user that an upgrade can be applied when an older dump is already loaded. Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter chat.	6 years ago
luccioman	bf4f320b16	Optionally render the response header when using the Solr html writer With params rendered as html input fields for conveniently modifying params values and refreshing results.	6 years ago
luccioman	88e6ce23c9	Consistently render empty facets and facets having only entries at zero	6 years ago
luccioman	534f09e92b	Added and updated hint messages about remote crawler status To help identify why remote crawl results may not be received.	6 years ago
luccioman	c726154a59	Fixed removal of URLs from the delegatedURL remote crawl stack URLs were removed from the stack using their hash as a bytes array, whereas the hash is stored in the stack as String instance.	6 years ago
luccioman	2bdd71de60	Added server side columns sorting on the Process Scheduler table For easier usage of large tables in the Table_API_p.html page.	6 years ago
luccioman	f895745e1c	Removed more unsafe concurrent accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	5c6c61809a	Fixed JavaScript sorting of tables with cells containing an input field	6 years ago
luccioman	3885fd64a0	Fixed Table_API_p.html current table page loss on row editing. Reset only to the first table page when the search query is modified	6 years ago
luccioman	e97580dfc7	Fixed unsafe concurrent access to generic SimpleDateFormat instances SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	38a3a5e5ad	Fixed a NullPointerException case in the suggest api	6 years ago
luccioman	b159564c72	Properly render json string attributes in the crawl profile html editor	7 years ago
luccioman	cced94298a	Added a new crawler document filter type using Solr syntax This makes possible to set up much more advanced document crawl filters, by filtering on one or more document indexed fields before inserting in the index.	7 years ago
Michael Christen	e0dc632020	removed transformer it was not used any more	7 years ago
luccioman	eb94986f95	Added Italian in available web interface languages list	7 years ago
luccioman	9bc7b6c39d	Allow edition of scheduled next execution dates for finer control Can be useful more especially when scheduling many API calls over a long period of time to precisely adjust each scheduled date/time.	7 years ago
luccioman	b5dc1f376f	Made outgoing pools max total connections user configurable For a finer control over the maximum simultaneously active outgoing connections.	7 years ago
luccioman	387d646c0e	Added gzip compression of responses returned to user-agents accepting it Enabled as default, but can be disabled using the "Server Access Settings" admin page.	7 years ago
luccioman	a1990202ab	Fixed unresolve-pattern case on old html title	7 years ago
luccioman	35826a3091	Added a search page customization setting to display or not favicons If not interested in displaying this on your search results and notably on a peer with limited resources this can help saving some CPU and outgoing network connections.	7 years ago
luccioman	79bd9f623a	Updated YaCy home page embedded links from http to https scheme	7 years ago
luccioman	1dfd3e9dde	Limit the rate of calls to the suggest API when typing in search field	7 years ago
luccioman	4f0ab318ef	Fixed snippets statistics displayed "provided by Solr" count	7 years ago
luccioman	e115e57cc7	Reduced text snippet extraction processing time. By not generating MD5 hashes on all words of indexed texts, processing time is reduced by 30 to 50% on indexed documents with more than 1Mbytes of plain text.	7 years ago
luccioman	ce289ebaf7	Upgraded ConfigNetwork_p html doctype and added language attribute	7 years ago
luccioman	16254fac1e	Removed unpaired select closing tag	7 years ago
luccioman	692c1cfdde	Added a UI section to configure encryption of peers communications	7 years ago
luccioman	e67df103b5	Removed more remaining uses of deprecated Seed.getIP() function.	7 years ago
luccioman	addd18c993	Removed some remaining uses of deprecated Seed.getIP()	7 years ago
luccioman	c35d0568b6	Support for preferred https in peers communication on more operations	7 years ago
luccioman	0a058ba6af	Keep https in result message URL when push_p API is requested over https	7 years ago
luccioman	8bc36506f2	Enforced access controls on basic administration settings pages. Ensuring http post method is used for operations with server-side effects (in respect of http semantics), and a valid transaction token is provided by the user-agent.	7 years ago
luccioman	a3ec7a7a5f	Added analysis optional setting to compute statistics on text snippets Thus producing some basic stats on processing times for snippets generation and counts on snippets per source type.	7 years ago
luccioman	72808655a5	Added controls on mode switch when attached to remote Solr instance(s) - to prevent unwanted exposure of index entries about private local/intranet documents when switching from "Intranet Indexing" mode while attached to remote Solr instance(s) - to warn user about remote Solr instance(s) still attached when switching from modes other than "Intranet Indexing"	7 years ago
luccioman	2af3bf79c7	Improve rendering of remote Solr admin URLs - properly handle IPv6 loopback address replacement - replace loopback address or host only when accessing peer remotely - replace loopback part with the peer hostname as requested rather than with its seed public IP as this works better for Intranet mode and when peer is behind a reverse proxy.	7 years ago
luccioman	0d34034f17	Ensure an embedded Solr is available for Solr dump/restore operations Otherwise, these operations triggered NullPointerException when only an external Solr index is attached.	7 years ago
luccioman	d92b191942	Ensure no remote Solr is attached before "Shut Down and Re-Start Solr" Otherwise once this operation is applied, the remote Solr(s) instances are deconnected and the embedded Solr is connected even if disabled by setting "core.service.fulltext". Also use constants for related default setting values.	7 years ago
luccioman	69690c13a0	Optionally allow external Solr server with self-signed certificate This is necessary when you want to attach to a dedicated external Solr server protected with basic http authentication and requested over https but having only a self-signed certificate.	7 years ago
luccioman	211f3d04ab	Added hint message inciting to check accounts settings on fresh install When unrestricted access from localhost is set and the accounts config page has not been visited at all.	7 years ago
luccioman	2fd4d05e2f	Added a shared Java constant for setting key server.servlets.called	7 years ago
luccioman	033f7c4c00	Adjusted localhost/qualified account admin access informational texts. Following remarks from @etam on issue #170	7 years ago
luccioman	05702c2ced	Adjusted api table query matching strategies When inlined (for example in the CrawlProfileEditor_p.html page) : search only on the comment, as the url is not visible On regular display : search on comment OR url, instead of comment AND url. Otherwise searching on comments terms is almost useless as these terms are not necessarily present in the url.	7 years ago
luccioman	65451a3d62	Fixed start record on the last api table results page When the last results page size was lower than maximumRecords, results from the previous page were displayed again.	7 years ago
luccioman	86c902b853	Enable api table page navigation with search query Applied the same default results page size as when a type filter is defined for proper and consistent page navigation when combining type filter and search query.	7 years ago
luccioman	9c7faa04d8	Display the total number of matching items when filtering on table API Notably for a proper page navigation of the crawl scheduler table (CrawlProfileEditor_p.html page).	7 years ago
luccioman	311e91ff77	Added hint to clarify results rendered dates and 'Sort by date' switch	7 years ago
luccioman	90dc580158	Fixed initial ViewFile mode and suggestions links from previous commit	7 years ago
luccioman	0b6aed4de6	Keep the selected view mode when typing a new URL in the ViewFile page Otherwise, when interested in viewing `Link List` for example, each time you typed a new URL, `Parsed Sentences` view mode was selected as default and you had to select again the view mode you are interested in.	7 years ago
luccioman	db55eaa673	Updated link to Solr Function Queries documentation page	7 years ago
luccioman	7496df93c3	Fixed error 414 (URI Too Long) when manually selecting too many RSS items Switched form method to HTTP POST to prevent this.	7 years ago
luccioman	fb3032c530	Added a crawl filtering possibility on documents Media Type (MIME)	7 years ago
luccioman	90d4802082	Updated link URL to IANA Media Types with https	7 years ago
luccioman	e45afedee4	Added support for enclosures (media links) to the RSS loader	7 years ago
luccioman	aaefd5219c	Reduce log verbosity of RSS loader on feed items with no link	7 years ago
Michael Peter Christen	187075b878	added nav filter	7 years ago
luccioman	07e8628853	Added HTML5 embedded audio for results playing on supporting browsers Restricted to authenticated or localhost users only to prevent redistribution license issues.	7 years ago
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	348d07a999	Enforced controls on vocabulary editing operations.	7 years ago
luccioman	2532db2ce6	Vocabulary editor : use accessible labels and CSS for elements position	7 years ago
luccioman	ac14437316	Vocabulary_p.html : richer semantics for HTML tables Also replaced deprecated attributes	7 years ago
luccioman	b67742336e	Provide user interface messages on vocabulary creation read/write errors	7 years ago
luccioman	ea57763294	Mark vocabulary name field as required using html instead of JavaScript	7 years ago
luccioman	39ec8cba37	Fixed Vocabulary_p.html HTML validation errors. Validated with Nu Html Checker 17.11.1.	7 years ago
luccioman	7c644090ff	Fixed CrawlStartExpert.html HTML validation errors Validated with Nu Html Checker 17.11.1	7 years ago
luccioman	519fc9a600	Issue #156 : new option to clean up (or not) search cache on crawl start Prevent also unnecessary search event cache clean-up on each access to the crawl monitor page (Crawler_p.html).	7 years ago
luccioman	3e8dd90211	Use https rather than http in links and queries to openstreetmap.org	7 years ago
luccioman	8d7099a081	Handle escaped line breaks and separators in vocabulary import from CSV	7 years ago
luccioman	09f93fed0e	Added a line start field for vocabulary import from CSV file As a convenience to ignore eventual CSV header lines	7 years ago
luccioman	d28d612069	Added option to choose field delimiter in vocabulary import from CSV	7 years ago
luccioman	95f1954c78	Adjusted last blacklist entry example for a more accurate description As discussed in issue #160 , blacklist entries can indeed currently not be "complete" regular expressions, but must be structured as a domain part, a separator character ('/'), and a path part.	7 years ago
luccioman	dbf4c1cd76	Improved blacklist entries editing operations : - Fixes issue #160 : handle properly syntax exceptions with a user friendly message - Fixes loss of information on multiple blacklist entries editions - Fixes loss of entries when moving entries from one list to another	7 years ago
reger	5df72c1c65	Remove now obsolete html for language-nav and ISO639 jar reference	7 years ago
reger	87077b8fb6	Adjust and move Language Navigator to be member of the navigator plugin list.	7 years ago
luccioman	eb20589e29	Fixed issue #158 : completed div CSS class ignore in crawl	7 years ago
luccioman	fa65fb1a03	Fixed loss of search modifiers on bookmark, recommend or delete result	7 years ago
luccioman	0cdee4e26a	Fixed loss of "meanCount" search param when using facets or page buttons Then on new search queries, no suggestions at all could be displayed.	7 years ago
luccioman	117a859879	Do not clear all search modifiers when unselecting one modifier. Previously, when clicking a selected facet in the search results page to unselect it, all other eventually selected modifiers/facets were also removed.	7 years ago
luccioman	a9dc0874c0	Remove old query terms from search results suggestions links. Especially when old terms were misspelled, suggestions links then provided most of the time empty results.	7 years ago
luccioman	c71b545235	Enable results suggestions (Did you Mean) even when RWI is not enabled. RWI is no more necessary for suggestions processing since commit `c40ba51ca6`. Revealed by a question about spell check from ouahpiti on YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6084 ).	7 years ago
luccioman	9412881230	Added basic support for autotagging microdata annotated item types. With the appropriate vocabulary settings in Vocabulary_p.html page, this can produce Vocabulary search facets displaying item types referenced in html documents by microdata annotation. Tested notably, but not limited to, vocabulary classes/types defined by Schema.org and Dublin Core.	7 years ago
luccioman	539925a275	Added an utility to generate/update XLIFF master file from lng files.	7 years ago
luccioman	41a6b052d9	Updated master and French translation for the IndexReIndexMonitor_p page	7 years ago
luccioman	929e0d6eae	Replaced improper ByteBuffer.equals() implementation by Arrays.equals() Renamed also ByteBuffer.equals() to startsWith() as this is the appropriate function implementation semantics.	7 years ago
luccioman	8b572b7337	Commit Solr index before simulating or starting recrawl job. This ensures up-to-date simulation query results, and recrawl processing.	7 years ago
luccioman	5e2812c060	Automatically refresh running recrawl report when JavaScript is enabled. For users who would prefer to keep JavaScript disabled, a manual Refresh button is still available.	7 years ago
luccioman	0fce264ba4	Set reindex page to html5 and removed presentational only html tables.	7 years ago
luccioman	83df922afc	Removed unused duplicated HTML id on header hidden field	7 years ago
luccioman	4e03335625	Added more details to the recrawl job report	7 years ago
luccioman	d95d393a0d	Add a query link to local Solr to browse selected recrawl candidates	7 years ago
luccioman	59f7763af6	Display recrawl job report also when job is actively running	7 years ago
luccioman	0c9e0b3566	Record recrawl calls to make them schedulable	7 years ago
luccioman	433e241e4f	Added a report info box about eventual last terminated recrawl job For easier monitoring of recrawls.	7 years ago
luccioman	b2af25b14f	Added a stop condition to the Recrawl busy thread	7 years ago
luccioman	421728d25a	Made possible to customize selection query before launching a recrawl	7 years ago
luccioman	fab6e54fec	Enforced controls (HTTP method, token) on ReIndex and ReCrawl operations	7 years ago
luccioman	8a4ea1c11e	Added UI switch to control content domain constraint per search request	7 years ago
luccioman	36a45b3905	Added UI setting for strictness of content-type checking on media search	7 years ago
luccioman	e6907fdab3	Added optional search parameter/setting to control content domain filter Thus allowing to choose at configuration or per search request, whether extending or not results beyond strict content domain filter (image, video, audio or application). Related graphical controls to be added to user interface.	7 years ago
luccioman	d42c1773c8	Added UI setting for optional encryption with https on p2p searches	7 years ago
luccioman	09c4ee56a7	Added optional https support for remote crawl and profile operations	7 years ago
luccioman	5db1c9155a	Do locale independent case conversion on hosts, schemes, and file exts. Required for proper operation when the default system locale is Turkish, as dotless and dotted i characters have specific case conversion rules in this language.	7 years ago
luccioman	1c4803e40a	Enable optional https support for /yacy/transferURL API calls. Also updated some Javadoc and consistently use Switchboard instance as a constructor parameter where relevant.	7 years ago
luccioman	79a2ba306a	Updated links to Java Regular Expressions documentation to version 8	7 years ago
luccioman	17e004599d	Started implementing optional https preference for protocol operations Introduced through the new configurable setting network.unit.protocol.https.preferred, defaulting to false for now. Let choose to prefer using https when available on remote peers to perform YaCy protocol operations including notably hello or transferRWI. Not yet implemented for every YaCy protocol operations.	7 years ago
ScRe13	bb3d3fe074	fixed default loading default settings; load was populated with wrong value	7 years ago
reger	20bba135fe	Show hide or show public surftip button depending on current config status, to show the button to switch the status (hiding button of current status)	7 years ago
Michael Peter Christen	b907819cb4	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	7 years ago
Michael Peter Christen	25573bd5ab	added a crawl filter based on <div> tag class names When a crawl is started, a new field to exclude content from scraping is available. The field can be identified with the class name of div tags. All text contained in such a div tag where the configured class name(s) match are not indexed, while the remaining page is indexed.	7 years ago
luccioman	640fed2a9c	Removed Java 1.8 no more necessary version checking (fixes issue #147 ) Java 1.8 is by the way now a prerequisite to run from latest sources.	7 years ago
luccioman	d95b288f19	Removed use of deprecated Jetty IPAccessHandler for client filtering. Upgraded to InetAccessHandler. Added InetPathAccessHandler extension to InetAccessHandler to maintain path patterns capability previously available in IPAccessHandler but lost in InetAccessHandler. Filtering on IPv6 addresses is now supported. Support for deprecated pattern formats such as "192.168." and "192.168.1.1/path" has been removed, but startup automated migration should convert such patterns eventually present in serverClient.	7 years ago
Michael Peter Christen	607b39b427	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Conflicts: htroot/yacysearchitem.java	7 years ago
Michael Peter Christen	4355de0f3c	(more!) evaluation of XRealIP from nginx reverse proxy	7 years ago
luccioman	f9cba827c0	Made "tld:" modifier case insensitive and IDN compliant. Thus allowing typing internationalized top-level domains with non ASCII characters as tld: modifier.	7 years ago
luccioman	c5c3cc1274	Use HTTP Post operation for resetting memory monitoring state. Fixes issue #145 Also added textual hint on the button, and display it only when it makes sense, that is to say when the memory state is 'exhausted'.	7 years ago
luccioman	cb10daba92	Renamed Chinese & Greek lng files using ISO639-1 codes. Previously named with their ISO 3166-1 country code : this way, when setting language to "Browser" in ConfigBasic.html, it didn't work properly when browser preferred language was Chinese or Greek as their respective language codes are "zh" and "el" (not "cn" and "gr" which are their country codes)	7 years ago
luccioman	4b61edff32	Added a help link to ISO 639-1 language codes list ref	7 years ago
luccioman	a994d439af	Added description of spatial restrictions in search options	7 years ago
luccioman	8a48f80909	Added language HTML attribute to the search home page.	7 years ago
luccioman	5ff76fdcb9	Fixed spelling	7 years ago
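
Note on commits e97580dfc7 and f895745e1c: the thread-safety point made in their messages can be illustrated with a minimal Java sketch. This is not code from the repository; the class name, the date pattern and the UTC zone are assumptions chosen for illustration. It shares one immutable DateTimeFormatter across several threads without any synchronization.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedFormatterSketch {

    // DateTimeFormatter is immutable, so a single shared instance can serve all threads.
    // Pattern and zone are illustrative assumptions, not taken from YaCy.
    static final DateTimeFormatter SHARED =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Formats a timestamp given in milliseconds since the epoch, without any locking.
    static String format(final long epochMillis) {
        return SHARED.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(final String[] args) throws InterruptedException {
        final Set<String> results = ConcurrentHashMap.newKeySet();
        final ExecutorService pool = Executors.newFixedThreadPool(8);
        // Every task formats the same instant concurrently with the shared formatter.
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> results.add(format(0L)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // All tasks produce the same string, so the set holds exactly one element.
        System.out.println(results); // prints [1970-01-01 00:00:00]
    }
}
```

Doing the same with one shared SimpleDateFormat would require external synchronization or one instance per thread, which is the bottleneck the commit messages refer to.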


6022 Commits (6a1e259fd09f3e0e87e04b486a9538df0a90b163)
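
Note on commit 5db1c9155a: the Turkish locale issue described in its message can be reproduced with a short Java sketch. The class and method names below are hypothetical and the host name is a made-up example; only `String.toLowerCase(Locale)` and `Locale.ROOT` are standard JDK API.

```java
import java.util.Locale;

public class LocaleCaseSketch {

    // Locale-independent lower-casing, suitable for hosts, schemes and file extensions.
    static String normalizeHost(final String host) {
        return host.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        // With Turkish rules, 'I' lower-cases to the dotless 'ı' (U+0131).
        System.out.println("WIKI.EXAMPLE.ORG".toLowerCase(turkish)); // prints wıkı.example.org
        // With Locale.ROOT the result does not depend on the system default locale.
        System.out.println(normalizeHost("WIKI.EXAMPLE.ORG"));       // prints wiki.example.org
    }
}
```

The parameterless `toLowerCase()` uses the JVM default locale, so on a Turkish system it would behave like the first call; passing `Locale.ROOT` explicitly avoids that dependency.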