yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	d359d521a1	fixed warc importer The importer tried to import gzipped files as plain warc. It will now check the file extension and unzip them automatically on-the-fly.	4 years ago
Michael Peter Christen	cef5fde343	adding message to UI to make port change transparent	4 years ago
Michael Peter Christen	22841ffbf1	creating a threaddump during every cleanup process to be able to find out what a peer did (not) last time before a crash	4 years ago
Michael Peter Christen	d7b2d82faa	showing MB instead of KB in PerformanceMemory	4 years ago
sgaebel	3431f91db9	removes unused 'unused' tokens	4 years ago
sgaebel	dd9d4b1188	replace org.junit.Assert.assertThat by org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid deprecation-warning	4 years ago
sgaebel	df9ea0a42a	removes some warnings: unused imports, params	4 years ago
sgaebel	80785b785e	adds deleting during recrawl	4 years ago
Michael Peter Christen	e0ad8ca9da	replaced json library from JSON.org with libandroid-json-java This fixes https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	6d7dc01670	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	5 years ago
Michael Peter Christen	0a7bda2a21	removed JSON-evil license line These classes had been my own creative work. Just the copyright line had been appeared possibly due to a bad copy-paste activity, unaware that the line is a non-free addition.	5 years ago
Michael Christen	57484eb1cc	xss protection	5 years ago
Michael Peter Christen	37827b6788	removed doubles from getpageinfo	5 years ago
Michael Peter Christen	f03e16d3df	enhanced crawl start url check experience urls are now urlencoded and a check is also performed in case that an url is copied into the url field using copy-paste	5 years ago
Michael Christen	41f9b8517f	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	5 years ago
Michael Christen	4ccd1ea3c0	new servlet path "p2p" with a test class. Call the class with http://localhost:8090/p2p/seeds.json	5 years ago
Michael Peter Christen	f7c97fd99e	scanner crawl starts wants non-parseable files	5 years ago
Michael Peter Christen	a20b61f5c0	fix for bad json	5 years ago
Michael Peter Christen	d62a8ec542	masking connects	5 years ago
Michael Peter Christen	5eb0033aef	typo	5 years ago
Michael Peter Christen	2c0742fc43	added json version of peer list	5 years ago
Michael Christen	cfa27d2fd5	fixed links	5 years ago
Michael Peter Christen	0bddf2d895	switched url and snippet position	5 years ago
Michael Peter Christen	2999f4b985	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	5 years ago
Michael Peter Christen	449780f762	enhanced search result design	5 years ago
Michael Christen	cdc7adedc2	added sponsor link	5 years ago
Michael Christen	f2d45ebb87	design updates + added link to new forum	5 years ago
Michael Peter Christen	789670bd8c	design changes - more space	5 years ago
Michael Christen	3a46b07603	fixed many links to old forum, now https://searchlab.eu	5 years ago
luccioman	6b45cd5799	New optional crawl filter on the URL a doc must match to crawl its links For finer control over which parsed documents can trigger an addition of their links to the crawl stack, complementary to the existing crawl depth parameter.	6 years ago
luccioman	d16bc99835	Added "Show Metadata" links to the ViewFile.html links mode To conveniently follow parsed links in the file viewer	6 years ago
luccioman	8c068a9c99	Better HTML text semantics for technical descriptions	6 years ago
luccioman	a5771b1f14	Made SNI extension user configurable without the need for server restart TLS Server Name Indication (SNI) extension activation can now be configured with the new Settings_p.html?page=httpClient administration page. SNI extension is also now enabled by default, as in 2019 the unrecognized_name(112) alert is more properly handled by major web servers TLS implementations, following the RFC 6066 standard. Related YaCy issues : #153 #189 and #272 JDK 1.7 bug : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374 Apache httpd issue : https://bz.apache.org/bugzilla/show_bug.cgi?id=56241 RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3	6 years ago
luccioman	42c8a251c8	Render a relevant message and status on blocked search requests When unauthenticated (or with insufficient rights) client is blocked either because blacklisted or excessive request rate, render an error message and a relevant HTTP status for API requests, instead of an empty response that appears broken.	6 years ago
luccioman	a8316c79da	Allow JS resorting of search results by unauthenticated users Access rate limitations to this search mode by unauthenticated users are set low by default to prevent unwanted server overload but can be customized through the SearchAccessRate_p.html configuration page Fixes #291	6 years ago
luccioman	0ab2b49c31	Made /yacysearch access rate limitations user configurable With a new admin page at /SearchAccessRate_p.html in menu Network Access > Local Search > Access Rate Limitations	6 years ago
luccioman	630fa0015a	P2P/Privacy switch buttons support with JavaScript disabled	6 years ago
luccioman	74fd2f30fa	Support for search result switch buttons with JavaScript disabled	6 years ago
luccioman	ebc583cdb2	Properly render the href attribute of the active page button	6 years ago
luccioman	093ea9586c	Properly fill current page number to new server side pagination template When current page is automatically reset to zero because of a new search request.	6 years ago
luccioman	6e9d5f60ad	Server side initial pagination links rendering For better support of the search page usage with JavaScript disabled. Reduces also the number of initial refreshes of the paginations links. When JavaScript is enabled, pagination links are still regularly refreshed until all the search feeds are terminated on server side.	6 years ago
luccioman	4b9cc4746d	Upgraded Bootstrap dependency from v3.3.7 to v3.4.1 Non regressions tested on the following platforms : Linux Debian Stretch : - Firefox 60.5.1esr - Chromium 72.0.3626.96 Windows 10 : - Firefox 65.0.1 - Chrome 72.0.3626.109 - Edge 25.10586.672.0 - IE 11.1540.10586.0 Mac OS : - Safari 11.0	6 years ago
luccioman	c617ea58a0	Render additional embedded audios from links on extended audio search	6 years ago
luccioman	69f1971052	Added basic controls to play all audio results. Not displayed when JavaScript is disabled.	6 years ago
luccioman	9782a98a9c	Added the possibility to customize facets sort type and direction Previously search navigators/facets elements were sorted only by counts. Now from the ConfigSearchPage_p.html admin page, sort direction (ascending/descending) and type (on counts or labels) can be customized independently for each navigator.	6 years ago
sgaebel	c2398fd890	remove warnings: 'Statement unnecessarily nested within else clause'	6 years ago
sgaebel	8d2e7262d9	Recrawl: - set the chunksize to 100 to meet the max of the embedded solr - re-enable sorting (the case where we switched it off should be away) - enable recrawling on remote-solr	6 years ago
luccioman	60b520fb13	Cleaned up Spanish translation after merge of PR #238 * Fixed some indentation * Removed untranslated entries	6 years ago
luccioman	cd72515188	Merge pull request #238 from ivanhercaz/esLang [WIP] Spanish translation	6 years ago
luccioman	2f75e2d9c8	Fixed a case of NullPointerException on disconnected RWI data structure	6 years ago
luccioman	e85f231bdf	Fixed termination of Host browser and link structure Solr query threads On some conditions (especially when reaching timeout), concurrent Solr query tasks used by the /HostBrowser.html and /api/linkstructure.json never terminated, thus leaking resources, as reported by @Vort in issue #246	6 years ago
luccioman	260ac11c65	Limit length of initially visible text in link structure graph nodes To improve a bit readability of graphs having a large number of nodes.	6 years ago
luccioman	5a8d9abd8a	Upgraded d3js dependency from 3.4.4 to 5.7.0	6 years ago
luccioman	9f8e1994a4	Added missing CSS width units to some HostBrowser.html styling	6 years ago
luccioman	0b1d2cb0dd	Fixed "TypeError: table.tBodies[0] is undefined" host browser JS error Traced in browser console when a host details table is empty.	6 years ago
luccioman	fcf6b16db4	Added new crawler attribute for finer control over Media Type detection New "Media Type detection" section in the advanced crawl start page allow to choose between : - not loading URLs with unknown or unsupported file extension without checking the actual Media Type (relying Content-Type header for now). This was the old default behavior, faster, but not really accurate. - always cross check URL file extension against the actual Media Type. This lets properly parse URLs ending with an apparently odd file extension, but which have actually a supported Media Type such as text/html. Sample URLs with misleading file extensions added as documentation in the crawl start page. fixes issue #244	6 years ago
luccioman	88d0ed676c	Render http status instead of null responses on snapshot api errors	6 years ago
luccioman	92e10d7d1c	Added a crawl start hint message on availability or not of wkhtmltopdf As this tool is required to produce pdf snapshots	6 years ago
luccioman	8852c97cee	Added basic styling for cleaner rendering of missing image snapshots For the output of the Solr snapshots writer	6 years ago
luccioman	746e0e788d	Render a relevant HTTP status code on snapshot image rendering error Instead of a null response body which is not very helpful.	6 years ago
luccioman	753bda1409	Fixed remaining blacklist entries improper decoding of '+' character In the blacklist cleaner and import/export administration pages.	6 years ago
luccioman	61c337f29a	Decode blacklist entries for easier edition of non ascii chars Not using the JDK URLDecoder.decode() function, as it strips '+' characters when they occur after '?' (both characters having regular expression semantics when used in blacklist path patterns)	6 years ago
luccioman	ed93221fa1	Improved normalization of blacklist path patterns having non ascii chars Normalize blacklist path patterns using percent-encoding, at pattern edition in web interface and at loading from configuration files. Fixes issue #237	6 years ago
luccioman	d23578efc3	Merge pull request #240 from ivanhercaz/fixEnglishBookmarksPage Fix English Bookmarks.html	6 years ago
ivanhercaz	41684ba559	adding Spanish to the interface language list	6 years ago
ivanhercaz	1dafc85d33	typo fix in Bookmarks.html	6 years ago
luccioman	3d14fb51c5	Removed now unused Java import in addition to modification from PR #239	6 years ago
otter	8820d8d7c7	replace current date by FailDate	6 years ago
luccioman	c409ec089c	Hide password values from visible HTML in the Advanced Config page Fixes issue #228	6 years ago
luccioman	75b9cd53cc	Use accessible labels in the Server Access Settings page	6 years ago
luccioman	4ed055bcdf	Enforced access controls to System settings pages	6 years ago
luccioman	de6820d257	Updated html input field type for seed upload with file method - To meet current browsers security rules, which prevent selecting a full file path with an html input field of type 'file' - As it does not make sense to select a local file path when the administered YaCy server is remote (not on the same computer as the browser)	6 years ago
luccioman	10548229af	Fixed rendering of the YMarks.html page Also to clarify which pages still depends on old JQuery and JQuery UI dependencies.	6 years ago
luccioman	bdd6ec3fff	Fetch result pages one by one when scrolling in portal search widget To prevent unnecessary load and items retrieval errors on backend	6 years ago
luccioman	b46dc4fc94	Fixed portal search widget results favicon url	6 years ago
luccioman	fa96637a84	Configured local peer as default portal search widget backend Rather than relying on a peer eventually deployed on search.yacy.net	6 years ago
luccioman	44efb2f868	Removed implicit global JavaScript variables from portal search widget	6 years ago
luccioman	79643c40bf	Limit search API calls rate when typing in the search portal widget	6 years ago
luccioman	39dd29a484	Replaced RWI ranking JQuery sliders with standard HTML range inputs Considering that the sliders usage on that page is very basic, using standard HTML5 inputs of type "range" has here the following advantages : - better keyboard accessibility - remove not very necessary additional jquery dependencies Today browser support for range inputs is good, and even on old unsupporting browsers such as IE < 10 they nicely fall back to text inputs.	6 years ago
luccioman	1b866c6076	Added possibility to hide or show image results with rendering errors When searching images, thumbnails that could not be rendered (because of a load error such as HTTP 404, networking issue or an internal error on the rendering servlet) are now hidden as default. But can be revealed with a button if desired. Fix for issue #217	6 years ago
luccioman	5b60b4225f	Fixed encoding of '+' character on search pages links As revealed by issue #216	6 years ago
luccioman	b726b2b532	Removed unnecessary '+' character URL decoding from search query Manually replacing '+' character or "%20" by a space character in the search query parameter was necessary in YaCy a long time ago to properly decode application/x-www-form-urlencoded format (commit `9842fab6e4` in 2010). Since the introduction of Jetty as the embedded HTTP server (commit `4b77733e59` in 2013), this is no more necessary as Jetty internals already do this for us in org.eclipse.jetty.util.UrlEncoded.decodeUtf8To(). So we can remove now this duplicated decoding as it prevents a proper use of the '+' character in search requests, as reported in issue #216.	6 years ago
luccioman	0efc6c89ef	Fixed rendering of crawl queues page for URLs with raw IPV6 addresses	6 years ago
luccioman	0e976e9030	Added a link to MediaWiki dumps summary in import page for convenience	6 years ago
luccioman	ecd4535eb6	Prevent entering empty OpenSearch URLs in ConfigHeuristics_p.html In order to early prevent adding invalid configuration entries to the heuristicopensearch.conf file, as revealed by issue #209.	6 years ago
luccioman	1ca9cb6bd9	Fixed a NullPointerException case, reported in issue #209	6 years ago
luccioman	8a29551c54	Upgraded the OpenGeoDB dump URL The status of the library in the DictionaryLoader_p.html page now also advertises the user that an upgrade can be applied when an older dump is already loaded. Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter chat.	6 years ago
luccioman	bf4f320b16	Optionally render the response header when using the Solr html writer With params rendered as html input fields for conveniently modifying params values and refreshing results.	6 years ago
luccioman	88e6ce23c9	Consistently render empty facets and facets having only entries at zero	6 years ago
luccioman	534f09e92b	Added and updated hint messages about remote crawler status To help identify why remote crawl results may not be received.	6 years ago
luccioman	c726154a59	Fixed removal of URLs from the delegatedURL remote crawl stack URLs were removed from the stack using their hash as a bytes array, whereas the hash is stored in the stack as String instance.	6 years ago
luccioman	2bdd71de60	Added server side columns sorting on the Process Scheduler table For easier usage of large tables in the Table_API_p.html page.	6 years ago
luccioman	f895745e1c	Removed more unsafe concurrent accesses to SimpleDateFormat instances. SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	5c6c61809a	Fixed JavaScript sorting of tables with cells containing an input field	6 years ago
luccioman	3885fd64a0	Fixed Table_API_p.html current table page loss on row editing. Reset only to the first table page when the search query is modified	6 years ago
luccioman	e97580dfc7	Fixed unsafe concurrent access to generic SimpleDateFormat instances SimpleDateFormat must not be used by concurrent threads without synchronization for parsing or formatting dates as it is not thread-safe (internally holds a calendar instance that is not synchronized). Prefer now DateTimeFormatter when possible as it is thread-safe without concurrent access performance bottleneck (does not internally use synchronization locks).	6 years ago
luccioman	38a3a5e5ad	Fixed a NullPointerException case in the suggest api	6 years ago
luccioman	b159564c72	Properly render json string attributes in the crawl profile html editor	7 years ago
luccioman	cced94298a	Added a new crawler document filter type using Solr syntax This makes it possible to set up much more advanced document crawl filters, by filtering on one or more document indexed fields before inserting in the index.	7 years ago
Michael Christen	e0dc632020	removed transformer it was not used any more	7 years ago
luccioman	eb94986f95	Added Italian in available web interface languages list	7 years ago
luccioman	9bc7b6c39d	Allow editing of scheduled next execution dates for finer control Can be useful more especially when scheduling many API calls over a long period of time to precisely adjust each scheduled date/time.	7 years ago
luccioman	b5dc1f376f	Made outgoing pools max total connections user configurable For a finer control over the maximum simultaneously active outgoing connections.	7 years ago
luccioman	387d646c0e	Added gzip compression of responses returned to user-agents accepting it Enabled as default, but can be disabled using the "Server Access Settings" admin page.	7 years ago
luccioman	a1990202ab	Fixed unresolve-pattern case on old html title	7 years ago
luccioman	35826a3091	Added a search page customization setting to display or not favicons If not interested in displaying this on your search results and notably on a peer with limited resources this can help saving some CPU and outgoing network connections.	7 years ago
luccioman	79bd9f623a	Updated YaCy home page embedded links from http to https scheme	7 years ago
luccioman	1dfd3e9dde	Limit the rate of calls to the suggest API when typing in search field	7 years ago
luccioman	4f0ab318ef	Fixed snippets statistics displayed "provided by Solr" count	7 years ago
luccioman	e115e57cc7	Reduced text snippet extraction processing time. By not generating MD5 hashes on all words of indexed texts, processing time is reduced by 30 to 50% on indexed documents with more than 1Mbytes of plain text.	7 years ago
luccioman	ce289ebaf7	Upgraded ConfigNetwork_p html doctype and added language attribute	7 years ago
luccioman	16254fac1e	Removed unpaired select closing tag	7 years ago
luccioman	692c1cfdde	Added a UI section to configure encryption of peers communications	7 years ago
luccioman	e67df103b5	Removed more remaining uses of deprecated Seed.getIP() function.	7 years ago
luccioman	addd18c993	Removed some remaining uses of deprecated Seed.getIP()	7 years ago
luccioman	c35d0568b6	Support for preferred https in peers communication on more operations	7 years ago
luccioman	0a058ba6af	Keep https in result message URL when push_p API is requested over https	7 years ago
luccioman	8bc36506f2	Enforced access controls on basic administration settings pages. Ensuring http post method is used for operations with server-side effects (in respect of http semantics), and a valid transaction token is provided by the user-agent.	7 years ago
luccioman	a3ec7a7a5f	Added analysis optional setting to compute statistics on text snippets Thus producing some basic stats on processing times for snippets generation and counts on snippets per source type.	7 years ago
luccioman	72808655a5	Added controls on mode switch when attached to remote Solr instance(s) - to prevent unwanted exposure of index entries about private local/intranet documents when switching from "Intranet Indexing" mode while attached to remote Solr instance(s) - to warn user about remote Solr instance(s) still attached when switching from modes other than "Intranet Indexing"	7 years ago
luccioman	2af3bf79c7	Improve rendering of remote Solr admin URLs - properly handle IPv6 loopback address replacement - replace loopback address or host only when accessing peer remotely - replace loopback part with the peer hostname as requested rather than with its seed public IP as this works better for Intranet mode and when peer is behind a reverse proxy.	7 years ago
luccioman	0d34034f17	Ensure an embedded Solr is available for Solr dump/restore operations Otherwise, these operations triggered NullPointerException when only an external Solr index is attached.	7 years ago
luccioman	d92b191942	Ensure no remote Solr is attached before "Shut Down and Re-Start Solr" Otherwise once this operation is applied, the remote Solr(s) instances are disconnected and the embedded Solr is connected even if disabled by setting "core.service.fulltext". Also use constants for related default setting values.	7 years ago
luccioman	69690c13a0	Optionally allow external Solr server with self-signed certificate This is necessary when you want to attach to a dedicated external Solr server protected with basic http authentication and requested over https but having only a self-signed certificate.	7 years ago
luccioman	211f3d04ab	Added hint message inciting to check accounts settings on fresh install When unrestricted access from localhost is set and the accounts config page has not been visited at all.	7 years ago
luccioman	2fd4d05e2f	Added a shared Java constant for setting key server.servlets.called	7 years ago
luccioman	033f7c4c00	Adjusted localhost/qualified account admin access informational texts. Following remarks from @etam on issue #170	7 years ago
luccioman	05702c2ced	Adjusted api table query matching strategies When inlined (for example in the CrawlProfileEditor_p.html page) : search only on the comment, as the url is not visible On regular display : search on comment OR url, instead of comment AND url. Otherwise searching on comments terms is almost useless as these terms are not necessarily present in the url.	7 years ago
luccioman	65451a3d62	Fixed start record on the last api table results page When the last results page size was lower than maximumRecords, results from the previous page were displayed again.	7 years ago
luccioman	86c902b853	Enable api table page navigation with search query Applied the same default results page size as when a type filter is defined for proper and consistent page navigation when combining type filter and search query.	7 years ago
luccioman	9c7faa04d8	Display the total number of matching items when filtering on table API Notably for a proper page navigation of the crawl scheduler table (CrawlProfileEditor_p.html page).	7 years ago
luccioman	311e91ff77	Added hint to clarify results rendered dates and 'Sort by date' switch	7 years ago
luccioman	90dc580158	Fixed initial ViewFile mode and suggestions links from previous commit	7 years ago
luccioman	0b6aed4de6	Keep the selected view mode when typing a new URL in the ViewFile page Otherwise, when interested in viewing `Link List` for example, each time you typed a new URL, `Parsed Sentences` view mode was selected as default and you had to select again the view mode you are interested in.	7 years ago
luccioman	db55eaa673	Updated link to Solr Function Queries documentation page	7 years ago
luccioman	7496df93c3	Fixed error 414 (URI Too Long) when manually selecting too many RSS items Switched form method to HTTP POST to prevent this.	7 years ago
luccioman	fb3032c530	Added a crawl filtering possibility on documents Media Type (MIME)	7 years ago
luccioman	90d4802082	Updated link URL to IANA Media Types with https	7 years ago
luccioman	e45afedee4	Added support for enclosures (media links) to the RSS loader	7 years ago
luccioman	aaefd5219c	Reduce log verbosity of RSS loader on feed items with no link	7 years ago
Michael Peter Christen	187075b878	added nav filter	7 years ago
luccioman	07e8628853	Added HTML5 embedded audio for results playing on supporting browsers Restricted to authenticated or localhost users only to prevent redistribution license issues.	7 years ago
luccioman	46c9da6428	Allow creation of vocabularies from remote CSV file URLs.	7 years ago
luccioman	348d07a999	Enforced controls on vocabulary editing operations.	7 years ago
luccioman	2532db2ce6	Vocabulary editor : use accessible labels and CSS for elements position	7 years ago
luccioman	ac14437316	Vocabulary_p.html : richer semantics for HTML tables Also replaced deprecated attributes	7 years ago
luccioman	b67742336e	Provide user interface messages on vocabulary creation read/write errors	7 years ago
luccioman	ea57763294	Mark vocabulary name field as required using html instead of JavaScript	7 years ago
luccioman	39ec8cba37	Fixed Vocabulary_p.html HTML validation errors. Validated with Nu Html Checker 17.11.1.	7 years ago
luccioman	7c644090ff	Fixed CrawlStartExpert.html HTML validation errors Validated with Nu Html Checker 17.11.1	7 years ago


6077 Commits (d097a642c290c484a7bf5455805f1fe3e623ae67)