yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Christen	867f96a32b	removed warnings	2 years ago
Michael Christen	8a06beaf24	removed finalize() methods, deprecated	2 years ago
Michael Peter Christen	60c9986a0e	new release file names with date and git hash ...without reference to 9000ish SVN	2 years ago
Michael Christen	8b37a5dc6f	removed log4j properties because we don't have a log4j any more	2 years ago
Michael Christen	347b676b76	changed system to load build properties	2 years ago
Michael Christen	c36bdbf78d	refactoring	2 years ago
Michael Peter Christen	1e1107c97c	clean-up and new servlet method caching	2 years ago
Michael Peter Christen	adbda4c71b	moved all remaining servlet classes to new location	2 years ago
Michael Peter Christen	33889b4501	moved more servlets to new location	2 years ago
Michael Peter Christen	6d388bb7bf	refactoring - moved htroot/yacy classes	2 years ago
Michael Peter Christen	48fcf3b3b5	alternative servlet method, tested with wiki may become the future method to store servlets	2 years ago
Michael Peter Christen	d23dea2642	refactoring	2 years ago
Michael Peter Christen	23f1dc3741	addressing/fixing some concurrency issues from https://github.com/yacy/yacy_search_server/issues/505	2 years ago
Michael Peter Christen	9c1bc533fa	removed hazelcast because it is phoning home, see also: https://github.com/yacy/yacy_search_server/issues/504	2 years ago
Michael Peter Christen	fc98ca7a9c	removed ContentControl servlet and functinality This was not used at all (as I know) and was blocking a smooth integration of ivy in the context of an existing JSON parser.	2 years ago
Thomas Koch	3116713672	rm buildDate from build.xml and its usages The https://reproducible-builds.org project invests a lot of work to make builds reproducible. This is a security property. It allows to compare the build of binaries from different builder machines. If they are identical, it means that either the builds have not been manipulated or an attacker managed to attack all builder machines in exactly the same way. One problem that the reproducible-builds project often sees is that projects include the build time in their binaries. This makes builds unreproducible for apparently no reason. The build date should not be of interest since binaries built on different dates but from the same source code should not be different. Thus I decided to remove the build date instead of re-implementing the functionality without the GitRev task. Anyways the reported date was not the build date but the date of the last git commit which is even less informative. The git commit ID would have information value but should only be relevant for "nightly builds".	2 years ago
Thomas Koch	572558244a	rm unused build properties PKGMANAGER, RESTARTCMD, DESTDIR PKGMANAGER is always false, thus the java code wrapped in if statements for this property is dead code and can also be removed. The Debian packaging removed in `c4659f0fb0` did set the PKGMANAGER property to true. When we do distro packages again, we can revisit this commit and redo it with property files instead. RESTARTCMD is only used inside those dead code. DESTDIR is never used even in the build.xml	2 years ago
Michael Peter Christen	3d138d3fdd	catch error when initializing hazelcast should fix https://github.com/yacy/yacy_search_server/issues/468	2 years ago
Burkhard	a6a9828181	Merge pull request #440 from lfuelling/master Add setting for public facing port	3 years ago
reger24	141e86964e	Fix compile deprecation warning warning: [removal] AccessControlException in java.security has been deprecated and marked for removal	3 years ago
reger24	a7e93d9328	Add option to add host to default blacklist from search result - added authorized ikon/button to blacklist a host - host is added to default blacklist - insired by https://github.com/yacy/yacy_search_server/issues/213#issuecomment-412485190	3 years ago
reger24	027e284ef9	Enhance notability of current blacklist by diff color in header in servlet Blacklist_p.html bugfix for `18dddb74c9`	3 years ago
reger24	18dddb74c9	Harmonize loading/reading blacklist between init and servlet to use the same procedures -added BlacklistHelper.blacklistToSortedArray to simplify use in servlet	3 years ago
reger24	f28d705cd0	update IndexBroser_p add to blacklist button add feedback to user on success	3 years ago
Michael Peter Christen	52fe2ed8ba	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	3 years ago
Michael Peter Christen	39e7bbac13	removed deprecation warning for new Double()	3 years ago
reger24	6a5f0b3684	Servlet IndexBroser_p add button "Add to blacklist" allows to add the displayed host to add to the default blacklist	3 years ago
Lukas Fülling	111cf48642	add missing prop	3 years ago
reger24	f33e0ed7fd	revert commit `17fd1a4616` wrong file selected	3 years ago
unknown	17fd1a4616	delete .idea not needed in distribution .idea is created locally by IntelliJ IDEA upon import as gradle project to store IDEA specific settings. No need to include in distribution	3 years ago
Daleth Darko	3ced06c731	Various javadoc fixes	3 years ago
reger24	6a1e259fd0	Fix NPE in Switchboard . getURL https://github.com/yacy/yacy_search_server/issues/441	3 years ago
reger24	eae16287e9	Added epub (ebook) format to existing zipParser *.epub files are zip files containing xhtml files with content and other artifact files, which the zipParser can already feed to index - extension "epub" - mime "epub+zip"	3 years ago
reger24	3e34f7c596	Import Ant build.xml into Gradle and use old compile of servlets in Gradle to be able to use/reuse Ant targets where task has not been implemented in Gradle build. - use the import to include the compile of htroot as first important task ! it is possible that first build fails an compile of GitRevTask.jar ! ! solution/workaround -> use "ant all" once to compile GitRevTask.jar ! - adjusted build.xml a little - split compile-core into compile-core and compile-htroot to have a target for htroot comp. only - set build-path to reuse Gradles build directory - (fix javadoc failure) - changed the filtered-copy of yacyBuildProperties.java to ! the build path :-( as current (copy,delete,exclude) is complicated and not migration worthy, used simple/straigt forward approach (using a yacyBuildProperties.java.template file as copy source)	3 years ago
reger24	398b105781	Prevent that YaCy always starts with a exception message on none Apple systems Perform try to access com.apple.eio.FileManager only on none Win systems	3 years ago
Lukas Fülling	e8a00007f6	add setting for public facing port	3 years ago
Michael Peter Christen	d7b17d8935	fixed missing thread name revert after balancer waiting	3 years ago
Michael Peter Christen	bd3f2483a1	replaced url and date retrieval by only url retrieval This should prevent that the search index is used for freshnes of the index entry.	3 years ago
Michael Peter Christen	163ba26d90	replaced check for load time method instead of loading the solr document, an index only for the last loading time was created. This prevents that solr has to fetch from its index while the index is created. Excessive re-loading of documents while indexing has shown to produce deadlocks, so this should now be prevented.	3 years ago
Michael Peter Christen	1ead7b85b5	remove compiler warning "warning: [try] explicit call to close() on an auto-closeable resource"	3 years ago
Michael Peter Christen	59777010dc	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	3 years ago
Michael Peter Christen	7898815c41	disabling concurrent logging (maybe temporary)	3 years ago
sgaebel	4bf6954474	uses clientBuilder not HttpClients.custom() to have these inside the Pool too	3 years ago
sgaebel	cdf901270c	always use HTTPClient by 'try with resources' pattern to free up resources	3 years ago
sgaebel	69adaa9f55	makes our HTTPClient closable	3 years ago
sgaebel	fc4275f901	handle all references for client, response, request to be able to close them	3 years ago
sgaebel	e7d3a363f2	refactor to use finish()	3 years ago
sgaebel	4fc876f4a3	revert back to use EntityUtils.consumeQuietly - as it simply closes the underlying stream	3 years ago
sgaebel	4f0392e93e	refactor use of AuthSchemeProvider	3 years ago
sgaebel	b74f337859	removes double setting of UserAgent	3 years ago
sgaebel	965748fefb	some refactoring using try with resources	3 years ago
Michael Peter Christen	552ab7051b	fix for warc importer	3 years ago
Michael Peter Christen	3c86b7b780	attempt to make a Mac Release using gradle This is almost working with many workarounds: - run rm lib/yacycore.jar - run ./gradlew clean build bundleNative - run ant clean all - run again rm lib/yacycore.jar - run ./fixMacBuild.sh The build is then inside build/mac/YaCy.app Right now this works so far but it does not have the correct release number inside. Target is to make this working for Windows releases and to embedd jre entirely.	3 years ago
Michael Peter Christen	999c819e3e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	3 years ago
Michael Peter Christen	fd770e90e2	spike to identify paths for YaCy within mac application bundles	3 years ago
Michael Peter Christen	d19872fd26	making sure that crawl queues are closed correctly to prevent data loss	3 years ago
sgaebel	90507c0fdc	comments out printing query params to std.out	3 years ago
Michael Peter Christen	be0aebad84	fixes https://github.com/yacy/yacy_search_server/issues/424	3 years ago
Michael Peter Christen	63ad8ce6b2	removed ymarks had not been used since a long time	3 years ago
Michael Peter Christen	ef5a71a592	enhanced crawl start response time for very very large crawl start lists	3 years ago
Michael Peter Christen	4cadd557dc	removed synchronization in table creation to avoid possible deadlocks when handling OnDemandOpenFileIndex which happens quite often during wide crawling	3 years ago
admin	9b7668fa58	reduced memory footprint during indexing/crawling	3 years ago
Michael Peter Christen	e6a87e0426	enhanced crawler a main problem when crawling is long waiting time cuased by crawl-delay values from robots.txt entries. that attribute is not supported by google and interpreted by yandex and bing in different ways. In large crawls there is always one host which blocks the whole crawl with extreme large values. YaCy now still obeys crawl-delay but limits them to 10 seconds. Additionally the blocking logic when loading new robots.txt was analyzed and a deadlock was removed. Furthermore the construction of new queue lists was redesigned and it was ensured that always a large list of different hosts for host-balancing is provided for the loader.	3 years ago
Michael Peter Christen	e9c5e78868	replaced new Number(Number) with Number.instanceOf to remove deprecation warnings for Java 9	3 years ago
Michael Peter Christen	9e13d77de4	removed call to class.finalize() because of deprecation in java 9 next: removal of finalize() implementation after testing with assert false	3 years ago
Michael Peter Christen	9ef4503672	fixed some newInstance() warnings .. by adding .getDeclaredConstructor()	3 years ago
Michael Peter Christen	1d41380f0a	better support for mac-specific tray functions in java 9	3 years ago
Michael Peter Christen	e81b770f79	enabled crawl starts with very large sets of start urls i.e. 10MB large url list with approx 0.5 million start points	3 years ago
Michael Peter Christen	c623a3252e	fix for jdk 14 bug	4 years ago
Michael Peter Christen	dbd211a1ad	removed/replaced reflection in memory tool	4 years ago
Michael Peter Christen	1cdb21592b	added hazelcast and some modifications to align legacy YaCy with YaCyGrid	4 years ago
Michael Christen	42ea2a1c6f	Merge pull request #405 from jfhs/jfhs/support-all-html-entities Improve HTML entities support	4 years ago
Michael Christen	b2af745dd6	Merge pull request #404 from lnceballosz/master NGI0 - Updating licensing aspects according REUSE	4 years ago
jfhs	10bddc2c2d	Decode HTML entities in all property values by default	4 years ago
jfhs	2135d259e3	Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities	4 years ago
Michael Peter Christen	8f876a8c72	added concurrency to enhance indexing speed during json surrogate import	4 years ago
Michael Peter Christen	f8cbaeef93	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	a857e3d3d5	fix for json importer	4 years ago
sgaebel	1546232c94	adds ranking for multi document queries only	4 years ago
sgaebel	93b353d22d	does not boost or add fields for zero-row-queries (exists())	4 years ago
sgaebel	f16cd154f7	removes unused imports and variables	4 years ago
sgaebel	c69c462a15	replaces a expensive getLoadTimeURL() by exists() refactors urlExists to getHarvestProcess as that is what it does	4 years ago
sgaebel	a5488ac8f5	uses edismax queries on query counts > 1 only	4 years ago
sgaebel	26223dc25a	replaces getLoadTime() by exists() with a simpler query since solr-8.8.1 getLoadTime() causes a high cpu usage	4 years ago
sgaebel	8e4d014c06	removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the log	4 years ago
Lina Ceballos	a96752f5ab	adding SPDX license and copyright headers	4 years ago
Michael Peter Christen	e18d0ef544	trying to set a higher priority to the process that is involved in index export	4 years ago
Michael Peter Christen	8b4394a6c5	fixes for solr 8.8.1 migration - replace new guava 30 with older 25 because that is the correct dependency for solr 8.8.1. The newer one did actually not work! - index will be crated in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1 subfolder. The older solr_6_6 index is not touched but also not migrated. The index starts with fresh (empty) content. - Older indexes must be migrated by hand (export/import) so far until a better solution is found. - Large schema adoptions for lucene 8.8.1	4 years ago
Michael Peter Christen	ed9789214e	fixed seed initialization problem	4 years ago
Al Sutton	8ade8b8775	Remove forced clear to match new behaviour in `2da71c2a40`	4 years ago
Al Sutton	09695fc6d3	Update exceptions to match updated API	4 years ago
Al Sutton	69014a701e	Update API Usage	4 years ago
Michael Peter Christen	3da7628117	use environment variables to overwrite configuration variables you can i.e. do: export YACY_PORT=8092 && ./startYACY.sh Just append "YACY_" to uppercase version of environment variables and replace all "." with "_".	4 years ago
Michael Peter Christen	13a2e6dc6e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	0ae8ccf657	Make it possible to set an empty password disabling the authentication protocol completely If you set now an empty password, then the http server will not ask to authentify. This is required for environment where we attach an outside authentification service like keycloak or similar using authentication in an ingress proxy. This change is part of the approach to run YaCy inside of a kubernetes cluster where we do not want individual authentication of peers and want to apply a ingress authentication.	4 years ago
Michael Peter Christen	96592a10cf	added option to set yacy configuration values using environment variables To use that feature, set an environment variable with prefix "yacy." and suffix identical to the yacy configuration attribute name. Additionaly we implemented a way to set a peer name using the setting "network.unit.agent". This can therefore now be used to set a peer name with the java call parameter -Dyacy.network.unit.agent=anonymous The purpose for this feature is the ability to set peer names in mass-deployed kubernetes clusters to the same name to prevent that we are flooding peer name statistics with auto-deployment-generated names.	4 years ago
Michael Peter Christen	198826c362	added network scanner process to discover all YaCy peers in the intranet this will be used to wire YaCy peers in a kubernetes cluster	4 years ago
Michael Peter Christen	d9602e8325	Implemented a new syntax in the template engine to simplify json APIs Added also an example for one of the existing APIs. The problem is the comma separator between objects which must not be there for the last entry in a sequence. The new syntax adds the separator symbol automatically.	4 years ago
Michael Peter Christen	5a7f12a9c1	allow network scans for non-standard http/https ports	4 years ago
sgaebel	b8d264f7ec	fixes logging	4 years ago
Michael Peter Christen	4c920d05b5	removed superfluous lines	4 years ago
Michael Peter Christen	907f121d0c	do not overwrite PW with random PW	4 years ago
Michael Peter Christen	3e6a1e0a49	fixed surrogate process counter	4 years ago
Michael Peter Christen	d3526c52af	fixed a problem in warc importer: do not fail if single WARC entries are faulty	4 years ago
Michael Peter Christen	3078b74e1d	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	4 years ago
Michael Peter Christen	01cc32217f	fixed apicall call method parameters and verification in transaction manager which did not have and exception for localhost/basic authentication	4 years ago
Michael Peter Christen	63f58e4785	enhanced strategy in host browser limit number of fresh hosts in round robin hashes	4 years ago
Michael Peter Christen	9be36800a4	increased redirect depth by one this makes sense if one redirect replaces http with https and another replaces www subdomain by without (and vice versa)	4 years ago
Michael Peter Christen	d0abb0cedb	enabling all crawl profiles in all network modes also: increased default internet crawl speed to 4 urls/s/host	4 years ago
Michael Peter Christen	baad56d83d	beautified default peer names	4 years ago
Michael Peter Christen	43a9f4f574	updated solr 6.6.6 -> 7.7.3 dropped GSA support (GSA API is still in YaCy Grid) The 6.6.6 solr index works without migration also with 7.7.3	4 years ago
Michael Peter Christen	c0d9a3e9a7	turned HostBrowser into a admin-only page, now called IndexBrowser This was required because spiders and bots crawled through this page and created load on the peer without use for the user or the YaCy network.	4 years ago
Michael Peter Christen	d359d521a1	fixed warc importer The importer tried to import a gziped files as plain warc. It will now check the file extension and use a unzip automatically on-the-fly.	4 years ago
Michael Peter Christen	e54ab39958	Going back to basic authentication for console/shell commands This does not affect security because: - it is going to localhost only - only users who have already access to the pw hash can do this - no clear text pw is transmitted because that is not stored anywhere The switch to basic is required because these commands are required in the context of hosting on root servers and docker containers where a password change must be done. But the password shell command was not working without password which made the concept unusable. This deficit made it virtually impossible for root server operators to use YaCy because they had been unable to set up a proper password.	4 years ago
Michael Peter Christen	6271e9122c	javadoc fix	4 years ago
Michael Peter Christen	e0f4e3fd9a	enhanced ability to debug the code	4 years ago
Michael Peter Christen	eea2d71851	prevent creation of auth schema factories every time a servlet is called	4 years ago
Michael Peter Christen	fcc9386ed3	enhanced the (already fast!) png exporter	4 years ago
Michael Peter Christen	4e9b425f98	missing fix for latest commit	4 years ago
Michael Peter Christen	3213d9db37	updated jetty from 9.4.17 to 9.4.35 and fixed a bug in ServerSideIncludes that appeared only in that recent version of jetty	4 years ago
Michael Peter Christen	787fec0658	reduced complexity - removed concurrency in sort	4 years ago
Michael Peter Christen	cef5fde343	adding message to UI to make port change transparent	4 years ago
Michael Peter Christen	52228cb6be	added a gc to cleanup process (once every 10 minutes)	4 years ago
Michael Peter Christen	22841ffbf1	creating a threaddump during every cleanup process to be able to find out what a peer did (not) last time before a crash	4 years ago
Michael Peter Christen	36e616271b	do better documentation on how to set a default password	4 years ago
Michael Peter Christen	df2bf9ef28	try to fix maven build error	4 years ago
Michael Peter Christen	264bab6700	trying to fight the UI unavaiability this path addresses a possible issue with too many open connections to remote peers	4 years ago
Michael Peter Christen	7947baeb49	removed all remaining deprecation warnings	4 years ago
Michael Peter Christen	c0f6d6e11d	removed one deprecation warning for jetty library initializing ssl server port	4 years ago
Michael Peter Christen	133440a7a6	some debug lines	4 years ago
sgaebel	3431f91db9	removes unused 'unused' tokens	4 years ago
sgaebel	fc03c4b4fe	removes some warning and unused objects	4 years ago
sgaebel	4a495df63a	removes some deprecation-warnings	4 years ago
sgaebel	dd9d4b1188	replace org.junit.Assert.assertThat by org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid deprecation-warning	4 years ago
sgaebel	df9ea0a42a	removes some warnings: unused imports, params	4 years ago
sgaebel	9bc2297161	fixes deleting during recrawl	4 years ago
sgaebel	80785b785e	adds deleting during recrawl	4 years ago
Michael Peter Christen	e0ad8ca9da	replaced json library from JSON.org with libandroid-json-java This fixes https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	ea8df27e95	modified org.json.* library to fit into the YaCy environment as drop-in replacement. Also made some fixes and enhancements to the library.	5 years ago
Michael Peter Christen	60dc1241a3	added org.json.* library from https://android.googlesource.com/platform/libcore/+/refs/heads/master/json/src/main/java/org/json as a preparation step for https://github.com/yacy/yacy_search_server/issues/347	5 years ago
Michael Peter Christen	053e54a2c7	grand CORS for json files	5 years ago
Michael Christen	cfa27d2fd5	fixed links	5 years ago
Michael Christen	cb20aa7e54	removed donation message in search result column	5 years ago
Michael Christen	25227676ae	removed some warnings	5 years ago
luccioman	6b45cd5799	New optional crawl filter on the URL a doc must match to crawl its links For finer control over which parsed documents can trigger an addition of their links to the crawl stack, complementary to the existing crawl depth parameter.	6 years ago
luccioman	d16bc99835	Added "Show Metadata" links to the ViewFile.html links mode To conveniently follow parsed links in the file viewer	6 years ago
luccioman	a5771b1f14	Made SNI extension user configurable without the need for server restart TLS Server Name Indication (SNI) extension activation can now be configured with the new Settings_p.html?page=httpClient administration page. SNI extension is also now enabled by default, as in 2019 the unrecognized_name(112) alert is more properly handled by major web servers TLS implementations, following the RFC 6066 standard. Related YaCy issues : #153 #189 and #272 JDK 1.7 bug : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374 Apache httpd issue : https://bz.apache.org/bugzilla/show_bug.cgi?id=56241 RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3	6 years ago
luccioman	e90405b6f0	Support parsing audio URLs without file extension Added also a Junit for the audio tag parser	6 years ago
luccioman	a8316c79da	Allow JS resorting of search results by unauthenticated users Acces rate limitations to this search mode by unauthenticated users are set low by default to prevent unwanted server overload but can be customized through the SearchAccessRate_p.html configuration page Fixes #291	6 years ago
luccioman	0ab2b49c31	Made /yacysearch access rate limitations user configurable With a new admin page at /SearchAccessRate_p.html in menu Network Access > Local Search > Access Rate Limitations	6 years ago

1 2 3 4 5 ...

8967 Commits (89c07f09006d1b32478fab24263920635ac5e92a)