yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	0c754dd794	implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted in near clear text (B64enc). This has some implications, as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method>) - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash, the hash-prefix "MD5:" used in Jetty is used for new pwd-hashes ( "MD5:hash" )	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are non 'extra' peers, which are queries using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
reger	28eae57e8b	give CrawlQueues a freemem routine - clears errorStack - will not get hit often (but better little than nothing on low mem)	11 years ago
reger	280c4a3ac1	exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12. Encountered: <EOF> after : "" at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440) at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464) at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181) at suggest.respond(suggest.java:73)	11 years ago
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	11 years ago
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make maintenance of index backups easier. It will also provide a tool to migrate from a freeworld index to a webportal index.	11 years ago
orbiter	3cb6c7861f	fixed shutdown authentication problem	11 years ago
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	11 years ago
Michael Peter Christen	9bd71fdbb4	made the access tracker class static because it shall be used by the jetty auth module	11 years ago
Michael Peter Christen	7d6fc79eb8	refactoring (usage of constant names for attributes of authentication check)	11 years ago
Michael Peter Christen	b9d36e45e0	removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process.	11 years ago
reger	e9081c0f17	moved startup execAPIActions call after Jetty startup execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, but it's more robust to place the call after http is assigned to switchboard/serverSwitch.	11 years ago
orbiter	dcf46ce8f6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	343d2ef49a	new data type for access tracker (unfinished)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago
reger	abbf487023	fix QueryGoal Image query (missing space) see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..	11 years ago
reger	26e9d7e066	fix NPE in IndexControlRWIs_p.html - metatags may be null Caused by: java.lang.NullPointerException at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445) at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400) at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345) at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334) at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176) at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641) at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)	11 years ago
reger	7f9b9315fe	Merge origin/master	11 years ago
reger	8eaabb9600	remove dependency from old serverCore.java - remaining getPortNr not needed (as current release allows only to set plain integer as port, see ConfigBasic)	11 years ago
orbiter	3961b643a3	write solr searches to search log	11 years ago
orbiter	15882beb19	fix for strange NPE java.lang.NullPointerException at net.yacy.search.Switchboard.updateMySeed(Switchboard.java:3667) at net.yacy.peers.Network.peerPing(Network.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)	11 years ago
orbiter	f3ac923a7e	ftp client shall be able to open non-anonymous ftp servers if login details are given	11 years ago
Michael Peter Christen	ee17bd0b69	added option to attach remote solr servers in read-only mode	11 years ago
Michael Peter Christen	25f9c35033	add patch which shall prevent that naive search mistakes like usage of regular expressions cause no results. Usage of '*' followed by a dot or any expression will now cause that this expression is used as a filetype search.	11 years ago
reger	71cac1a278	added SSL/HTTPS connector to support SSL/https connection on port 8443 !!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used - added ping test for above to migration - as of now port for https is hardcoded to default 8443 - if not urgently required I'd leave it this way (it's standard) to use different ports for http and https - post https port on ConfigBasic.html (if active)	11 years ago
Michael Peter Christen	82c0525e71	wrong logger fix	11 years ago
Michael Peter Christen	25250405f1	solr servlet preparation for join with jetty branch	11 years ago
Michael Peter Christen	2f16770681	migrated to solr 4.6.0	11 years ago
orbiter	937273d4e3	added parsing of metadata to surrogate reading: a dublin core record inside of surrogate input files may now contain tokens within the namespace 'md' (short for: metadata). The token names must be valid within the namespace of the solr field names. All md-tokens inside of surrogate files then overwrite values within solr documents before they are written to the solr index. This makes it possible to assign collection names to each surrogate entry and also ranking information can be added. Please see the example file.	11 years ago
Michael Peter Christen	2702d9e56b	- added a SolrQueryResponse2SolrDocumentList method which is able to work around the unfolding process in Solr's BinaryResponseWriter. This was a huge performance bottleneck in the embedded solr connector and the problem is actually on Solr side, but we have now a workaround. - This made it possible to abstract a high-performance index access method which is implemented as method getDocumentListByParams. That method is also implemented in the SolrServerConnector and provides a very efficient access to a solr index if the index is embedded. - a popular use of the document list retrieval is a result count which can now also make use of the new method, via getDocumentCountByParams. - enhanced the Error cache which now does not store error documents within the ram cache if the document is also written to solr. When documents are retrieved from the cache, they are partly read from the ram cache and if not existent there, from the Solr index.	11 years ago
Michael Peter Christen	552ef9f18e	fix for bad ErrorCache.exists test (bug from latest commit)	11 years ago
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	11 years ago
Michael Peter Christen	303f5694ba	avoid usage of existsByQuery. If a document can be loaded by the ID before testing other fields from the existsByQuery request, then a document cache fills and queries after that one can be avoided.	11 years ago
Michael Peter Christen	78eac85161	better calibration of caches and queue maximum sizes	11 years ago
Michael Peter Christen	c8af19bd37	removed unnecessary check which causes a NPE when searching with empty search string	11 years ago
Michael Peter Christen	e3c2f09de9	- reduce computation in case that specific postprocessing fields are not selected - de-select citation rank computation	11 years ago
Michael Peter Christen	cfa08024c7	removed optimization before postprocessing because that may cause a time-out which will cause postprocessing to fail.	11 years ago
Michael Peter Christen	6f3a923691	fixed urlmask which was not able to combine several constraints	11 years ago
Michael Peter Christen	a125904a1c	fixed a NPE in surrogate processing	11 years ago
Michael Peter Christen	0db8e34625	enhanced webgraph processing	11 years ago
Michael Peter Christen	a16534cb0a	tried to fix timeout and connection-lost problems when using an outside solr.	11 years ago
Michael Peter Christen	c3dcbdc8d5	try to recover from an OOM during citation index reading and fail-over to second solr core in case of unrecoverable OOM.	11 years ago
Michael Peter Christen	9932c441c8	fixed a problem with Date fields parsing Solr results if a remote Solr is attached.	11 years ago
Michael Peter Christen	ae55d69ef6	include/exclude size NPE fix (recently added)	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	11 years ago
orbiter	037cd0a57c	using the BinaryResponseWriter which is supported within the YaCy solr servlet since YaCy 1.63. This is much more performant for the client than using the XMLResponseWriter because parsing of XML data is very CPU intensive. Older YaCy peers are still requested using the XMLResponseWriter but the majority of YaCy peers already respond with the binary writer. This makes remote searches much faster and less CPU intensive.	11 years ago
orbiter	61409788eb	less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form.	11 years ago
reger	f23471c471	add check to prevent index entries containing url_file_ext_s with ";jsession=xyz" note: check could be implemented in MultiProtocolURL (but at this time didn't oversee possible implication)	11 years ago
orbiter	3e552550d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c2d720cdaf	purge a lucene cache - possible memory leak fix	11 years ago
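
The DIGEST commit (0c754dd794) describes a password hash of the form MD5(user:realm:pwd), stored with an "MD5:" prefix to tell new hashes from old ones. A minimal sketch of that calculation follows. It is not taken from the YaCy sources: the class name, method names, and the UTF-8 charset are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of YaCy: illustrates the hash layout from the commit message.
public class DigestHashSketch {

    // Hex-encoded MD5 of "user:realm:pwd" (the value RFC 2617 calls H(A1)).
    // The charset is an assumption; the commit message does not name one.
    static String ha1(String user, String realm, String pwd) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((user + ":" + realm + ":" + pwd).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    // New-style stored value: the "MD5:" prefix marks a hash usable for BASIC and DIGEST.
    static String storedForm(String user, String realm, String pwd) {
        return "MD5:" + ha1(user, realm, pwd);
    }

    // Distinguishes new-style hashes from old ones by the prefix.
    static boolean isNewStyle(String stored) {
        return stored != null && stored.startsWith("MD5:");
    }

    public static void main(String[] args) {
        // Example values only; the realm name matches the default mentioned in the commit.
        System.out.println(storedForm("admin", "YaCy-AdminUI", "secret"));
    }
}
```

Because the realm is part of the hashed string, changing the realm name changes every hash, which is consistent with the commit's note that a realm change invalidates all passwords.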

767 Commits (a9ed28c0b50d1ccd0216d4598bb2157ea74fdf66)