yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	11 years ago
Michael Peter Christen	790f103f32	delete fail-docs during postprocessing to prevent that they will appear again and stay in postprocessing forever.	11 years ago
Michael Peter Christen	9eb668e951	enhanced the resource observer The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space which is assigned for YaCy are defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files in case that an autodelete is activated. The autodelete is now BY DEFAULT ON if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	11 years ago
Michael Peter Christen	bf97e38b83	removed clearURLIndex, which is a stub remaining from the old metadata database and not needed any more	11 years ago
Michael Peter Christen	bc28247089	Added methods in resource observer to calculate the available and the occupied disc space. These values are also shown on the status page. The disc space calculation shall be used for a disk-limitation of the search index.	11 years ago
Michael Peter Christen	ca8b100f96	run the cleanup process even when load is high, do postprocessing even if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB RAM available). The memory amount of the postprocessing is the cause that systems block because they run into a frequent-GC chain which almost locks the peer. If running with enough memory, the postprocessing is fast and not damaging to the system. Because the required RAM of 0.5 GB is never available in default setting, the postprocessing will not run if the peer is not reconfigured to use more memory.	11 years ago
Michael Peter Christen	195e5868d3	catch solr close exceptions	11 years ago
Michael Peter Christen	751c128544	extra sleep for remote searches enhances search results because there is more time for more remote peers to contribute on the first result page	11 years ago
Michael Peter Christen	0cabcbbe83	more efficient wordcount	11 years ago
Michael Peter Christen	3d474a843e	added memory protection for postprocessing	11 years ago
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	11 years ago
Michael Peter Christen	9228214f9b	enrichment of PerformanceMemory display of SolrInfoMBean table	11 years ago
Michael Peter Christen	e8bdf16ea7	added statistic information for solr resources in PerformanceMemory	11 years ago
Michael Peter Christen	931541d198	re-inserted default value re-set button to performance queues and patched missing values for recent new queues	11 years ago
Michael Peter Christen	456e52e0d5	enhanced strategy to clear solr caches - redesigned the instance mirror class (which was a mess) - added final method to close a searcher (which otherwise keeps a cache) - changed cache clear method which iterates over resources and calls clear to all caches in the searcher resources	11 years ago
orbiter	22e3524797	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c40ba51ca6	added new suggest method which replaces more-than-one suggestions: instead of computing suggest permutations of the given words, the completion of a phrase using the given words is searched in the fulltext index.	11 years ago
reger	b693ce9759	allow combining selection of different search nav's (facets) - selecting more than one nav combines the 2 selections (with AND) - unselecting one nav clears all selected (e.g. select filetype:pdf and /language/fr shows ~ french pdf's only)	11 years ago
reger	cb71413d19	fix page nav, to keeping modifier (was new issue)	11 years ago
orbiter	416481c33e	added a boost on appearance of combined words (in the same order the user submitted that) when searching for more than one word	11 years ago
reger	9b24dae2b7	add language navigation filter clause to rwi results	11 years ago
reger	f307d65dcf	prepare for a language navigator works fine to restrict language for local solrSearches. More work needs to be done to make rwi/remote searches respect the modifier.language restriction.	11 years ago
Michael Peter Christen	c84bcc878a	first try to add a generic solr servlet as luke request servlet	11 years ago
Michael Peter Christen	8b14e92ba4	added button in host browser to re-load 404/failed documents	11 years ago
orbiter	5ec0c969c9	fix for http://bugs.yacy.net/view.php?id=354	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
Michael Peter Christen	489c3fbc90	code simplifications / removed warnings	11 years ago
Michael Peter Christen	0168f80c28	new crawling factors can now be changed during runtime	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on smal systems like a Pi	11 years ago
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	11 years ago
Michael Peter Christen	0d235a565b	cleanup crawl loader jobs	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', becuase the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is not longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOrigianlQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which covers the http request into a separate thread and ignores the furthure result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	11 years ago
reger	0c754dd794	implemented DIGEST authentication, which is for remote login more secure as BASIC were pwd is transmitted near clear text (B64enc). This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method> - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes ( "MD5:hash" )	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are non 'extra' peers, which are queries using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
reger	28eae57e8b	spend CrawlQueues a fremem routine - clears errorStack - will not get hit often (but better little than nothing on low mem)	11 years ago
reger	280c4a3ac1	exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12. Encountered: <EOF> after : "" at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440) at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464) at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181) at suggest.respond(suggest.java:73)	11 years ago
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	11 years ago
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make a maintenance of index backups easier. It will also provide a tool to migrate from an freeworld index to a webportal index.	11 years ago
orbiter	3cb6c7861f	fixed shutdown authenticaton problem	11 years ago
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	11 years ago
Michael Peter Christen	9bd71fdbb4	made the access tracker class static because it shall be used by the jetty auth module	11 years ago
Michael Peter Christen	7d6fc79eb8	refactoring (usage of constant names for attributes of authentication check)	11 years ago
Michael Peter Christen	b9d36e45e0	removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process.	11 years ago
reger	e9081c0f17	moved startup execAPIActions call after Jetty startup execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, but it's more robust to place the call after http is assigned to switchboard/serverSwitch.	11 years ago
orbiter	dcf46ce8f6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	343d2ef49a	new data type for access tracker (unfinished)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago

1 2 3 4 5 ...

802 Commits (28a7b42e6bacff5072b2e1f5df8a13dc5e7bc184)