yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	617dd9c97b	- added new input field in index.html - changed progress bar in yacysearch.html - moved pagination navigation to page bottom - moved search term input field to headline	11 years ago
orbiter	7d24bcb98d	added flag to require that all web pages, even such without a "_p" extension require authorization. (default off)	11 years ago
reger	1fe26550a0	remove AugmentedBrowsing_p.html augmented browsing switch (has no function in code, previously used in conjuction with http://reflect.ws)	11 years ago
reger	e972b87a8a	remove AugmentedBrowsingFilters_p.html as none of the settings are used currently config settings frome the page also removed from yacy.init augmentation.reflect augmentation.addDoctype augmentation.reparse interaction.overlayinteraction.enabled	11 years ago
reger	a373fb717d	remove more unused from legacy server.http - triggerOnlineAction not used - useTemplateCache not used	11 years ago
Michael Peter Christen	de8f7994ab	as crawling has a low-cpu demand, we want it to run even if the CPU load is VERY high. This applies also if the CPU load is high because of in-cache crawling; in that case we want to experience a high-CPU load as much as possible	11 years ago
Michael Peter Christen	9eb668e951	enhanced the resource observer The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space which is assigned for YaCy are defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files in case that an autodelete is activated. The autodelete is now BY DEFAULT ON if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	11 years ago
Michael Peter Christen	ca8b100f96	run the cleanup process even when load is high, do postprocessing even if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB RAM available). The memory amount of the postprocessing is the cause that systems block because they run into a frequent-GC chain which almost locks the peer. If running with enough memory, the postprocessing is fast and not damaging to the system. Because the required RAM of 0.5 GB is never available in default setting, the postprocessing will not run if the peer is not reconfigured to use more memory.	11 years ago
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	11 years ago
Michael Peter Christen	931541d198	re-inserted default value re-set button to performance queues and patched missing values for recent new queues	11 years ago
reger	a71718a459	add config value for ssl/https port (default=8443) adjust server routines to use config	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on smal systems like a Pi	11 years ago
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is not longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOrigianlQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
reger	0c754dd794	implemented DIGEST authentication, which is for remote login more secure as BASIC were pwd is transmitted near clear text (B64enc). This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method> - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes ( "MD5:hash" )	11 years ago
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make a maintenance of index backups easier. It will also provide a tool to migrate from an freeworld index to a webportal index.	11 years ago
reger	fbdd89e198	Merge origin/master	11 years ago
reger	65a2f3d5e7	tweak Jetty credentials to work with YaCy UserDB - user entry in UserDB with admin right can login to access protected pages - dto. admin user, choosen username is stored in conf (adminAccountUserName=)	11 years ago
Michael Peter Christen	ee17bd0b69	added option to attach remote solr servers in read-only mode	11 years ago
Michael Peter Christen	84167adb49	removed unused anomichttpd code after migration to jetty	11 years ago
reger	effea4bca0	Merge origin/master into jetty Conflicts: source/net/yacy/cora/federate/solr/SolrServlet.java	11 years ago
Michael Peter Christen	a16534cb0a	tried to fix timeout and connection-lost problems when using an outside solr.	11 years ago
reger	f111f30ace	Merge origin/master into jetty	11 years ago
Michael Peter Christen	24a052ecb9	removed debug code for existsByIds	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	899e7e92b0	added debug code	11 years ago
reger	1437c45383	merge rc1/master	11 years ago
Michael Peter Christen	7f768b42d3	we do not need the load-image flag any more since this is now controlled by parser switches	11 years ago
reger	f017066197	Merge origin/master into jetty	11 years ago
Michael Peter Christen	f1bfe64361	integrated startpage to compare_yacy	11 years ago
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which ocurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together makes it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
reger	f46c723398	allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking	11 years ago
Michael Peter Christen	820b896146	Replaced the inframe loading from yacy.net for donations with the loading of this iframe from the local host. To make this more flexible, this iframe is loaded once after startup from yacy.net.	11 years ago
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
Michael Peter Christen	69f85265e1	added an option to put image links to the crawl queue and handle these like normal documents. Using this option (by default on at this moment; this might change soon) it is possible to get the exif data into the search index to be used in image search.	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
orbiter	944ae5686c	added donation plea to the about box as default (you can replace this in your peer!)	11 years ago
orbiter	bf0ad04e1b	apply load limitation also to dht-in	11 years ago
orbiter	f50b596e0b	do not run dht ditribution if system load is over 2.5	11 years ago
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	11 years ago
Michael Peter Christen	2716dfc46c	increase crawler speed by reduction if the busysleep time	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
orbiter	7c6ccc426c	set crawlingQ to true by default because most webpages are dynamic and crawlingQ should only be switched off in case of crawler traps	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanismn which will cause that a 'fresh' yacy will load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get faster a personalized search index.	12 years ago
Michael Peter Christen	856e5c42ae	the line "Web Search by the People, for the People" is more generic for P2P and portal search as default search string. Otherwise, if people switch to Portal mode, the "P2P Web Search" does not make sense.	12 years ago

1 2 3 4 5 ...

291 Commits (395837b3b6f602830b74c2ed39e7e13e064ba9bc)