yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	b38de92a16	Merge origin/master into jetty	11 years ago
Michael Peter Christen	434e13b46d	in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)	11 years ago
orbiter	1ac504ae51	use html encoding for urls in metadata	11 years ago
reger	f017066197	Merge origin/master into jetty	11 years ago
Michael Peter Christen	25951cee14	- fixed opensearchdescription, this delivered an url with missing 'global' option - added display=2 to compare_yacy to remove the superfluous border	11 years ago
Michael Peter Christen	f1bfe64361	integrated startpage to compare_yacy	11 years ago
Michael Peter Christen	2f57327f20	added boolean load property to CacheResource_p servlet which causes that the servlet loads the page from the web.	11 years ago
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	11 years ago
Michael Peter Christen	5afa6e3aee	Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB.	11 years ago
Michael Peter Christen	030d0776ff	Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250	11 years ago
Michael Peter Christen	4948c39e48	added concurrency for mass crawl check	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	16e3b357b3	replaced old tag cloud and adapted design a bit	11 years ago
Michael Peter Christen	dc38d35986	added matching in url field in Table_API_p search	11 years ago
Michael Peter Christen	691d7e70fa	added hint to development/commit rss feed	11 years ago
Michael Peter Christen	b81859c751	Show a RSS icon in the right top corner of search results. This replaces the 'API' icon which was the link for the opensearch result which is an extension of RSS. Since it is more appropriate to visualize a RSS link with an RSS icon, this API icon was changed here.	11 years ago
Michael Peter Christen	1a09771be8	fixed sitemap crawl start	11 years ago
orbiter	b743e6d79f	- prevent that crawl filter have empty (never-match) content - rewrite the description of the options "Restrict to start domain(s)" and "Restrict to sub-path(s)" to an explanation, that the restriction applies to all links in the link list of the option "From Link-List of URL" if this option is selected - allow "Restrict to sub-path(s)" if the "From Link-List of URL" is selected. This is supported in the crawl start.	11 years ago
orbiter	f597fdb602	make it easier to filter properties (case insensitive)	11 years ago
reger	f46c723398	allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking	11 years ago
reger	1adb4b8741	merge rc1/master	11 years ago
reger	37d24f3318	make use of declared static string ACTION_LOCATION	11 years ago
reger	eea504c117	update Info.plist small DefaultServlet refactoring	11 years ago
reger	a44eede8b8	merge rc1/master	11 years ago
reger	54a0272338	searchpage javascript (latestinfo) causes reset of search statistic after moving to next page - disabled call via setTimeout in yacysearch.html	11 years ago
Michael Peter Christen	91fa99e9bb	added new icon/image for latest commit	11 years ago
Michael Peter Christen	9fac9249bc	- replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible.	11 years ago
Michael Peter Christen	0f6db6ad5b	Merge remote-tracking branch 'jensbees/crawlexpert-post'	11 years ago
Jens Bertram	3252c1ec39	Merge upstream/master into crawlexpert-post	11 years ago
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	11 years ago
bhoerdzn	a3824dfbaa	check URL on initial load, if set	11 years ago
bhoerdzn	52f49d475b	add a hidden field for "crawlingstart" since jQuery omits the submit button value	11 years ago
bhoerdzn	b0c0ec2dec	link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"	11 years ago
bhoerdzn	d64d45361c	use integer types for boolean values	11 years ago
bhoerdzn	eda123d6fd	remove debugging code intercepting post requests	11 years ago
bhoerdzn	5057f27bbd	fix typo in parsing "cachePolicy" parameter	11 years ago
bhoerdzn	98f5c9018d	Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.	11 years ago
bhoerdzn	a6a62986d4	correct state handling for country code restriction	11 years ago
bhoerdzn	4066b85155	correctly set initial state for load filters	11 years ago
bhoerdzn	8c91c3e7cd	set form boolean values to 0 & 1 instead of false & true	11 years ago
bhoerdzn	c27fabc88e	fixed wrong parameter check	11 years ago
bhoerdzn	2214bf5396	Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.	11 years ago
reger	71d2655c02	downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modularize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets - adjust other test cases/classes	11 years ago
orbiter	705b3338ee	list more fields available for search and for ranking boosts	11 years ago
bhoerdzn	405878182f	Use list template for all other option lists. Fixed some template expressions.	11 years ago
bhoerdzn	8e74098cd4	Use list template for "reloadIfOlderNumber".	11 years ago
bhoerdzn	52bad7b908	Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields.	11 years ago
Michael Peter Christen	e56aa4fe93	fixed search navigation	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
bhoerdzn	45cf553bc3	try to guess default crawling mode, if none set	11 years ago
bhoerdzn	b4f0c822f2	assign strings before checking contents	11 years ago
bhoerdzn	499abe8f91	set default values for string parameters	11 years ago
bhoerdzn	42ea56eaad	made CrawlStartExpert_p aware of post variables; extended template where needed	11 years ago
reger	c7c706fd9f	merge with rc1/master	11 years ago
Michael Peter Christen	82bfd9e00a	- crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue.	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	11 years ago
reger	5c4ba9b5db	merge rc1 master	11 years ago
reger	70c51775ae	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
orbiter	d2effd21db	fix for npe during location search	11 years ago
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	11 years ago
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this caused that large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary that a large portion of the parser and link processing classes be adapted to carry a different type of link collection, which carries property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	13fc86c960	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	127adbf5cf	remove references to 10_http thread (legacy http server) and add needed get/set function to jetty http server wrapper	11 years ago
Michael Peter Christen	3e22d05290	added option for daterange properties in GSA interface to use a left- or right-open date range; i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition to daterange=2013-09-02..2013-09-09	11 years ago
reger	36b7159282	- remove double initialization of jetty - refactor some var assignments	11 years ago
reger	63ed04260a	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date from most CMS.	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	11 years ago
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	5d71a4c8bc	fix for dc:description field	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	11 years ago
Michael Peter Christen	6184fd9d9a	fix for solr/gsa result logging	11 years ago
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	11 years ago
orbiter	f106345eef	link strings should not be tokenized	11 years ago
orbiter	5b14bdfffd	npe fix	11 years ago
orbiter	1ca4b9612c	added special handling of the BinaryResponseWriter in the solr interface which makes it possible to use solrj with the javabin format which is much better (compressed, no xml overhead, java object streams) and faster. Furthermore, this enables the 'shards' option in the solr interface which connects one solr (YaCy) to another solr (YaCy) ad-hoc.	11 years ago
Michael Peter Christen	a88a62f7aa	added a feature to set a collection for a crawl result based on a regular expression on the url: the collection attribute for a crawl start may be now either a token or a list of tokens, separated by ',' where a token is either a string or a pair <string,pattern> where the string is separated from the pattern with a ':' and the string is assigned to the document as collection only if the pattern matches with the url.	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	11 years ago
Michael Peter Christen	e6b423c4d9	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	94bec24d14	add back menu to Surftips page (currently no menu is displayed)	11 years ago
Michael Peter Christen	1f299b0d42	removed link.gif as link button because this image is now shown automatically for external links	11 years ago
Michael Peter Christen	48ddd50a6c	html fix	11 years ago
reger	96ae332427	revert del _blank (last commit) in template	11 years ago
reger	43348a98a9	add some href target=_blank to ext. links with external icon	11 years ago
reger	82d81a57bd	info msg if no embedded Solr http://bugs.yacy.net/view.php?id=279	11 years ago
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	11 years ago
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	11 years ago
reger	070bf85b33	css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit `112836dcc9`)	11 years ago
sixcooler	8a96140f92	fix / workaround for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750 + Seed.hash should be final	11 years ago
Michael Peter Christen	2674d28ef4	protection against self-ping (may be caused by fraud attempts)	11 years ago
orbiter	f3d001c7ab	more space in the about section	11 years ago
Michael Peter Christen	e879b97b0a	added line to enhance debugging	11 years ago
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	11 years ago
Marc Nause	112836dcc9	Improved external links. *) image links will not be marked (if they have class "yacylogo" or "forceNoExternalIcon") *) external links in menu on left (and "fork me"-banner) will open in new tab/window now	11 years ago
Marc Nause	d64a094f0e	External links in HTML interface are marked as external with small icon. *) added new icon *) added CSS rules to mark all external links except search results (target="_self")	11 years ago
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago
sixcooler	7d53ac86a3	fix for Blacklist (-Administration)	11 years ago
orbiter	f425b2c61c	re-try to fetch url after a soft commit	11 years ago
orbiter	bf0ad04e1b	apply load limitation also to dht-in	11 years ago
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	11 years ago
Roland Haeder	e2ee412160	Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS' Conflicts: htroot/api/blacklists_p.java	11 years ago
Roland Haeder	ae19401af0	Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	59225487ea	Fix for blacklist export, also applied the filename filter here	11 years ago
Roland Haeder	952fc0e7bd	Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block	11 years ago
Roland Haeder	060fec1577	Reuse Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	29049c71f5	Possible fix for ticket http://bugs.yacy.net/view.php?id=270 , the filter for only including *.black must be applied	11 years ago
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	11 years ago
orbiter	9c681cc00d	added segment sizes, postprocessing status and cpu load to crawler monitor	11 years ago
orbiter	86b514cf46	added load info to status_p.xml	11 years ago
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	11 years ago
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	11 years ago
orbiter	232100301c	removed double-occurring value assignments	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Felix Ableitner	376f9cd9d0	Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure	11 years ago
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	11 years ago
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	11 years ago
orbiter	aba7cc5de7	added cpu load information to status page	11 years ago
Roland Haeder	59b4fdd5ad	Merge remote-tracking branch 'upstream/master'	12 years ago
orbiter	5493389576	stealth mode shall only be available for authorized users, because unauthorized users can otherwise be monitored by authorized users	12 years ago
Roland Haeder	ebbb3bc5c1	Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	12 years ago
orbiter	575f913154	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	c4efb612e2	added list of crawls to status_p.xml	12 years ago
Lotus	bb6caa346c	Do not allow automatic update in case YaCy is installed to the Program Files folder on Windows. There are no permissions to write that folder and update would fail.	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Felix Ableitner	a020697d64	Fixed problems with blacklist entry insertion.	12 years ago
sixcooler	bff8c753c6	re-insert this file - was deleted by mistake + correct another case-typo	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	c79f687110	enhanced the network scanner: find more hosts automatically by removal of common subdomains before application of protocol-specific prefix	12 years ago
orbiter	b4677d1cad	fix for bug #252 the naming of the servlet was wrong, the bug may not be present on systems where upper/lowercase matching is lazy (windows)	12 years ago
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
orbiter	f8c28efd66	fix for rssTerminal coloring	12 years ago
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	12 years ago
Michael Peter Christen	3054a6d4b9	added a patch from Sebastian M.B., submitted by email for coloring of rss terminal	12 years ago
Michael Peter Christen	78af998f8f	Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
Felix Ableitner	fd90fcc4e0	Fixes #196 .	12 years ago
Michael Peter Christen	f1c5338210	preparation for greedy crawl profiles and refactoring	12 years ago
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	12 years ago
Michael Peter Christen	203921006a	redesign of citation index storage	12 years ago
Michael Peter Christen	e92b9275ce	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	56cdcfa2fa	fixed greedy learning mode - global is not a search attribute in searchitems	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	0c5bed7e2c	added configuration option for greedy learning function to ConfigPortal servlet	12 years ago
sixcooler	5d1f619f07	possible helpful closing of solr-requests	12 years ago
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
sixcooler	e5abccdfe4	added optimize-option	12 years ago
Michael Peter Christen	8ea6ddf636	removed attributes from ConfigPortal.html which are redundant to ConfigSearchPage_p.html	12 years ago
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	12 years ago
Michael Peter Christen	23fb458963	- fix to gsa searchresult answer in case that no query part is given - fix to gsa default number of results (is 'num')	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	54024958ac	added url_file_name_s in query for live-search of urls	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
Michael Peter Christen	f542cf7d9c	fix for daterange: the to-date is inclusive	12 years ago
Michael Peter Christen	c36720d45f	added daterange option to gsa api	12 years ago
Michael Peter Christen	4e3007f4a0	typo	12 years ago
Michael Peter Christen	2cb6b6bc21	added target="_blank" to shutdown links	12 years ago
orbiter	c8e94ad7c7	fix for citation search in case that the citation is very fresh	12 years ago
orbiter	57dcf68665	added a feed-back message inside the shutdown page	12 years ago
Michael Peter Christen	0600d510e1	show the citation report also in ViewFile	12 years ago
Michael Peter Christen	1a92b61d69	fixed usage of ViewFile which needs a commit before showing latest crawl result pages.	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	2fd7bbb450	reduced load on solr; no seed update in Status and no exists-check in HTTPLoader in case of redirects, that can be done using the htcache.	12 years ago
Michael Peter Christen	7ee71c2354	changed administration page headline to 'admnistration'	12 years ago
Michael Peter Christen	efd973d29d	changed p2p/stealth mode text and links a bit	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanism which will cause that a 'fresh' yacy will load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get a personalized search index faster.	12 years ago
Michael Peter Christen	a5e328d7c5	new icons	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
Michael Peter Christen	5132bf719c	added new buttons to search result page in p2p mode which show the switch between p2p search and the 'stealth mode' which is simply a non-p2p search within the p2p network. The functionality was there all the time, but the switch to this was not very visible.	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (except one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
orbiter	200769d0c6	show the cache link in search results only if there is actually a cache entry stored in HTCACHE	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, also a backlink-structure can be discovered and written to the index as well. The host browser has been extended to show such backlinks to each presented link. The host browser therefore can now show information on where a document is linked. The new citation reference is computed as likelihood for a random click path with recursive usage of previously computed likelihood. This process is repeated until the likelihood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
Michael Peter Christen	fdcd4e6a6f	fixes to index deletion: quoting of host name (a '-' may be part of the url) and disabling the engage button when changing the url field at 'Delete by URL matching'	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
orbiter	5c7ddc67fe	in GSA api enable usage of solr fq-attribute together with GSA site-attribute	12 years ago
Michael Peter Christen	eb9d0ba5b1	ranking and boost function update, small bugfixes, better default search field for solr	12 years ago
Michael Peter Christen	5f92c68f1f	removed block rank ranking and all YBR files in /ranking	12 years ago
Michael Peter Christen	164603b946	cleanup	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	709e9b8ce7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	9e07447d47	added new link for SMW	12 years ago
Michael Peter Christen	3c04dd11de	removed dead link	12 years ago
Michael Peter Christen	281959a2d7	added option to re-boot the embedded solr during run-time. Added also API recording for this method so it can be repeated automatically. The index dump generation is now also available for API recording. Added some synchronization in backend which was necessary for this.	12 years ago
Michael Peter Christen	80a7989e8c	fixed ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.util.List; in robots.txt servlet	12 years ago
orbiter	da621e827e	prevent NPE in case RWI is disabled	12 years ago
Michael Peter Christen	7300d81f40	include API Table deletion requests to the API recorder	12 years ago
Michael Peter Christen	d2ade87b49	fixed missing thisaddress in yacysearch.html which caused that the opensearch link was not working	12 years ago
Michael Peter Christen	179d032181	added a (badly formatted) delete button for process scheduler entries	12 years ago
reger	c03f75ebc3	fix DHT url receive see http://bugs.yacy.net/view.php?id=242	12 years ago
Marc Nause	8fb1b1e290	*) simplified banner creation code	12 years ago
Marc Nause	cd0b5f31b4	*) updated links to description of regex	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
Michael Peter Christen	f93501e6e0	nice crawl name if crawl is started with file:// (was: null)	12 years ago
Michael Peter Christen	b4f0cac102	added the reindexing job servlet to the submenu structure	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performance when testing the existence of many urls. The effect should cause much less IO during index transmission, both on sender and receiver side.	12 years ago
Michael Peter Christen	c91c67c3cd	reject bad solr requests	12 years ago
Michael Peter Christen	44e363f37f	refactoring of WorkflowProcessor, added process counter, update of process counter if a blocking thread dies. Added also a new column in PerformanceConcurrency_p servlet to show the actual number of concurrent processes.	12 years ago
reger	79401cb938	added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html); this allows removing obsolete fields from the index (according to current schema config) by selecting all documents containing disabled fields.	12 years ago
Michael Peter Christen	b24d1d18e4	removed synchronization and concurrency in Fulltext class, concurrent deletions are now handled in ConcurrentUpdateSolrConnector	12 years ago
Michael Peter Christen	f965d04496	added new peer icons for Mentor peers and Mentee peers (not used yet)	12 years ago
Michael Peter Christen	b9b446bca6	- added ssl configuration sign (a lock) to network statistic/table - fixed a bug in bitfield	12 years ago
Michael Peter Christen	7095446ad3	added checkbox (near port) to switch on ssl support (https access) to the admin interface.	12 years ago
Michael Peter Christen	e6c8b545c2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	4baa0d4a97	Added a default keystore for ssl encryption of the YaCy web interface. This will enable https-access to YaCy, but this feature is disabled by default using the new server.https=false attribute. This has two purposes: - make it easier for everyone to use https (just set server.https=true) - provide the basis for secure yacy-to-yacy communication in the future	12 years ago
Michael Peter Christen	038f956821	fix for sitemap detection: the sitemap url was not visible if it appeared after the declaration of robots allow/deny for the crawler because the sitemap parser terminated after the allow/deny rules had been found. Now the parser reads the robots.txt until the end to discover also sitemap rules at the end of the file.	12 years ago
Michael Peter Christen	e26bdd4a52	fixes to deletion methods (removed unnecessary concurrency and added removal of crawl queue entries)	12 years ago
Michael Peter Christen	f7f3e28c5e	prevent that the size of the index is computed too many times. Because the index size is now provided by solr, and the only way to do that is a match for [* TO *], a size computation is quite complex and time-consuming. Therefore this patch prevents that the method is called at all and if necessary puts a DOS-preventing barrier in front of it.	12 years ago
Michael Peter Christen	cca19d94d4	re-declared some fields to be of type string rather than text which makes them more efficient and less large	12 years ago
Michael Peter Christen	ed1d5bace6	draw the names of other peers which receive/send dht into the network graphic	12 years ago
Michael Peter Christen	b528448332	enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page.	12 years ago
Michael Peter Christen	f1bb54943e	typo	12 years ago
Michael Peter Christen	d7fd346917	- added regular-expression based deletions - on-demand collection-list generation for collection-based deletions instead of a default collection-list presentation (this makes calling the interface much faster since the computation of collections lists for large indexes may take some seconds)	12 years ago
Michael Peter Christen	3841854c97	abstraction of catchall term	12 years ago
sixcooler	e145afb8d6	fix for PerformanceMemory showing UNRESOLVED_PATTERN by removing solr-cache-stuff, which is not available anymore	12 years ago
Michael Peter Christen	1b102d98d8	- added index deletion to index administration submenu - added index deletion processes to the process scheduler/recorder	12 years ago
Michael Peter Christen	0e2ee00fea	added an index deletion servlet and some style changes for the 'dangerous' engage-button	12 years ago
Michael Peter Christen	e4f7e5bcfe	fixed bad css change	12 years ago
Michael Peter Christen	3502b4c697	refactoring (renaming) of yacy-solr api	12 years ago
Michael Peter Christen	3a0fcfbeda	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	25499eead5	- added a new field for the regular expression in crawl start - added the field in crawl profile - adapted logging and error management - adapted duplicate document detection - added a new rule to the indexing process to reject non-matching content - full redesign of the expert crawl start servlet. The new filter field can now be seen in /CrawlStartExpert_p.html at Section "Document Filter", subsection item "Filter on Content of Document"	12 years ago
reger	0a9b0992f3	RankingSolr_p: include warning if boost field not in local index	12 years ago
orbiter	e1bfe9d07a	- reduction of the concurrently running processes to make YaCy more adjusted to smaller and 1-core devices. - the workflow processor now starts no process at all. these are started as soon as parser/condenser/indexing queues are filled. - better abstraction	12 years ago
Michael Peter Christen	c091000165	added collection attribute also to the rss feed reader	12 years ago
orbiter	f7571386a3	added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time).	12 years ago
orbiter	3e79bd4b1f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d571e739b6	increased row limitation for authorized users from 10000 to 100000000 in solr interface	12 years ago
Michael Peter Christen	a1fffe8e86	fixed default ranking values	12 years ago
Michael Peter Christen	1d30082446	added hindi translation configuration	12 years ago
Michael Peter Christen	97775fbebc	fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/	12 years ago
Michael Peter Christen	298bf2deb5	fix to ranking configuration servlet	12 years ago
Michael Peter Christen	2db058b551	added in RankingSolr_p.html a select box to switch between different ranking situations. By default, four situations can be configured.	12 years ago
Michael Peter Christen	6fbca35215	fixed api table navigation	12 years ago

4724 Commits (d1091e79f83591502fdc08444aca84b733300a71)