yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	a2b66fe2eb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	d8e79731df	fixed wrong used memory display	11 years ago
orbiter	da5d4128bf	prevent npe	11 years ago
Michael Benz	072d4aa0c0	Updated German translation and Blacklist_p.html	11 years ago
orbiter	f6e441dd77	refactoring	11 years ago
orbiter	c3f6c06f2c	removed host increment on stored documents from crawler (that was wrong)	11 years ago
Michael Peter Christen	a86c2fe77d	fixed usage of media flag when started by automated process	11 years ago
Michael Benz	f11314aae7	Improved German de.lng translation and fixed adresses -> addresses in \htroot\CrawlStartScanner_p.html	11 years ago
Michael Peter Christen	f0eec6d0f3	Merge branch 'master' of git://gitorious.org/~copro/yacy/copros-rc1	11 years ago
Michael Benz	6278af4993	Edit German de locale and improved translation	11 years ago
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	11 years ago
reger	a02e33dcb6	add edit-link to PK field of table admin	11 years ago
Michael Peter Christen	9eb668e951	enhanced the resource observer. The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space assigned to YaCy is defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files if autodelete is activated. Autodelete is now ON BY DEFAULT if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	11 years ago
Michael Peter Christen	cb2c25d930	if the crawler is running and the search user is the peer admin, we expect that the user wants to check recently crawled documents. To ensure that recent crawl results are inside the search results, we do a soft commit here.	11 years ago
Michael Peter Christen	bf97e38b83	removed clearURLIndex, which is a stub remaining from the old metadata database and not needed any more	11 years ago
Michael Peter Christen	bc28247089	Added methods in resource observer to calculate the available and the occupied disc space. These values are also shown on the status page. The disc space calculation shall be used for a disk-limitation of the search index.	11 years ago
reger	365f77ea8c	make internal page links relative to ease any future development for context-aware servlets; note also http://bugs.yacy.net/view.php?id=106	11 years ago
Michael Peter Christen	d9858e1b8a	removed warnings and superfluous logging	11 years ago
Michael Peter Christen	7e71dcc417	removed interaction fragments	11 years ago
Michael Peter Christen	94245ce0a8	fixed "Size in KBytes" calculation in PerformanceQueues_p.html, see http://bugs.yacy.net/view.php?id=362	11 years ago
Michael Peter Christen	726e8c3ad5	removed unused classes and servlets	11 years ago
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	11 years ago
Michael Peter Christen	0e6729f9bc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	9228214f9b	enrichment of PerformanceMemory display of SolrInfoMBean table	11 years ago
Michael Peter Christen	e8bdf16ea7	added statistic information for solr resources in PerformanceMemory	11 years ago
reger	1a2b298a65	fix: select all checkbox Tables_p (needs form name attribute)	11 years ago
Michael Peter Christen	931541d198	re-inserted default value re-set button to performance queues and patched missing values for recent new queues	11 years ago
reger	bd1685c94a	fix not needed getFileExtension().toLower (double) add missing .getFileExtension	11 years ago
orbiter	a11f072504	enhanced didyoumean	11 years ago
Michael Peter Christen	bc395c7439	reduced color depth of star icons (for smaller file sizes)	11 years ago
Michael Peter Christen	9e0e39a9a4	small change to start/stop/pause icon style	11 years ago
orbiter	22e3524797	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c40ba51ca6	added new suggest method which replaces more-than-one suggestions: instead of computing suggest permutations of the given words, the completion of a phrase using the given words is searched in the fulltext index.	11 years ago
reger	ad4b213145	remove unused static var from HTTPDProxyHandler	11 years ago
reger	6c6056836d	fix vocabulary navigator checkbox selection (from last commit)	11 years ago
reger	cb71413d19	fix page nav to keep modifier (was new issue)	11 years ago
orbiter	ba5ab11cc4	less logging	11 years ago
Michael Peter Christen	322854a5f8	fix auth for forced ping	11 years ago
Michael Peter Christen	fbf4f77d80	fixed missing corona in network picture	11 years ago
Michael Peter Christen	d2b8f2b477	enhancements for staticIP and ipv6 handling	11 years ago
reger	91d79c1ac4	disable wrong forward to https on port change	11 years ago
reger	193b8235c2	remove double jquery-1.3.1.js and adjust header links to jquery-1.3.2	11 years ago
reger	f307d65dcf	prepare for a language navigator. Works fine to restrict language for local solrSearches. More work needs to be done to make rwi/remote searches respect the modifier.language restriction.	11 years ago
orbiter	768b1306b8	Added a write-enabled checkbox for remote solr servers. It is now possible to assign every peer other YaCy peers as remote solr server which are only used for read operations during search. This also affects crawling: it will exclude urls from crawls which exist on remote solr/remote YaCy peers.	11 years ago
orbiter	f7d6dd136f	changed solr paths according to new default paths	11 years ago
Michael Peter Christen	8b14e92ba4	added button in host browser to re-load 404/failed documents	11 years ago
reger	f47067b0ce	fix search navigator not showing activated nav introduced with `97e84439fb`	11 years ago
reger	9a96a7d73f	put list quick navigator buttons below on BlackList_p editor, replacing the dropdown -> go navigation	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi	11 years ago
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	11 years ago
Michael Peter Christen	c0da966dfa	enhanced crawler speed	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since the specific heuristics Twitter & Blekko are no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
orbiter	fd4abc0565	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	d5b8e473c8	added load limit for DHT transfer: RWI acceptance only if local load is not too high	11 years ago
reger	41c126978b	fix bug: Crawl Start (Expert) crawls "?-URLs" even if told not to do so http://bugs.yacy.net/view.php?id=329	11 years ago
Michael Peter Christen	a9ed28c0b5	no commit if no action is requested	11 years ago
reger	0c754dd794	implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted in near clear text (B64enc). This has some implications, as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method>) - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash, the hash-prefix "MD5:" used in Jetty is used for new pwd-hashes ( "MD5:hash" )	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queries using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	11 years ago
reger	c656e67c97	fix: display proper error msg on admin user change	11 years ago
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make maintenance of index backups easier. It will also provide a tool to migrate from a freeworld index to a webportal index.	11 years ago
reger	30d925a96e	reimplemented server access restriction via Jetty IPAccessHandler to allow only configured IPs to access. Handler is only loaded if a restriction is configured. Since IPAccessHandler (Jetty 8) does not support IPv6, the system property java.net.preferIPv4Stack=true is set. Testing showed system.setProperty seems to be sensitive to point of calling (earliest possible time seems to be best = early in yacy.main). Moved the "isrunning..." just open browser check also to the new routine to preread the yacy.config only once.	11 years ago
orbiter	3cb6c7861f	fixed shutdown authentication problem	11 years ago
Michael Peter Christen	7005ecdabd	cleanup	11 years ago
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	11 years ago
Michael Peter Christen	9bd71fdbb4	made the access tracker class static because it shall be used by the jetty auth module	11 years ago
Michael Peter Christen	7d6fc79eb8	refactoring (usage of constant names for attributes of authentication check)	11 years ago
reger	cabe0943cd	fix opensearch resultcount in yacysearch.rss see merge request https://gitorious.org/yacy/rc1/merge_requests/24 use result count in searchtrailer.xml which is on p2p search more accurate (timing)	11 years ago
reger	eaf596a257	adding proxy status to (private) status box (show also transparent and url proxy status) show search result via url proxy only if status=on	11 years ago
reger	e3d8459906	extend ssl enabled msg on status page - post the portnr	11 years ago
reger	58ecf5e4dd	add to blacklist button in CrawlResults http://bugs.yacy.net/view.php?id=220 introduced Blacklist.add with sourcefile only parameter	11 years ago
reger	17b454f957	fix external link (open in new tab)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago
orbiter	2861183359	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	4035e20f0b	unescaping the path	11 years ago
orbiter	7e21d1ff70	"inaccessible" better describes the state of a server which cannot be reached (while 30c3: too many users)	11 years ago
reger	7f9b9315fe	Merge origin/master	11 years ago
reger	8eaabb9600	remove dependency from old serverCore.java - remaining getPortNr not needed (as current release allows only to set plain integer as port, see ConfigBasic)	11 years ago
orbiter	2018e55f8b	switched back on index deletion (was accidentally off because new jetty framework never delivers null to post arguments .. there may be more of that kind of problems)	11 years ago
orbiter	d4942ad5e0	startRecord fix; this is not according to SRU definition because this states that the first record has number 0; but +1 is not consistent with other places where the number is used.	11 years ago
reger	3d913558ab	display configured adminUserName in ConfigAccounts_p - fix read default username in loginservice	11 years ago
reger	fbdd89e198	Merge origin/master	11 years ago
reger	65a2f3d5e7	tweak Jetty credentials to work with YaCy UserDB - user entry in UserDB with admin right can login to access protected pages - dto. admin user, chosen username is stored in conf (adminAccountUserName=)	11 years ago
Michael Peter Christen	ee17bd0b69	added option to attach remote solr servers in read-only mode	11 years ago
Michael Peter Christen	25f9c35033	add patch which shall prevent that naive search mistakes like usage of regular expressions cause no results. Usage of '*' followed by a dot or any expression will now cause that this expression is used as a filetype search.	11 years ago
reger	e05320b776	upd: to open more external links in new browser-tab	11 years ago
reger	cbb5dc01e4	remove obsolete htroot/solr htroot/gsa YaCy-servlets - now handled by standard servlets	11 years ago
reger	71cac1a278	added SSL/HTTPS connector to support SSL/https connection on port 8443 !!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used - added ping test for above to migration - as of now port for https is hardcoded to default 8443 - if not urgently required I'd leave it this way (it's standard) to use different ports for http and https - post https port on ConfigBasic.html (if active)	11 years ago
reger	f681ce15ae	remove obsolete HTTPServer input field	11 years ago
Michael Peter Christen	20b48f894f	refactoring: moving all servlets to the same package (the solr servlet is currently actually a filter which should be changed somehow)	11 years ago
Michael Peter Christen	84167adb49	removed unused anomichttpd code after migration to jetty	11 years ago
Michael Peter Christen	b461a27abb	fixed the SolrServlet	11 years ago
Michael Peter Christen	7603e879dc	Merge branch 'master' into HEAD Conflicts: .classpath source/net/yacy/cora/federate/solr/SolrServlet.java	11 years ago
Michael Peter Christen	25250405f1	solr servlet preparation for join with jetty branch	11 years ago
reger	c84c313fe1	Merge origin/master into jetty	11 years ago
Michael Peter Christen	74466d731a	use pre-compiled patterns in ymark	11 years ago
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	11 years ago
Michael Peter Christen	67e7dc0cc6	added more properties to seedlist servlet	11 years ago
Michael Peter Christen	79771c60c0	IPv6 fixes	11 years ago
reger	92d9c56f9f	Merge origin/master into jetty	11 years ago
Michael Peter Christen	da380343c2	perform greedy learning heuristic only if load < 1.0	11 years ago
Michael Peter Christen	81926c055d	fixed bug with image search in yacyinteractive	11 years ago
Michael Peter Christen	edda0699e4	changed default timeout for port scanner	11 years ago
Michael Peter Christen	f1b5db2c45	- performance graph does not show peer ping in memory monitor any more - after a forced GC, the PerformanceMemory view switches to automatic update by default	11 years ago
Michael Peter Christen	0db8e34625	enhanced webgraph processing	11 years ago
Michael Peter Christen	9d8b32c63a	fixed a division by zero	11 years ago
Michael Peter Christen	957f6297fb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	effea4bca0	Merge origin/master into jetty Conflicts: source/net/yacy/cora/federate/solr/SolrServlet.java	11 years ago
reger	b49e90d2e9	remove reference to solrServlet from YaCy servlet select - reference is not used - solrServlet is used in Jetty branch and adjustments there conflict with unused solrServlet here.	11 years ago
Michael Peter Christen	38e1e3a707	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
sixcooler	2c2ebb0d92	tried some hardening in order not letting any Solr-Searchers open	11 years ago
Michael Peter Christen	cca79d12ef	setting of some default values to make a client development start easy using the description at http://www.yacy-websuche.de/wiki/index.php/Dev:APIhello	11 years ago
Michael Peter Christen	3d4b5e66ce	disallow remote robots to crawl the HostBrowser servlet	11 years ago
Michael Peter Christen	234ca720f5	only admins should be able to force a commit	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	11 years ago
orbiter	61409788eb	less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form.	11 years ago
reger	5c4a3d1c01	Merge origin/master into jetty	11 years ago
Michael Peter Christen	caa20d63d9	fixed seedlist (hash was missing)	11 years ago
Michael Peter Christen	ccf2f4e43b	refactoring of seed attributes (introduced more constants)	11 years ago
Michael Peter Christen	c927b428d3	fixed json	11 years ago
Michael Peter Christen	64048ff217	fix for XSS	11 years ago
orbiter	b7f1e5af51	added new servlet which generates the same file as the principal peers upload to a bootstrap position you can call it either with http://localhost:8090/yacy/seedlist.html or to generate json (or jsonp) with http://localhost:8090/yacy/seedlist.json http://localhost:8090/yacy/seedlist.json?callback=seedlist	11 years ago
orbiter	3e552550d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c2d720cdaf	purge a lucene cache - possible memory leak fix	11 years ago
reger	f111f30ace	Merge origin/master into jetty	11 years ago
Michael Peter Christen	f4172cbb3d	fix for another XSS bug	11 years ago
orbiter	ff86cb683f	fixed some XSS bugs reported by Marius from http://ctf365.com/	11 years ago
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	11 years ago
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	11 years ago
Michael Peter Christen	9d5895f643	enhanced and fixed postprocessing	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	11 years ago
Michael Peter Christen	69b8d61c47	fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.)	11 years ago
orbiter	4234b0ed6c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	74c86a72a0	better default value for crawler user agent	11 years ago
reger	1437c45383	merge rc1/master	11 years ago
Michael Peter Christen	87a956e881	calculating and showing the number of files and the average size of a file in the HTCACHE in ConfigHTCache_p.html	11 years ago
Michael Peter Christen	acc1f8a749	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	81bb50118e	found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes it completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes.	11 years ago
sixcooler	987f410011	URL-export:add query and fix for cast-class-exception	11 years ago
Michael Peter Christen	ffe8276063	replaced referrer link masking to 'pure' links to the referring page (that was more useful during testing)	11 years ago
reger	b38de92a16	Merge origin/master into jetty	11 years ago
Michael Peter Christen	434e13b46d	in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)	11 years ago
orbiter	1ac504ae51	use html encoding for urls in metadata	11 years ago
reger	f017066197	Merge origin/master into jetty	11 years ago
Michael Peter Christen	25951cee14	- fixed opensearchdescription, this delivered an url with missing 'global' option - added display=2 to compare_yacy to remove the superfluous border	11 years ago
Michael Peter Christen	f1bfe64361	integrated startpage to compare_yacy	11 years ago
Michael Peter Christen	2f57327f20	added boolean load property to CacheResource_p servlet which causes that the servlet loads the page from the web.	11 years ago
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	11 years ago
Michael Peter Christen	5afa6e3aee	Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB.	11 years ago
Michael Peter Christen	030d0776ff	Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250	11 years ago
Michael Peter Christen	4948c39e48	added concurrency for mass crawl check	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	16e3b357b3	replaced old tag cloud and adapted design a bit	11 years ago
Michael Peter Christen	dc38d35986	added matching in url field in Table_API_p search	11 years ago
Michael Peter Christen	691d7e70fa	added hint to development/commit rss feed	11 years ago
Michael Peter Christen	b81859c751	Show a RSS icon in the right top corner of search results. This replaces the 'API' icon which was the link for the opensearch result which is an extension of RSS. Since it is more appropriate to visualize a RSS link with an RSS icon, this API icon was changed here.	11 years ago
Michael Peter Christen	1a09771be8	fixed sitemap crawl start	11 years ago
orbiter	b743e6d79f	- prevent that crawl filter have empty (never-match) content - rewrite the description of the options "Restrict to start domain(s)" and "Restrict to sub-path(s)" to an explanation, that the restriction applies to all links in the link list of the option "From Link-List of URL" if this option is selected - allow "Restrict to sub-path(s)" if the "From Link-List of URL" is selected. This is supported in the crawl start.	11 years ago
orbiter	f597fdb602	make it easier to filter properties (case insensitive)	11 years ago
reger	f46c723398	allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking	11 years ago
reger	1adb4b8741	merge rc1/master	11 years ago
reger	37d24f3318	make use of declared static string ACTION_LOCATION	11 years ago
reger	eea504c117	update Info.plist small DefaultServlet refactoring	11 years ago
reger	a44eede8b8	merge rc1/master	11 years ago
reger	54a0272338	searchpage javascript (latestinfo) causes reset of search statistic after moving to next page - disabled call via setTimeout in yacysearch.html	11 years ago
Michael Peter Christen	91fa99e9bb	added new icon/image for latest commit	11 years ago
Michael Peter Christen	9fac9249bc	- replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible.	11 years ago
Michael Peter Christen	0f6db6ad5b	Merge remote-tracking branch 'jensbees/crawlexpert-post'	11 years ago
Jens Bertram	3252c1ec39	Merge upstream/master into crawlexpert-post	11 years ago
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	11 years ago
bhoerdzn	a3824dfbaa	check URL on initial load, if set	11 years ago
bhoerdzn	52f49d475b	add a hidden field for "crawlingstart" since jQuery omits the submit button value	11 years ago
bhoerdzn	b0c0ec2dec	link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"	11 years ago
bhoerdzn	d64d45361c	use integer types for boolean values	11 years ago
bhoerdzn	eda123d6fd	remove debugging code intercepting post requests	11 years ago
bhoerdzn	5057f27bbd	fix typo in parsing "cachePolicy" parameter	11 years ago
bhoerdzn	98f5c9018d	Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.	11 years ago
bhoerdzn	a6a62986d4	correct state handling for country code restriction	11 years ago
bhoerdzn	4066b85155	correctly set initial state for load filters	11 years ago
bhoerdzn	8c91c3e7cd	set form boolean values to 0 & 1 instead of false & true	11 years ago
bhoerdzn	c27fabc88e	fixed wrong parameter check	11 years ago
bhoerdzn	2214bf5396	Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.	11 years ago
reger	71d2655c02	downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modulize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets - adjust other test cases/classes	11 years ago
orbiter	705b3338ee	list more fields available for search and for ranking boosts	11 years ago
bhoerdzn	405878182f	Use list template for all other option lists. Fixed some template expressions.	11 years ago
bhoerdzn	8e74098cd4	Use list template for "reloadIfOlderNumber".	11 years ago
bhoerdzn	52bad7b908	Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields.	11 years ago
Michael Peter Christen	e56aa4fe93	fixed search navigation	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
bhoerdzn	45cf553bc3	try to guess default crawling mode, if none set	11 years ago
bhoerdzn	b4f0c822f2	assign strings before checking contents	11 years ago
bhoerdzn	499abe8f91	set default values for string parameters	11 years ago
bhoerdzn	42ea56eaad	made CrawlStartExpert_p aware of post variables; extended template where needed	11 years ago
reger	c7c706fd9f	merge with rc1/master	11 years ago
Michael Peter Christen	82bfd9e00a	- crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue.	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	11 years ago
reger	5c4ba9b5db	merge rc1 master	11 years ago
reger	70c51775ae	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
orbiter	d2effd21db	fix for npe during location search	11 years ago
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	11 years ago
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this caused that large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary that a large portion of the parser and link processing classes be adapted to carry a different type of link collection, which carries property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, have been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	13fc86c960	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	127adbf5cf	remove references to 10_http thread (legacy http server) and add needed get/set function to jetty http server wrapper	11 years ago
Michael Peter Christen	3e22d05290	added option for daterange properties in GSA interface to use a left- or right-open date range; i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition to daterange=2013-09-02..2013-09-09	11 years ago
reger	36b7159282	- remove double initialization of jetty - refactor some var assignments	11 years ago
reger	63ed04260a	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date from most CMS.	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	11 years ago
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	5d71a4c8bc	fix for dc:description field	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	11 years ago
Michael Peter Christen	6184fd9d9a	fix for solr/gsa result logging	11 years ago
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	11 years ago
orbiter	f106345eef	link strings should not be tokenized	11 years ago
orbiter	5b14bdfffd	npe fix	11 years ago
orbiter	1ca4b9612c	added special handling of the BinaryResponseWriter in the solr interface which makes it possible to use solrj with the javabin format which is much better (compressed, no xml overhead, java object streams) and faster. Furthermore, this enables the 'shards' option in the solr interface which connects one solr (YaCy) to another solr (YaCy) ad-hoc.	11 years ago
Michael Peter Christen	a88a62f7aa	added a feature to set a collection for a crawl result based on a regular expression on the url: the collection attribute for a crawl start may be now either a token or a list of tokens, separated by ',' where a token is either a string or a pair <string,pattern> where the string is separated to the pattern with a ':' and the string is assigned to the document as collection only if the pattern matches with the url.	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	11 years ago
Michael Peter Christen	e6b423c4d9	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	94bec24d14	add back menu to Surftips page (currently no menu is displayed)	11 years ago
Michael Peter Christen	1f299b0d42	removed link.gif as link button because this image is now shown automatically for external links	11 years ago
Michael Peter Christen	48ddd50a6c	html fix	11 years ago
reger	96ae332427	revert del _blank (last commit) in template	11 years ago
reger	43348a98a9	add some href target=_blank to ext. links with external icon	11 years ago
reger	82d81a57bd	info msg if no embedded Solr http://bugs.yacy.net/view.php?id=279	11 years ago
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	11 years ago
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	11 years ago
reger	070bf85b33	css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit `112836dcc9`)	11 years ago
sixcooler	8a96140f92	fix / workaround for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750 + Seed.hash should be final	11 years ago
Michael Peter Christen	2674d28ef4	protection against self-ping (may be caused by fraud attempts)	11 years ago
orbiter	f3d001c7ab	more space in the about section	11 years ago
Michael Peter Christen	e879b97b0a	added line to enhance debugging	11 years ago
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	11 years ago
Marc Nause	112836dcc9	Improved external links. ) image links will not be marked (if they have class "yacylogo" or "forceNoExternalIcon") ) external links in menu on left (and "fork me"-banner) will open in new tab/window now	11 years ago
Marc Nause	d64a094f0e	External links in HTML interface are marked as external with small icon. ) added new icon ) added CSS rules to mark all external links except search results (target="_self")	11 years ago
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago

4868 Commits (49886fab08bef6816b7a3e627fb567f77453731a)