yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	9fac9249bc	- replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible.	11 years ago
Michael Peter Christen	0f6db6ad5b	Merge remote-tracking branch 'jensbees/crawlexpert-post'	11 years ago
Jens Bertram	3252c1ec39	Merge upstream/master into crawlexpert-post	11 years ago
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	11 years ago
bhoerdzn	a3824dfbaa	check URL on initial load, if set	11 years ago
bhoerdzn	52f49d475b	add a hidden field for "crawlingstart" since jQuery omits the submit button value	11 years ago
bhoerdzn	b0c0ec2dec	link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"	11 years ago
bhoerdzn	d64d45361c	use integer types for boolean values	11 years ago
bhoerdzn	eda123d6fd	remove debugging code intercepting post requests	11 years ago
bhoerdzn	5057f27bbd	fix typo in parsing "cachePolicy" parameter	11 years ago
bhoerdzn	98f5c9018d	Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.	11 years ago
bhoerdzn	a6a62986d4	correct state handling for country code restriction	11 years ago
bhoerdzn	4066b85155	correctly set initial state for load filters	11 years ago
bhoerdzn	8c91c3e7cd	set form boolean values to 0 & 1 instead of false & true	11 years ago
bhoerdzn	c27fabc88e	fixed wrong parameter check	11 years ago
bhoerdzn	2214bf5396	Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.	11 years ago
reger	71d2655c02	downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modularize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets - adjust other test cases/classes	11 years ago
orbiter	705b3338ee	list more fields available for search and for ranking boosts	11 years ago
bhoerdzn	405878182f	Use list template for all other option lists. Fixed some template expressions.	11 years ago
bhoerdzn	8e74098cd4	Use list template for "reloadIfOlderNumber".	11 years ago
bhoerdzn	52bad7b908	Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields.	11 years ago
Michael Peter Christen	e56aa4fe93	fixed search navigation	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
bhoerdzn	45cf553bc3	try to guess default crawling mode, if none set	11 years ago
bhoerdzn	b4f0c822f2	assign strings before checking contents	11 years ago
bhoerdzn	499abe8f91	set default values for string parameters	11 years ago
bhoerdzn	42ea56eaad	made CrawlStartExpert_p aware of post variables; extended template where needed	11 years ago
reger	c7c706fd9f	merge with rc1/master	11 years ago
Michael Peter Christen	82bfd9e00a	- crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue.	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	11 years ago
reger	5c4ba9b5db	merge rc1 master	11 years ago
reger	70c51775ae	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
orbiter	d2effd21db	fix for npe during location search	11 years ago
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	11 years ago
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - as a result, large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page, not just the unique links! This made it necessary to adapt a large portion of the parser and link processing classes to carry a different type of link collection, one which carries the property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, have been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	13fc86c960	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	127adbf5cf	remove references to 10_http thread (legacy http server) and add needed get/set function to jetty http server wrapper	11 years ago
Michael Peter Christen	3e22d05290	added option for daterange properties in GSA interface to use a left- or right-open date range; i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition to daterange=2013-09-02..2013-09-09	11 years ago
reger	36b7159282	- remove double initialization of jetty - refactor some var assignments	11 years ago
reger	63ed04260a	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date from most CMS.	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	11 years ago
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	5d71a4c8bc	fix for dc:description field	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results; unfortunately the index schema has changed and p2p image search will not be much better until many people update)	11 years ago
Michael Peter Christen	6184fd9d9a	fix for solr/gsa result logging	11 years ago
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	11 years ago
orbiter	f106345eef	link strings should not be tokenized	11 years ago
orbiter	5b14bdfffd	npe fix	11 years ago
orbiter	1ca4b9612c	added special handling of the BinaryResponseWriter in the solr interface which makes it possible to use solrj with the javabin format which is much better (compressed, no xml overhead, java object streams) and faster. Furthermore, this enables the 'shards' option in the solr interface which connects one solr (YaCy) to another solr (YaCy) ad-hoc.	11 years ago
Michael Peter Christen	a88a62f7aa	added a feature to set a collection for a crawl result based on a regular expression on the url: the collection attribute for a crawl start may now be either a token or a list of tokens, separated by ',', where a token is either a string or a pair <string,pattern>; in a pair, the string is separated from the pattern by a ':' and the string is assigned to the document as collection only if the pattern matches the url.	11 years ago
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	11 years ago
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	11 years ago
Michael Peter Christen	e6b423c4d9	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	94bec24d14	add back menu to Surftips page (currently no menu is displayed)	11 years ago
Michael Peter Christen	1f299b0d42	removed link.gif as link button because this image is now shown automatically for external links	11 years ago
Michael Peter Christen	48ddd50a6c	html fix	11 years ago
reger	96ae332427	revert del _blank (last commit) in template	11 years ago
reger	43348a98a9	add some href target=_blank to ext. links with external icon	11 years ago
reger	82d81a57bd	info msg if no embedded Solr http://bugs.yacy.net/view.php?id=279	11 years ago
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	11 years ago
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	11 years ago
reger	070bf85b33	css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit `112836dcc9`)	11 years ago
sixcooler	8a96140f92	fix / workaround for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750 + Seed.hash should be final	11 years ago
Michael Peter Christen	2674d28ef4	protection against self-ping (may be caused by fraud attempts)	11 years ago
orbiter	f3d001c7ab	more space in the about section	11 years ago
Michael Peter Christen	e879b97b0a	added line to enhance debugging	11 years ago
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	11 years ago
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	11 years ago
Marc Nause	112836dcc9	Improved external links. ) image links will not be marked (if they have class "yacylogo" or "forceNoExternalIcon") ) external links in menu on left (and "fork me"-banner) will open in new tab/window now	11 years ago
Marc Nause	d64a094f0e	External links in HTML interface are marked as external with small icon. ) added new icon ) added CSS rules to mark all external links except search results (target="_self")	11 years ago
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago
sixcooler	7d53ac86a3	fix for Blacklist (-Administration)	11 years ago
orbiter	f425b2c61c	re-try to fetch url after a soft commit	11 years ago
orbiter	bf0ad04e1b	apply load limitation also to dht-in	11 years ago
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	11 years ago
Roland Haeder	e2ee412160	Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS' Conflicts: htroot/api/blacklists_p.java	11 years ago
Roland Haeder	ae19401af0	Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	59225487ea	Fix for blacklist export, also applied the filename filter here	11 years ago
Roland Haeder	952fc0e7bd	Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block	11 years ago
Roland Haeder	060fec1577	Reuse Blacklist.BLACKLIST_FILENAME_FILTER	11 years ago
Roland Haeder	29049c71f5	Possible fix for ticket http://bugs.yacy.net/view.php?id=270 , the filter for only including *.black must be applied	11 years ago
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	11 years ago
orbiter	9c681cc00d	added segment sizes, postprocessing status and cpu load to crawler monitor	11 years ago
orbiter	86b514cf46	added load info to status_p.xml	11 years ago
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	11 years ago
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	11 years ago
orbiter	232100301c	removed double-occurring value assignments	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Felix Ableitner	376f9cd9d0	Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure	11 years ago
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	11 years ago
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	11 years ago
orbiter	aba7cc5de7	added cpu load information to status page	11 years ago
Roland Haeder	59b4fdd5ad	Merge remote-tracking branch 'upstream/master'	12 years ago
orbiter	5493389576	stealth mode shall only be available for authorized users, because unauthorized users can otherwise be monitored by authorized users	12 years ago
Roland Haeder	ebbb3bc5c1	Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	12 years ago
orbiter	575f913154	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	c4efb612e2	added list of crawls to status_p.xml	12 years ago
Lotus	bb6caa346c	Do not allow automatic update in case YaCy is installed to the Program Files folder on Windows. There are no permissions to write that folder and update would fail.	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Felix Ableitner	a020697d64	Fixed problems with blacklist entry insertion.	12 years ago
sixcooler	bff8c753c6	re-insert this file - was deleted by mistake + correct another case-typo	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	c79f687110	enhanced the network scanner: find more hosts automatically by removal of common subdomains before application of protocol-specific prefix	12 years ago
orbiter	b4677d1cad	fix for bug #252 the naming of the servlet was wrong, the bug may not be present on systems where upper/lowercase matching is lazy (windows)	12 years ago
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
orbiter	f8c28efd66	fix for rssTerminal coloring	12 years ago
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	12 years ago
Michael Peter Christen	3054a6d4b9	added a patch from Sebastian M.B., submitted by email for coloring of rss terminal	12 years ago
Michael Peter Christen	78af998f8f	Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'	12 years ago
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	12 years ago
Felix Ableitner	fd90fcc4e0	Fixes #196 .	12 years ago
Michael Peter Christen	f1c5338210	preparation for greedy crawl profiles and refactoring	12 years ago
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	12 years ago
Michael Peter Christen	203921006a	redesign of citation index storage	12 years ago
Michael Peter Christen	e92b9275ce	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	56cdcfa2fa	fixed greedy learning mode - global is not a search attribute in searchitems	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	0c5bed7e2c	added configuration option for greedy learning function to ConfigPortal servlet	12 years ago
sixcooler	5d1f619f07	possible helpful closing of solr-requests	12 years ago
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
sixcooler	e5abccdfe4	added optimize-option	12 years ago
Michael Peter Christen	8ea6ddf636	removed attributes from ConfigPortal.html which are redundant to ConfigSearchPage_p.html	12 years ago
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	12 years ago
Michael Peter Christen	23fb458963	- fix to gsa searchresult answer in case that no query part is given - fix to gsa default number of results (is 'num')	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	54024958ac	added url_file_name_s in query for live-search of urls	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
Michael Peter Christen	f542cf7d9c	fix for daterange: the to-date is inclusive	12 years ago
Michael Peter Christen	c36720d45f	added daterange option to gsa api	12 years ago
Michael Peter Christen	4e3007f4a0	typo	12 years ago
Michael Peter Christen	2cb6b6bc21	added target="_blank" to shutdown links	12 years ago
orbiter	c8e94ad7c7	fix for citation search in case that the citation is very fresh	12 years ago
orbiter	57dcf68665	added a feed-back message inside the shutdown page	12 years ago
Michael Peter Christen	0600d510e1	show the citation report also in ViewFile	12 years ago
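
Note on commit a88a62f7aa: the collection attribute syntax described in that message can be illustrated with a small sketch. This is not YaCy's implementation (YaCy is written in Java); the function names `parse_collection_attribute` and `collections_for_url` are hypothetical, and two points are assumptions the commit message does not state: a pattern must match the whole url, and the first ':' in a token separates the string from the pattern. A pattern containing ',' would be split by this sketch.

```python
import re
from typing import List, Optional, Tuple


def parse_collection_attribute(attr: str) -> List[Tuple[str, Optional[str]]]:
    """Split a collection attribute into (name, pattern) pairs.

    A bare token yields (name, None); a 'name:pattern' token yields the
    name together with its regular expression source.
    """
    tokens = []
    for raw in attr.split(","):
        raw = raw.strip()
        if not raw:
            continue
        # Assumption: the first ':' separates the name from the pattern.
        name, sep, pattern = raw.partition(":")
        tokens.append((name.strip(), pattern if sep else None))
    return tokens


def collections_for_url(attr: str, url: str) -> List[str]:
    """Return the collection names that apply to the given url."""
    names = []
    for name, pattern in parse_collection_attribute(attr):
        # Assumption: the pattern has to match the complete url.
        if pattern is None or re.fullmatch(pattern, url):
            names.append(name)
    return names


if __name__ == "__main__":
    attr = r"news,docs:.*\.pdf"
    print(collections_for_url(attr, "http://example.org/a.pdf"))
    print(collections_for_url(attr, "http://example.org/a.html"))
```

With the attribute `news,docs:.*\.pdf`, a plain token such as `news` is always assigned, while `docs` is assigned only to urls matching the pattern.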


4598 Commits (cabe0943cd813066a1df92f7d1581724b117a357)
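
Note on commits 3e22d05290 and f542cf7d9c: the daterange forms named there (`2013-09-02..2013-09-09`, `..2013-09-09`, `2013-09-02..`) can be illustrated with a short sketch. This is not the GSA servlet code; `parse_daterange` and `in_daterange` are hypothetical names. The inclusive to-date follows commit f542cf7d9c; treating the from-date as inclusive too is an assumption.

```python
from datetime import date
from typing import Optional, Tuple


def parse_daterange(value: str) -> Tuple[Optional[date], Optional[date]]:
    """Parse 'FROM..TO', '..TO' or 'FROM..' into (start, end).

    A missing bound is returned as None, meaning the range is open
    on that side.
    """
    start_text, sep, end_text = value.partition("..")
    if not sep:
        raise ValueError("daterange must contain '..': %r" % value)
    start = date.fromisoformat(start_text) if start_text else None
    end = date.fromisoformat(end_text) if end_text else None
    return start, end


def in_daterange(day: date, value: str) -> bool:
    """Check whether a day falls inside the range; both bounds inclusive."""
    start, end = parse_daterange(value)
    return (start is None or start <= day) and (end is None or day <= end)


if __name__ == "__main__":
    print(parse_daterange("2013-09-02..2013-09-09"))
    print(in_daterange(date(2013, 9, 9), "2013-09-02..2013-09-09"))
    print(in_daterange(date(2013, 9, 1), "2013-09-02.."))
```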