File ex-ganeti-failure-scenarios.htm, 43.7 KB (added by b.candler, 5 years ago)
1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2<html xmlns="http://www.w3.org/1999/xhtml">
3<head>
4  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
5  <meta http-equiv="Content-Style-Type" content="text/css" />
6  <meta name="generator" content="pandoc" />
7  <title>Ganeti: failures and recovery scenarios</title>
8  <style type="text/css">code{white-space: pre;}</style>
9  <link href="data:text/css,%2F%2A%0A%20%20%20%20Buttondown%0A%20%20%20%20A%20Markdown%2FMultiMarkdown%2FPandoc%20HTML%20output%20CSS%20stylesheet%0A%20%20%20%20Author%3A%20Ryan%20Gray%0A%20%20%20%20Date%3A%2015%20Feb%202011%0A%20%20%20%20Revised%3A%2021%20Feb%202012%0A%20%20%20%0A%20%20%20%20General%20style%20is%20clean%2C%20with%20minimal%20re%2Ddefinition%20of%20the%20defaults%20or%20%0A%20%20%20%20overrides%20of%20user%20font%20settings%2E%20The%20body%20text%20and%20header%20styles%20are%20%0A%20%20%20%20left%20alone%20except%20title%2C%20author%20and%20date%20classes%20are%20centered%2E%20A%20Pandoc%20TOC%20%0A%20%20%20%20is%20not%20printed%2C%20URLs%20are%20printed%20after%20hyperlinks%20in%20parentheses%2E%20%0A%20%20%20%20Block%20quotes%20are%20italicized%2E%20Tables%20are%20lightly%20styled%20with%20lines%20above%20%0A%20%20%20%20and%20below%20the%20table%20and%20below%20the%20header%20with%20a%20boldface%20header%2E%20Code%20%0A%20%20%20%20blocks%20are%20line%20wrapped%2E%20%0A%20%0A%20%20%20%20All%20elements%20that%20Pandoc%20and%20MultiMarkdown%20use%20should%20be%20listed%20here%2C%20even%20%0A%20%20%20%20if%20the%20style%20is%20empty%20so%20you%20can%20easily%20add%20styling%20to%20anything%2E%0A%20%20%20%20%0A%20%20%20%20There%20are%20some%20elements%20in%20here%20for%20HTML5%20output%20of%20Pandoc%2C%20but%20I%20have%20not%20%0A%20%20%20%20gotten%20around%20to%20testing%20that%20yet%2E%0A%2A%2F%0A%20%0A%2F%2A%20NOTES%3A%0A%20%0A%20%20%20%20Stuff%20tried%20and%20failed%3A%0A%20%20%20%20%0A%20%20%20%20It%20seems%20that%20specifying%20font%2Dfamily%3Aserif%20in%20Safari%20will%20always%20use%20%0A%20%20%20%20Times%20New%20Roman%20rather%20than%20the%20user%27s%20preferences%20setting%2E%0A%20%20%20%20%0A%20%20%20%20Making%20the%20font%20size%20different%20or%20a%20fixed%20value%20for%20print%20in%20case%20the%20screen%20%0A%20%20%20%20font%20size%20is%20making%20the%20print%20font%20too%20big%3A%20Making%20font%2Dsize%20different%20for%20%0A%20%20%20%
20print%20than%20for%20screen%20causes%20horizontal%20lines%20to%20disappear%20in%20math%20when%20using%20%0A%20%20%20%20MathJax%20under%20Safari%2E%0A%2A%2F%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Front%20Matter%20%2D%2D%2D%2D%20%2A%2F%0A%20%0A%2F%2A%20Pandoc%20header%20DIV%2E%20Contains%20%2Etitle%2C%20%2Eauthor%20and%20%2Edate%2E%20Comes%20before%20div%23TOC%2E%20%0A%20%20%20Only%20appears%20if%20one%20of%20those%20three%20are%20in%20the%20document%2E%0A%2A%2F%0A%20%0Adiv%23header%2C%20header%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Put%20border%20on%20bottom%2E%20Separates%20it%20from%20TOC%20or%20body%20that%20comes%20after%20it%2E%20%2A%2F%0A%20%20%20%20border%2Dbottom%3A%201px%20solid%20%23aaa%3B%0A%20%20%20%20margin%2Dbottom%3A%200%2E5em%3B%0A%20%20%20%20%7D%0A%20%0A%2Etitle%20%2F%2A%20Pandoc%20title%20header%20%28h1%2Etitle%29%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20text%2Dalign%3A%20center%3B%0A%20%20%20%20%7D%0A%20%0A%2Eauthor%2C%20%2Edate%20%2F%2A%20Pandoc%20author%28s%29%20and%20date%20headers%20%28h2%2Eauthor%20and%20h3%2Edate%29%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20text%2Dalign%3A%20center%3B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20Pandoc%20table%20of%20contents%20DIV%20when%20using%20the%20%2D%2Dtoc%20option%2E%0A%20%20%20NOTE%3A%20this%20doesn%27t%20support%20Pandoc%27s%20%2D%2Did%2Dprefix%20option%20for%20%23TOC%20and%20%23header%2E%20%0A%20%20%20Probably%20would%20need%20to%20use%20div%5Bid%24%3D%27TOC%27%5D%20and%20div%5Bid%24%3D%27header%27%5D%20as%20selectors%2E%0A%2A%2F%0A%20%0Adiv%23TOC%2C%20nav%23TOC%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Put%20border%20on%20bottom%20to%20separate%20it%20from%20body%2E%20%2A%2F%0A%20%20%20%20border%2Dbottom%3A%201px%20solid%20%23aaa%3B%0A%20%20%20%20margin%2Dbottom%3A%200%2E5em%3B%0A%20%20%20%20%7D%0A%20%0A%40media%20print%0A%20%20%20%20%7B%0A%20%20%20%20div%23TOC%2C%20nav%23TOC%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20Don%27t%20display%20TOC%20in%20print%20%2A%2F%0A%20%20%20%20%20%20%20%20
display%3A%20none%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Headers%20and%20sections%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Ah1%2C%20h2%2C%20h3%2C%20h4%2C%20h5%2C%20h6%0A%7B%0A%20%20%20%20font%2Dfamily%3A%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20%22Liberation%20Sans%22%2C%20Calibri%2C%20Arial%2C%20sans%2Dserif%3B%20%2F%2A%20Sans%2Dserif%20headers%20%2A%2F%0A%20%0A%20%20%20%20%2F%2A%20font%2Dfamily%3A%20%22Liberation%20Serif%22%2C%20%22Georgia%22%2C%20%22Times%20New%20Roman%22%2C%20serif%3B%20%2F%2A%20Serif%20headers%20%2A%2F%0A%20%0A%20%20%20%20page%2Dbreak%2Dafter%3A%20avoid%3B%20%2F%2A%20Firefox%2C%20Chrome%2C%20and%20Safari%20do%20not%20support%20the%20property%20value%20%22avoid%22%20%2A%2F%0A%7D%0A%20%0A%2F%2A%20Pandoc%20with%20%2D%2Dsection%2Ddivs%20option%20%2A%2F%0A%20%0Adiv%20div%2C%20section%20section%20%2F%2A%20Nested%20sections%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20margin%2Dleft%3A%202em%3B%20%2F%2A%20This%20will%20increasingly%20indent%20nested%20header%20sections%20%2A%2F%0A%20%20%20%20%7D%0A%20%0Ap%20%7B%7D%0A%20%0Ablockquote%0A%20%20%20%20%7B%20%0A%20%20%20%20font%2Dstyle%3A%20italic%3B%0A%20%20%20%20%7D%0A%20%0Ali%20%2F%2A%20All%20list%20items%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Ali%20%3E%20p%20%2F%2A%20Loosely%20spaced%20list%20item%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20margin%2Dtop%3A%201em%3B%20%2F%2A%20IE%3A%20lack%20of%20space%20above%20a%20%3Cli%3E%20when%20the%20item%20is%20inside%20a%20%3Cp%3E%20%2A%2F%0A%20%20%20%20%7D%0A%20%0Aul%20%2F%2A%20Whole%20unordered%20list%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Aul%20li%20%2F%2A%20Unordered%20list%20item%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Aol%20%2F%2A%20Whole%20ordered%20list%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Aol%20li%20%2F%2A%20Ordered%20list%20item%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Ahr%20%7B%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Some%20span%20elements%20%2D%2D%2D%20%2A%2F%0A%2
0%0Asub%20%2F%2A%20Subscripts%2E%20Pandoc%3A%20H%7E2%7EO%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Asup%20%2F%2A%20Superscripts%2E%20Pandoc%3A%20The%202%5End%5E%20try%2E%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%20%20%20%0Aem%20%2F%2A%20Emphasis%2E%20Markdown%3A%20%2Aemphasis%2A%20or%20%5Femphasis%5F%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%20%20%20%0Aem%20%3E%20em%20%2F%2A%20Emphasis%20within%20emphasis%3A%20%2AThis%20is%20all%20%2Aemphasized%2A%20except%20that%2A%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20font%2Dstyle%3A%20normal%3B%0A%20%20%20%20%7D%0A%20%0Astrong%20%2F%2A%20Markdown%20%2A%2Astrong%2A%2A%20or%20%5F%5Fstrong%5F%5F%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Links%20%28anchors%29%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Aa%20%2F%2A%20All%20links%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Keep%20links%20clean%2E%20On%20screen%2C%20they%20are%20colored%3B%20in%20print%2C%20they%20do%20nothing%20anyway%2E%20%2A%2F%0A%20%20%20%20text%2Ddecoration%3A%20none%3B%0A%20%20%20%20%7D%0A%20%0A%40media%20screen%0A%20%20%20%20%7B%0A%20%20%20%20a%3Ahover%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20On%20hover%2C%20we%20indicate%20a%20bit%20more%20that%20it%20is%20a%20link%2E%20%2A%2F%0A%20%20%20%20%20%20%20%20text%2Ddecoration%3A%20underline%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%0A%40media%20print%0A%20%20%20%20%7B%0A%20%20%20%20a%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20In%20print%2C%20a%20colored%20link%20is%20useless%2C%20so%20un%2Dstyle%20it%2E%20%2A%2F%0A%20%20%20%20%20%20%20%20color%3A%20black%3B%0A%20%20%20%20%20%20%20%20background%3A%20transparent%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%0A%20%20%20%20a%5Bhref%5E%3D%22http%3A%2F%2F%22%5D%3Aafter%2C%20a%5Bhref%5E%3D%22https%3A%2F%2F%22%5D%3Aafter%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20However%2C%20links%20that%20go%20somewhere%20else%2C%20might%20be%20useful%20t
o%20the%20reader%2C%0A%20%20%20%20%20%20%20%20%20%20%20so%20for%20http%20and%20https%20links%2C%20print%20the%20URL%20after%20what%20was%20the%20link%20%0A%20%20%20%20%20%20%20%20%20%20%20text%20in%20parens%0A%20%20%20%20%20%20%20%20%2A%2F%0A%20%20%20%20%20%20%20%20content%3A%20%22%20%28%22%20attr%28href%29%20%22%29%20%22%3B%0A%20%20%20%20%20%20%20%20font%2Dsize%3A%2090%25%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Images%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Aimg%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Let%20it%20be%20inline%20left%2Fright%20where%20it%20wants%20to%20be%2C%20but%20verticality%20make%20%0A%20%20%20%20%20%20%20it%20in%20the%20middle%20to%20look%20nicer%2C%20but%20opinions%20differ%2C%20and%20if%20in%20a%20multi%2Dline%20%0A%20%20%20%20%20%20%20paragraph%2C%20it%20might%20not%20be%20so%20great%2E%20%0A%20%20%20%20%2A%2F%0A%20%20%20%20vertical%2Dalign%3A%20middle%3B%0A%20%20%20%20%7D%0A%20%0Adiv%2Efigure%20%2F%2A%20Pandoc%20figure%2Dstyle%20image%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Center%20the%20image%20and%20caption%20%2A%2F%0A%20%20%20%20margin%2Dleft%3A%20auto%3B%0A%20%20%20%20margin%2Dright%3A%20auto%3B%0A%20%20%20%20text%2Dalign%3A%20center%3B%0A%20%20%20%20font%2Dstyle%3A%20italic%3B%0A%20%20%20%20%7D%0A%20%0Ap%2Ecaption%20%2F%2A%20Pandoc%20figure%2Dstyle%20caption%20within%20div%2Efigure%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Inherits%20div%2Efigure%20props%20by%20default%20%2A%2F%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Code%20blocks%20and%20spans%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Apre%2C%20code%20%0A%20%20%20%20%7B%0A%20%20%20%20background%2Dcolor%3A%20%23fdf7ee%3B%0A%20%20%20%20%2F%2A%20BEGIN%20word%20wrap%20%2A%2F%0A%20%20%20%20%2F%2A%20Need%20all%20the%20following%20to%20word%20wrap%20instead%20of%20scroll%20box%20%2A%2F%0A%20%20%20%20%2F%2A%20This%20will%20override%20the%20overflow%3Aauto%20if%20present%20%2A%2F%0A%20%20%20%20white%2Dspace%3A%20pre%2Dwrap%3B%20%2F%2A%20css%2
D3%20%2A%2F%0A%20%20%20%20white%2Dspace%3A%20%2Dmoz%2Dpre%2Dwrap%20%21important%3B%20%2F%2A%20Mozilla%2C%20since%201999%20%2A%2F%0A%20%20%20%20white%2Dspace%3A%20%2Dpre%2Dwrap%3B%20%2F%2A%20Opera%204%2D6%20%2A%2F%0A%20%20%20%20white%2Dspace%3A%20%2Do%2Dpre%2Dwrap%3B%20%2F%2A%20Opera%207%20%2A%2F%0A%20%20%20%20word%2Dwrap%3A%20break%2Dword%3B%20%2F%2A%20Internet%20Explorer%205%2E5%2B%20%2A%2F%0A%20%20%20%20%2F%2A%20END%20word%20wrap%20%2A%2F%0A%20%20%20%20%7D%0A%20%0Apre%20%2F%2A%20Code%20blocks%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Distinguish%20pre%20blocks%20from%20other%20text%20by%20more%20than%20the%20font%20with%20a%20background%20tint%2E%20%2A%2F%0A%20%20%20%20padding%3A%200%2E5em%3B%20%2F%2A%20Since%20we%20have%20a%20background%20color%20%2A%2F%0A%20%20%20%20border%2Dradius%3A%205px%3B%20%2F%2A%20Softens%20it%20%2A%2F%0A%20%20%20%20%2F%2A%20Give%20it%20a%20some%20definition%20%2A%2F%0A%20%20%20%20border%3A%201px%20solid%20%23aaa%3B%0A%20%20%20%20%2F%2A%20Set%20it%20off%20left%20and%20right%2C%20seems%20to%20look%20a%20bit%20nicer%20when%20we%20have%20a%20background%20%2A%2F%0A%20%20%20%20margin%2Dleft%3A%20%200%2E5em%3B%0A%20%20%20%20margin%2Dright%3A%200%2E5em%3B%0A%20%20%20%20%7D%0A%20%0A%40media%20screen%0A%20%20%20%20%7B%0A%20%20%20%20pre%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20On%20screen%2C%20use%20an%20auto%20scroll%20box%20for%20long%20lines%2C%20unless%20word%2Dwrap%20is%20enabled%20%2A%2F%0A%20%20%20%20%20%20%20%20white%2Dspace%3A%20pre%3B%0A%20%20%20%20%20%20%20%20overflow%3A%20auto%3B%0A%20%20%20%20%20%20%20%20%2F%2A%20Dotted%20looks%20better%20on%20screen%20and%20solid%20seems%20to%20print%20better%2E%20%2A%2F%0A%20%20%20%20%20%20%20%20border%3A%201px%20dotted%20%23777%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%0Acode%20%2F%2A%20All%20inline%20code%20spans%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Ap%20%3E%20code%2C%20li%20%3E%20code%20%2F%2A%20Code%20spans%20in%20paragraphs%20and%20ti
ght%20lists%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Pad%20a%20little%20from%20adjacent%20text%20%2A%2F%0A%20%20%20%20padding%2Dleft%3A%20%202px%3B%0A%20%20%20%20padding%2Dright%3A%202px%3B%0A%20%20%20%20%7D%0A%20%20%20%20%0Ali%20%3E%20p%20code%20%2F%2A%20Code%20span%20in%20a%20loose%20list%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20We%20have%20room%20for%20some%20more%20background%20color%20above%20and%20below%20%2A%2F%0A%20%20%20%20padding%3A%202px%3B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Math%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Aspan%2Emath%20%2F%2A%20Pandoc%20inline%20math%20default%20and%20%2D%2Djsmath%20inline%20math%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%2F%2A%20Tried%20font%2Dstyle%3Aitalic%20here%2C%20and%20it%20messed%20up%20MathJax%20rendering%20in%20some%20browsers%2E%20Maybe%20don%27t%20mess%20with%20at%20all%2E%20%2A%2F%0A%20%20%20%20%7D%0A%20%20%20%20%0Adiv%2Emath%20%2F%2A%20Pandoc%20%2D%2Djsmath%20display%20math%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%20%20%20%0Aspan%2ELaTeX%20%2F%2A%20Pandoc%20%2D%2Dlatexmathml%20math%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%20%0A%20%0Aeq%20%2F%2A%20Pandoc%20%2D%2Dgladtex%20math%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%20%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Tables%20%2D%2D%2D%2D%20%2A%2F%0A%20%0A%2F%2A%20%20A%20clean%20textbook%2Dlike%20style%20with%20horizontal%20lines%20above%20and%20below%20and%20under%20%0A%20%20%20%20the%20header%2E%20Rows%20highlight%20on%20hover%20to%20help%20scanning%20the%20table%20on%20screen%2E%0A%2A%2F%0A%20%0Atable%0A%20%20%20%20%7B%0A%20%20%20%20border%2Dcollapse%3A%20collapse%3B%0A%20%20%20%20border%2Dspacing%3A%200%3B%20%2F%2A%20IE%206%20%2A%2F%0A%20%0A%20%20%20%20border%2Dbottom%3A%202pt%20solid%20%23000%3B%0A%20%20%20%20border%2Dtop%3A%202pt%20solid%20%23000%3B%20%2F%2A%20The%20caption%20on%20top%20will%20not%20have%20a%20bottom%2Dborder%20%2A%2F%0A%20%0A%20%20%20%20%2F%2A%20Center%20%2A%2F%0A%20%20%20%20margin%2Dleft%3A%20auto%3B%0A%20%20%2
0%20margin%2Dright%3A%20auto%3B%0A%20%20%20%20%7D%0A%20%20%20%20%0Athead%20%2F%2A%20Entire%20table%20header%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20border%2Dbottom%3A%201pt%20solid%20%23000%3B%0A%20%20%20%20background%2Dcolor%3A%20%23eee%3B%20%2F%2A%20Does%20this%20BG%20print%20well%3F%20%2A%2F%0A%20%20%20%20%7D%0A%20%0Atr%2Eheader%20%2F%2A%20Each%20header%20row%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%20%0A%20%0Atbody%20%2F%2A%20Entire%20table%20%20body%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20Table%20body%20rows%20%2A%2F%0A%20%0Atr%20%20%7B%0A%20%20%20%20%7D%0Atr%2Eodd%3Ahover%2C%20tr%2Eeven%3Ahover%20%2F%2A%20Use%20%2Eodd%20and%20%2Eeven%20classes%20to%20avoid%20styling%20rows%20in%20other%20tables%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20background%2Dcolor%3A%20%23eee%3B%0A%20%20%20%20%7D%0A%20%20%20%20%0A%2F%2A%20Odd%20and%20even%20rows%20%2A%2F%0Atr%2Eodd%20%7B%7D%0Atr%2Eeven%20%7B%7D%0A%20%0Atd%2C%20th%20%2F%2A%20Table%20cells%20and%20table%20header%20cells%20%2A%2F%0A%20%20%20%20%7B%20%0A%20%20%20%20vertical%2Dalign%3A%20top%3B%20%2F%2A%20Word%20%2A%2F%0A%20%20%20%20vertical%2Dalign%3A%20baseline%3B%20%2F%2A%20Others%20%2A%2F%0A%20%20%20%20padding%2Dleft%3A%20%20%200%2E5em%3B%0A%20%20%20%20padding%2Dright%3A%20%200%2E5em%3B%0A%20%20%20%20padding%2Dtop%3A%20%20%20%200%2E2em%3B%0A%20%20%20%20padding%2Dbottom%3A%200%2E2em%3B%0A%20%20%20%20%7D%0A%20%20%20%20%0A%2F%2A%20Removes%20padding%20on%20left%20and%20right%20of%20table%20for%20a%20tight%20look%2E%20Good%20if%20thead%20has%20no%20background%20color%2A%2F%0A%2F%2A%0Atr%20td%3Alast%2Dchild%2C%20tr%20th%3Alast%2Dchild%0A%20%20%20%20%7B%0A%20%20%20%20padding%2Dright%3A%200%3B%0A%20%20%20%20%7D%0Atr%20td%3Afirst%2Dchild%2C%20tr%20th%3Afirst%2Dchild%20%0A%20%20%20%20%7B%0A%20%20%20%20padding%2Dleft%3A%200%3B%0A%20%20%20%20%7D%0A%2A%2F%0A%20%0Ath%20%2F%2A%20Table%20header%20cells%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20font%2Dweight%3A%20bold%3B%20%0A%20%20%20%20%7D%0A%20%0Atfoot%20%2F%2
A%20Table%20footer%20%28what%20appears%20here%20if%20caption%20is%20on%20top%3F%29%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0Acaption%20%2F%2A%20This%20is%20for%20a%20table%20caption%20tag%2C%20not%20the%20p%2Ecaption%20Pandoc%20uses%20in%20a%20div%2Efigure%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20caption%2Dside%3A%20top%3B%0A%20%20%20%20border%3A%20none%3B%0A%20%20%20%20font%2Dsize%3A%200%2E9em%3B%0A%20%20%20%20font%2Dstyle%3A%20italic%3B%0A%20%20%20%20text%2Dalign%3A%20center%3B%0A%20%20%20%20margin%2Dbottom%3A%200%2E3em%3B%20%2F%2A%20Good%20for%20when%20on%20top%20%2A%2F%0A%20%20%20%20padding%2Dbottom%3A%200%2E2em%3B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20%2D%2D%2D%2D%20Definition%20lists%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Adl%20%2F%2A%20The%20whole%20list%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20border%2Dtop%3A%202pt%20solid%20black%3B%0A%20%20%20%20padding%2Dtop%3A%200%2E5em%3B%0A%20%20%20%20border%2Dbottom%3A%202pt%20solid%20black%3B%0A%20%20%20%20%7D%0A%20%0Adt%20%2F%2A%20Definition%20term%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20font%2Dweight%3A%20bold%3B%0A%20%20%20%20%7D%0A%20%0Add%2Bdt%20%2F%2A%202nd%20or%20greater%20term%20in%20the%20list%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20border%2Dtop%3A%201pt%20solid%20black%3B%0A%20%20%20%20padding%2Dtop%3A%200%2E5em%3B%0A%20%20%20%20%7D%0A%20%20%20%20%0Add%20%2F%2A%20A%20definition%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20margin%2Dbottom%3A%200%2E5em%3B%0A%20%20%20%20%7D%0A%20%0Add%2Bdd%20%2F%2A%202nd%20or%20greater%20definition%20of%20a%20term%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20border%2Dtop%3A%201px%20solid%20black%3B%20%2F%2A%20To%20separate%20multiple%20definitions%20%2A%2F%0A%20%20%20%20%7D%0A%20%20%20%20%0A%2F%2A%20%2D%2D%2D%2D%20Footnotes%20%2D%2D%2D%2D%20%2A%2F%0A%20%0Aa%2Efootnote%2C%20a%2EfootnoteRef%20%7B%20%2F%2A%20Pandoc%2C%20MultiMarkdown%20footnote%20links%20%2A%2F%0A%20%20%20%20font%2Dsize%3A%20small%3B%20%0A%20%20%20%20vertical%2Dalign%3A%20text%2Dtop%3B%0A%7D%0A%20%0Aa%5Bhref%5E%3D%22%2
3fnref%22%5D%2C%20a%2Ereversefootnote%20%2F%2A%20Pandoc%2C%20MultiMarkdown%2C%20%3F%3F%20footnote%20back%20links%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0A%40media%20print%0A%20%20%20%20%7B%0A%20%20%20%20a%5Bhref%5E%3D%22%23fnref%22%5D%2C%20a%2Ereversefootnote%20%2F%2A%20Pandoc%2C%20MultiMarkdown%20%2A%2F%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%2F%2A%20Don%27t%20display%20these%20at%20all%20in%20print%20since%20the%20arrow%20is%20only%20something%20to%20click%20on%20%2A%2F%0A%20%20%20%20%20%20%20%20display%3A%20none%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%20%20%0Adiv%2Efootnotes%20%2F%2A%20Pandoc%20footnotes%20div%20at%20end%20of%20the%20document%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%20%20%20%0Adiv%2Efootnotes%20li%5Bid%5E%3D%22fn%22%5D%20%2F%2A%20A%20footnote%20item%20within%20that%20div%20%2A%2F%0A%20%20%20%20%7B%0A%20%20%20%20%7D%0A%20%0A%2F%2A%20You%20can%20class%20stuff%20as%20%22noprint%22%20to%20not%20print%2E%20%0A%20%20%20Useful%20since%20you%20can%27t%20set%20this%20media%20conditional%20inside%20an%20HTML%20element%27s%20%0A%20%20%20style%20attribute%20%28I%20think%29%2C%20and%20you%20don%27t%20want%20to%20make%20another%20stylesheet%20that%20%0A%20%20%20imports%20this%20one%20and%20adds%20a%20class%20just%20to%20do%20this%2E%0A%2A%2F%0A%20%0A%40media%20print%0A%20%20%20%20%7B%0A%20%20%20%20%2Enoprint%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20display%3Anone%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A" rel="stylesheet" type="text/css" />
10</head>
11<body>
12<div id="header">
13<h1 class="title">Ganeti: failures and recovery scenarios</h1>
14</div>
15<div id="TOC">
16<ul>
17<li><a href="#initial-setup"><span class="toc-section-number">1</span> Initial setup</a></li>
18<li><a href="#scenario-planned-node-maintenance"><span class="toc-section-number">2</span> Scenario: Planned Node Maintenance</a><ul>
19<li><a href="#gnt-node-evacuate"><span class="toc-section-number">2.1</span> <code>gnt-node evacuate</code></a></li>
20</ul></li>
21<li><a href="#scenario-loss-of-a-slave-node"><span class="toc-section-number">3</span> Scenario: Loss of a Slave Node</a><ul>
22<li><a href="#loss-of-network-connectivity"><span class="toc-section-number">3.1</span> Loss of network connectivity</a><ul>
23<li><a href="#initial-state"><span class="toc-section-number">3.1.1</span> Initial state</a></li>
24<li><a href="#instance-recovery"><span class="toc-section-number">3.1.2</span> Instance recovery</a></li>
25<li><a href="#re-adding-the-failed-node"><span class="toc-section-number">3.1.3</span> Re-adding the failed node</a></li>
26</ul></li>
27<li><a href="#alternate-decisions"><span class="toc-section-number">3.2</span> Alternate decisions</a><ul>
28<li><a href="#completely-removing-hostb-from-the-cluster"><span class="toc-section-number">3.2.1</span> Completely removing hostB from the cluster</a></li>
29</ul></li>
30</ul></li>
31<li><a href="#scenario-planned-master-failover-node-maintenance"><span class="toc-section-number">4</span> Scenario: Planned master failover (node maintenance)</a></li>
32<li><a href="#scenario-loss-of-master-node"><span class="toc-section-number">5</span> Scenario: Loss of Master Node</a><ul>
33<li><a href="#promoting-slave"><span class="toc-section-number">5.1</span> Promoting slave</a></li>
34</ul></li>
35</ul>
36</div>
37<p>We are going to simulate a number of failure situations, and recover from them.</p>
38<p>Try to replicate the scenarios on your hosts.</p>
39<h1 id="initial-setup"><a href="#initial-setup"><span class="header-section-number">1</span> Initial setup</a></h1>
40<ul>
41<li>Cluster with 3 or more Nodes</li>
42<li>Master is up (hostA)</li>
43<li>Slaves are up (hostB, hostC, etc.)</li>
44<li>DRBD instance &quot;debianX&quot; is running on one of the Slaves (let's say hostB) and is replicated to either the Master (hostA) or another Slave (hostC or other)</li>
45</ul>
46<h1 id="scenario-planned-node-maintenance"><a href="#scenario-planned-node-maintenance"><span class="header-section-number">2</span> Scenario: Planned Node Maintenance</a></h1>
47<p>Let's imagine that we want to take down hostB for maintenance: more RAM, a disk replacement, etc.</p>
48<p>You probably have many instances running on your cluster by now.</p>
49<p>We can't simply do a migrate: it only switches primary and secondary around. This means that hostB would still be secondary for a number of instances.</p>
50<p>Another issue is that you may have <code>plain</code> instances. If you shut down hostB, those instances will be shut down as well!</p>
51<p>What do we do? Since we have a third node (hostC), we can use that to move/copy instances away from hostB:</p>
52<ul>
53<li><p>DRBD instances for which hostB is primary will need to migrate to their secondary</p></li>
54<li><p>Once this is done, and hostB is only secondary for the instances, we'll need to move the disks of all DRBD instances to another node. (example: if A is primary for vmX, we move secondary disks from B to C)</p></li>
55<li><p>Plain instances running on hostB will need to be moved to another node (A or C)</p></li>
56</ul>
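<p>The three cases above can be sketched as a small decision function. This is only an illustration of the reasoning, not something Ganeti provides: the <code>Instance</code> tuple, the <code>evacuation_step</code> function and the extra instance <code>debianZ</code> are hypothetical names, and nothing here talks to the cluster.</p>

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    """Hypothetical, simplified view of one instance."""
    name: str
    disk_template: str              # "drbd" or "plain"
    primary: str
    secondary: Optional[str] = None


def evacuation_step(inst, node):
    """Describe what is needed to get `inst` off `node` before maintenance."""
    if inst.disk_template == "plain":
        # Plain instances live on one node only, so they must be moved.
        if inst.primary == node:
            return "move to another node"
        return "nothing to do"
    if inst.primary == node:
        return "migrate to secondary, then relocate the secondary disks"
    if inst.secondary == node:
        return "relocate the secondary disks"
    return "nothing to do"


# Example inventory; debianZ is made up to cover the DRBD-primary case.
instances = [
    Instance("debianX", "drbd", "hostA", "hostB"),
    Instance("debianY", "plain", "hostB"),
    Instance("debianZ", "drbd", "hostB", "hostC"),
]
for inst in instances:
    print(inst.name, "-", evacuation_step(inst, "hostB"))
```

<p>Once every instance reports "nothing to do" for hostB, the node can be taken down safely.</p>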
57<p>Luckily, we have a couple of commands to help us do our work!</p>
58<h2 id="gnt-node-evacuate"><a href="#gnt-node-evacuate"><span class="header-section-number">2.1</span> <code>gnt-node evacuate</code></a></h2>
59<p>Read the man page for <code>gnt-node</code> and look for the section about the <code>evacuate</code> subcommand.</p>
60<p><code>gnt-node evacuate</code> will move DRBD instances away from a node. You will need to run this command for primary instances (<code>-p</code>) and for secondary instances (<code>-s</code>).</p>
61<ul>
62<li><p>We have debianX running as a DRBD instance on hostA (primary) and hostB (secondary).</p></li>
64<li><p>We have debianY running as a plain instance on hostB.</p></li>
65</ul>
66<p>What happens if we do:</p>
67<pre><code># gnt-node evacuate -p hostB
68
69Relocate instance(s) debianY from node(s) hostB?
70y/[n]/?:</code></pre>
71<p><code>gnt-node evacuate</code> has figured out that the <code>plain</code> debianY instance needs to be moved away. Answer <code>y</code>:</p>
72<pre><code>Fri Sep 19 14:29:45 2014  - INFO: Evacuating instances from node 'hostB': debianY
73Fri Sep 19 14:29:46 2014  - WARNING: Unable to evacuate instances debianY (Instances of type plain cannot be relocated)
74Failure: command execution error:
75Unable to evacuate instances debianY (Instances of type plain cannot be relocated)</code></pre>
76<p>Uh oh :(</p>
77<p>Ok, we will need to <code>move</code> the instance manually :(</p>
78<pre><code># gnt-instance move -n hostC debianY
79
80Instance debianY will be moved. This requires a shutdown of the instance.
81Continue?
82y/[n]/?: y
83Fri Sep 19 14:31:44 2014  - INFO: Shutting down instance debianY on source node hostB
84Fri Sep 19 14:32:01 2014 disk/0 sent 450M, 77.2 MiB/s, 21%, ETA 21s
85Fri Sep 19 14:32:37 2014 disk/0 finished receiving data
86Fri Sep 19 14:32:37 2014 disk/0 finished sending data
87Fri Sep 19 14:32:37 2014  - INFO: Removing the disks on the original node
88Fri Sep 19 14:32:38 2014  - INFO: Starting instance debianY on node hostC
89</code></pre>
90<ul>
91<li>Primary instances' disks will be moved to other nodes in the cluster.</li>
92</ul>
93<p>In our case, we want to evacuate all instances from hostB.</p>
94<p>Note: for the time being, one needs to explicitly tell the evacuate command to move away either primary (<code>-p</code>) or secondary (<code>-s</code>) instances - it won't work for both at the same time.</p>
95<p>Since hostB was primary for our debianX instance, we tell evacuate to only evacuate primary instances (for the time being):</p>
96<pre><code># gnt-node evacuate -p hostB
97
98Relocate instance(s) debianX from node(s) hostB?
99y/[n]/?:</code></pre>
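<p>Because primaries and secondaries have to be evacuated in separate runs, it can help to write both invocations down before starting. The sketch below only builds and prints the two command lines (primaries first, then secondaries, following the order described earlier); the function name <code>evacuate_commands</code> is hypothetical, and nothing is executed. Remember that <code>plain</code> instances are not handled by evacuate and still need <code>gnt-instance move</code>.</p>

```python
import shlex


def evacuate_commands(node):
    """Build both evacuate invocations for `node`: primaries, then secondaries."""
    return [
        ["gnt-node", "evacuate", "-p", node],
        ["gnt-node", "evacuate", "-s", node],
    ]


# Print the commands for review; run them by hand on the master node.
for argv in evacuate_commands("hostB"):
    print(shlex.join(argv))
```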
100<h1 id="scenario-loss-of-a-slave-node"><a href="#scenario-loss-of-a-slave-node"><span class="header-section-number">3</span> Scenario: Loss of a Slave Node</a></h1>
101<h2 id="loss-of-network-connectivity"><a href="#loss-of-network-connectivity"><span class="header-section-number">3.1</span> Loss of network connectivity</a></h2>
102<h3 id="initial-state"><a href="#initial-state"><span class="header-section-number">3.1.1</span> Initial state</a></h3>
103<ul>
104<li>Confirm that <code>debianX</code> (or whatever the DRBD VM you are using is called) is running on hostB:</li>
105</ul>
106<pre><code># gnt-instance list -o name,pnode,snodes,status</code></pre>
107<pre><code>Instance Primary_node         Secondary_Nodes      Status
108debianX  hostB.ws.nsrc.org    hostA.ws.nsrc.org    running</code></pre>
109<ul>
110<li>Shut down (halt) hostB (<em>make sure you run this on hostB, the primary node for this instance!</em>)</li>
111</ul>
112<pre><code># halt -p</code></pre>
113<ul>
114<li>The VM goes down as a result (confirm this using ping / console)</li>
115</ul>
116<pre><code># gnt-instance list -o name,pnode,snodes,status
117
118Instance Primary_node         Secondary_Nodes      Status
119debianX  hostB.ws.nsrc.org    hostA.ws.nsrc.org    ERROR_nodedown</code></pre>
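<p>If you want to spot affected instances without reading the table by eye, the listing can be parsed with a few lines of Python. This is a rough sketch under stated assumptions: the helper name <code>parse_instance_list</code> is hypothetical, the sample text is copied from the output above, and the parser assumes every column is filled in (an instance with an empty secondary column would be misaligned).</p>

```python
def parse_instance_list(text):
    """Parse whitespace-separated `gnt-instance list` output into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Use the lowercased header names as dictionary keys.
    header = [column.lower() for column in lines[0].split()]
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
Instance Primary_node         Secondary_Nodes      Status
debianX  hostB.ws.nsrc.org    hostA.ws.nsrc.org    ERROR_nodedown
"""

for row in parse_instance_list(sample):
    if row["status"] == "ERROR_nodedown":
        print(row["instance"], "is down; primary node is", row["primary_node"])
```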
120<ul>
121<li><p>Run <code>gnt-cluster verify</code> (will take a while), and look at the output.</p></li>
122<li><p>Run <code>gnt-node list</code>, and look at the output, too.</p></li>
123</ul>
124<p>As you will notice, things are quite slow. This is because Ganeti is trying to contact the <code>ganeti-noded</code> daemon on hostB, and it's timing out.</p>
125<p>If this were a production environment, we'd have to examine hostB, and determine whether hostB was likely to come back online soon. If not, say, because of some hardware failure, we would decide to take the node &quot;offline&quot;, so Ganeti would stop trying to talk to it.</p>
126<p>Let's start by marking hostB as offline:</p>
127<pre><code># gnt-node modify --offline=yes hostB.ws.nsrc.org
128
129Modified node hostB.ws.nsrc.org
130 - master_candidate -&gt; False
131 - offline -&gt; True</code></pre>
132<p>It will take a little while, but now most commands will run faster, as Ganeti stops trying to contact the offline node.</p>
133<p>Try running <code>gnt-instance list</code> and <code>gnt-node list</code> again.</p>
134<p>Also re-run <code>gnt-cluster verify</code>.</p>
135<h3 id="instance-recovery"><a href="#instance-recovery"><span class="header-section-number">3.1.2</span> Instance recovery</a></h3>
136<ul>
137<li>We cannot live-migrate the instance (hostB is down), so we need to <em>failover</em></li>
138</ul>
139<p>If you attempt to migrate, you will be told:</p>
140<pre><code># gnt-instance migrate debianX
141
142Failure: prerequisites not met for this operation:
143error type: wrong_state, error details:
144Can't migrate, please use failover: Node is marked offline</code></pre>
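<p>The rule behind this error can be summarised in code: migrate while the primary node is reachable, fail over once it has been marked offline. The sketch below is only an illustration; <code>relocation_command</code> is a hypothetical helper that returns the command line as a list and does not run anything.</p>

```python
def relocation_command(instance, primary_node, offline_nodes):
    """Pick `migrate` when the primary is reachable, `failover` when it is offline."""
    action = "failover" if primary_node in offline_nodes else "migrate"
    return ["gnt-instance", action, instance]


# hostB has been marked offline, so debianX has to be failed over.
offline = {"hostB.ws.nsrc.org"}
print(" ".join(relocation_command("debianX", "hostB.ws.nsrc.org", offline)))
```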
145<ul>
146<li>Attempt failover</li>
147</ul>
148<pre><code># gnt-instance failover debianX</code></pre>
149<p>Hopefully you will see messages ending with:</p>
150<pre><code>...
151Sat Jan 18 15:58:11 2014 * activating the instance's disks on target node hostA.ws.nsrc.org
152Sat Jan 18 15:58:11 2014  - WARNING: Could not prepare block device disk/0 on node hostB.ws.nsrc.org (is_primary=False, pass=1): Node is marked offline
153Sat Jan 18 15:58:11 2014 * starting the instance on the target node hostA.ws.nsrc.org</code></pre>
154<p>If so, skip to the section &quot;Confirm that the VM is now up on hostA&quot;</p>
155<p>If you see this message:</p>
156<pre><code>Sat Jan 18 20:57:55 2014 Failover instance debianX
157Sat Jan 18 20:57:55 2014 * checking disk consistency between source and target
158Failure: command execution error:
159Disk 0 is degraded on target node, aborting failover</code></pre>
160<p>... you will need to <em>force</em> the operation. This should normally not happen when the node is marked offline. However, if you do get the message:</p>
161<ul>
162<li>Read the man page for <code>gnt-instance</code> and find the section about <code>failover</code>:</li>
163</ul>
164<blockquote>
165<p>If you are trying to migrate instances off a dead node, this will fail. Use the --ignore-consistency option for this purpose. Note that this option can be dangerous as errors in shutting down the instance will be ignored, resulting in possibly having the instance running on two machines in parallel (on disconnected DRBD drives).</p>
166</blockquote>
167<ul>
168<li><p>This is why we shut down hostB rather than simply disconnecting it. You MUST verify that hostB really is down, and not merely disconnected from the management / replication network; otherwise you risk ending up with two running instances of the VM (if someone force-starts it), and you will then need to force a resolution.</p></li>
169<li><p>Re-run <code>gnt-instance failover</code> with the <code>--ignore-consistency</code> flag. We are in a situation that requires it (hostB is down).</p></li>
170</ul>
171<pre><code># gnt-instance failover --ignore-consistency debianX</code></pre>
172<p>There will be much more output this time. Pay particular attention to any warnings: these are normal since the hostB node is down, but we did mark it as offline.</p>
173<pre><code>Sat Jan 18 21:03:15 2014 Failover instance debianX
174Sat Jan 18 21:03:15 2014 * checking disk consistency between source and target
175
176[ ... messages ... ]
177
178Sat Jan 18 21:03:27 2014 * activating the instance's disks on target node hostA.ws.nsrc.org
179
180[ ... messages ... ]
181
182Sat Jan 18 21:03:33 2014 * starting the instance on the target node hostA.ws.nsrc.org</code></pre>
183<ul>
184<li>Confirm that the VM is now up on hostA:</li>
185</ul>
186<pre><code># gnt-instance list -o name,pnode,snodes,status
187
188Instance Primary_node         Secondary_Nodes      Status
189debianX  hostA.ws.nsrc.org    hostB.ws.nsrc.org    running</code></pre>
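<p>If you would rather check this from a script than by eye, the table above is easy to parse. The following is only a sketch: <code>primary_of</code> is a hypothetical helper name (not a Ganeti command), and it reads the sample output shown above instead of calling the live cluster.</p>

```shell
# Print the primary node of one instance, reading the table produced by
#   gnt-instance list -o name,pnode,snodes,status
# on standard input. "primary_of" is a hypothetical helper, not part of Ganeti.
primary_of() {
    awk -v inst="$1" '$1 == inst { print $2 }'
}

# Sample output copied from the listing above, used in place of the live command.
sample='Instance Primary_node         Secondary_Nodes      Status
debianX  hostA.ws.nsrc.org    hostB.ws.nsrc.org    running'

printf '%s\n' "$sample" | primary_of debianX
```

<p>On the master node you would pipe the real command into the helper instead of the sample text, for example <code>gnt-instance list -o name,pnode,snodes,status | primary_of debianX</code>.</p>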
190<h3 id="re-adding-the-failed-node"><a href="#re-adding-the-failed-node"><span class="header-section-number">3.1.3</span> Re-adding the failed node</a></h3>
191<p>Ok, let's say hostB has been fixed.</p>
192<ul>
193<li><p>Restart hostB. (Depending on the class setup, you may need to ask the instructor to do this for you).</p></li>
194<li><p>Make sure you can ping it and can log in to it</p></li>
195</ul>
196<p>We need to re-add it to the cluster. We do this using the <code>gnt-node add --readd</code> command on the cluster master node.</p>
197<p>From the <code>gnt-node</code> man page:</p>
198<blockquote>
199<p>In case you're readding a node after hardware failure, you can use the --readd parameter. In this case, you don't need to pass the secondary IP again, it will reused from the cluster. Also, the drained and offline flags of the node will be cleared before re-adding it.</p>
200</blockquote>
201<pre><code># gnt-node add --readd hostB.ws.nsrc.org
202
203[ ... question about SSH ...]
204
205Sat Jan 18 22:09:43 2014  - INFO: Readding a node, the offline/drained flags were reset
206Sat Jan 18 22:09:43 2014  - INFO: Node will be a master candidate</code></pre>
207<p>We're good! It could take a while to re-sync the DRBD data if a lot of disk activity (writing) has taken place on <code>debianX</code>, but this will happen in the background.</p>
208<p>Inspect the node list:</p>
209<pre><code># gnt-node list</code></pre>
210<p>Check the cluster configuration.</p>
211<pre><code># gnt-cluster verify</code></pre>
<p>The DRBD devices on hostB have probably not been activated yet. As a result you may see some errors about your instance's disk being degraded, similar to this:</p>
213<pre><code>Thu Sep 18 18:52:41 2014 * Verifying node status
214Thu Sep 18 18:52:41 2014   - ERROR: node hostB: drbd minor 0 of instance debianX is not active
215Thu Sep 18 18:52:41 2014 * Verifying instance status
216Thu Sep 18 18:52:41 2014   - ERROR: instance debianX: disk/0 on hostA is degraded
217Thu Sep 18 18:52:41 2014   - ERROR: instance debianX: couldn't retrieve status for disk/0 on hostB: Can't find device &lt;DRBD8(hosts=03add4b7-d6d9-40d0-bf6e-74d1683aad49/0-93eef5d9-6b33-4c</code></pre>
218<p>Don't panic! This is normal, as it's possible the disks haven't been re-synchronized yet.</p>
219<p>If so, you can use the command <code>gnt-cluster verify-disks</code> to fix this:</p>
220<pre><code># gnt-cluster verify-disks
221
222Submitted jobs 78
223Waiting for job 78 ...
224Activating disks for instance 'debianX'</code></pre>
225<p>Wait a few seconds, then run:</p>
226<pre><code># gnt-cluster verify</code></pre>
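<p>One rough way to judge whether the cluster is clean is to count the <code>ERROR</code> lines in the <code>gnt-cluster verify</code> output. This is a sketch under assumptions: <code>count_verify_errors</code> is a hypothetical helper name, and the sample text is taken from the error listing shown earlier rather than from a live run.</p>

```shell
# Count the "- ERROR:" lines in gnt-cluster verify output read from stdin.
# "count_verify_errors" is a hypothetical helper, not a Ganeti command.
count_verify_errors() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # "|| true" keeps the helper usable in scripts that stop on errors.
    grep -c -e '- ERROR:' || true
}

# Sample lines copied from the verify output shown earlier in this section.
sample='Thu Sep 18 18:52:41 2014 * Verifying node status
Thu Sep 18 18:52:41 2014   - ERROR: node hostB: drbd minor 0 of instance debianX is not active
Thu Sep 18 18:52:41 2014 * Verifying instance status
Thu Sep 18 18:52:41 2014   - ERROR: instance debianX: disk/0 on hostA is degraded'

printf '%s\n' "$sample" | count_verify_errors
```

<p>With the live cluster you would run <code>gnt-cluster verify | count_verify_errors</code> and repeat until the count reaches 0.</p>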
227<p>When all is OK, let's try and migrate debianX back to hostB:</p>
228<pre><code># gnt-instance migrate debianX</code></pre>
229<p>Test that the migration has worked.</p>
230<p>Note: if you are certain that the node <code>hostB</code> is healthy (let's say it was just a power failure, and no corruption has happened on its filesystem or disks), you could simply do the following (DON'T DO THIS NOW!):</p>
231<pre><code># gnt-node modify -O no hostB.ws.nsrc.org
232
233Sat Jan 18 22:08:45 2014  - INFO: Auto-promoting node to master candidate
234Sat Jan 18 22:08:45 2014  - WARNING: Transitioning node from offline to online state without using re-add. Please make sure the node is healthy!</code></pre>
235<p>But you would be warned about this.</p>
236<h2 id="alternate-decisions"><a href="#alternate-decisions"><span class="header-section-number">3.2</span> Alternate decisions</a></h2>
237<h3 id="completely-removing-hostb-from-the-cluster"><a href="#completely-removing-hostb-from-the-cluster"><span class="header-section-number">3.2.1</span> Completely removing hostB from the cluster</a></h3>
<p>Let's now imagine that the failure of hostB wasn't temporary: it cannot be fixed and won't be back online for a while (it needs to be completely replaced). We could decide to remove hostB from the cluster.</p>
239<p>To do this:</p>
240<ul>
241<li>If hostB has been restarted, let's shut it down (to simulate a failure)</li>
242</ul>
243<p>Note: RUN THIS ON hostB !!!</p>
244<pre><code># halt -p</code></pre>
245<ul>
246<li>On the master:</li>
247</ul>
248<p>Mark hostB as offline:</p>
249<pre><code># gnt-node modify --offline=yes hostB.ws.nsrc.org</code></pre>
<p>Run <code>gnt-cluster verify</code>, and look at the output.</p>
251<pre><code>Sat Jan 18 21:31:56 2014   - NOTICE: 1 offline node(s) found.</code></pre>
252<ul>
<li><p>We marked hostB as offline. Let's assume hostB will stay down for some time while it's being fixed.</p></li>
254<li><p>We decide to remove hostB from the cluster:</p></li>
255</ul>
256<pre><code># gnt-node remove hostB.ws.nsrc.org
257
258Failure: prerequisites not met for this operation:
259error type: wrong_input, error details:
260Instance debianX is still running on the node, please remove first</code></pre>
<p>Ok, we are not allowed to remove hostB, because Ganeti can see that we still have an instance (debianX) associated with hostB.</p>
262<p>This is different from simply marking the node offline, as it means we are permanently getting rid of hostB, and we need to take a decision about what to do for DRBD instances that were associated with hostB.</p>
263<!-- XXX: gnt-node failover goes here -->
264
265<pre><code># gnt-instance failover debianX
266
267Failover will happen to image debianX. This requires a shutdown of
268the instance. Continue?
269y/[n]/?: y
270Thu Sep 18 20:29:32 2014 Failover instance debianX
271Thu Sep 18 20:29:32 2014 * checking disk consistency between source and target
272Thu Sep 18 20:29:32 2014 Node hostB.ws.nsrc.org is offline, ignoring degraded disk 0 on target node hostA.ws.nsrc.org
273Thu Sep 18 20:29:32 2014 * shutting down instance on source node
274Thu Sep 18 20:29:32 2014  - WARNING: Could not shutdown instance debianX on node hostB.ws.nsrc.org, proceeding anyway; please make sure node hostB.ws.nsrc.org is down; error details: Node is marked offline
275Thu Sep 18 20:29:32 2014 * deactivating the instance's disks on source node
276Thu Sep 18 20:29:33 2014  - WARNING: Could not shutdown block device disk/0 on node hostB.ws.nsrc.org: Node is marked offline
277Thu Sep 18 20:29:33 2014 * activating the instance's disks on target node hostA.ws.nsrc.org
278Thu Sep 18 20:29:33 2014  - WARNING: Could not prepare block device disk/0 on node hostB.ws.nsrc.org (is_primary=False, pass=1): Node is marked offline
279Thu Sep 18 20:29:33 2014 * starting the instance on the target node hostA.ws.nsrc.org</code></pre>
280<p>Followed by:</p>
281<!-- XXX maybe we need to do a replace-disks instead -->
282
283<pre><code># gnt-node evacuate -s hostB
284
285Relocate instance(s) debianX from node(s) hostB?
286y/[n]/?: y
287Thu Sep 18 20:32:37 2014  - INFO: Evacuating instances from node 'hostB.ws.nsrc.org': debianX
288Thu Sep 18 20:32:37 2014  - INFO: Instances to be moved: debianX (to hostA.ws.nsrc.org, hostC.ws.nsrc.org)
289...
290Thu Sep 18 20:32:38 2014 STEP 3/6 Allocate new storage
291Thu Sep 18 20:32:38 2014  - INFO: Adding new local storage on hostC.ws.nsrc.org for disk/0
292...
293Thu Sep 18 20:32:41 2014 STEP 6/6 Sync devices
294Thu Sep 18 20:32:41 2014  - INFO: Waiting for instance debianX to sync disks
295Thu Sep 18 20:32:41 2014  - INFO: - device disk/0:  1.20% done, 1m 55s remaining (estimated)
296Thu Sep 18 20:33:41 2014  - INFO: Instance debianX's disks are in sync
297All instances evacuated successfully.</code></pre>
298<p>Ok, check out the instance list:</p>
299<pre><code># gnt-instance list -o name,pnode,snodes,status
300
301Instance  Primary_node      Secondary_Nodes Status
302debianX   hostA.ws.nsrc.org hostC.ws.nsrc.org  running</code></pre>
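<p>The same check can be scripted before retrying the removal. The sketch below lists every instance that still names a given node as primary or secondary; <code>instances_on_node</code> is a hypothetical helper name, and the match is a simple substring test, so a short name such as <code>hostB</code> would also match a node called <code>hostB2</code>.</p>

```shell
# List instances whose primary or secondary node column mentions a node.
# Input: output of "gnt-instance list -o name,pnode,snodes,status" on stdin.
# "instances_on_node" is a hypothetical helper, not part of Ganeti.
instances_on_node() {
    awk -v node="$1" '$1 != "Instance" {
        if (index($2, node) || index($3, node)) print $1
    }'
}

# Sample output copied from the listing above (after the evacuation).
after='Instance  Primary_node      Secondary_Nodes Status
debianX   hostA.ws.nsrc.org hostC.ws.nsrc.org  running'

# Prints nothing for this sample: no instance refers to hostB any more.
printf '%s\n' "$after" | instances_on_node hostB
```

<p>An empty result means that no instance in the listing depends on the node you asked about.</p>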
303<p>Perfect, hostB is not used by any instance. We can now re-attempt to remove node hostB from the cluster:</p>
304<pre><code># gnt-node remove hostB.ws.nsrc.org</code></pre>
<p>More WARNINGs! But did it work?</p>
306<pre><code># gnt-node list
307
308Node              DTotal DFree MTotal MNode MFree Pinst Sinst
309hostA.ws.nsrc.org  29.1G 12.6G   995M  145M  672M     2     0
310hostC.ws.nsrc.org  29.0G 12.7G   995M  137M  680M     0     1</code></pre>
311<p>Yes, hostB is gone.</p>
312<p>Note: Ganeti will modify <code>/etc/hosts</code> on your remaining nodes, and remove the line for hostB!</p>
313<p>We can restart our debianX instance, by the way! (This may have already happened if you called <code>gnt-instance failover</code>)</p>
314<pre><code># gnt-instance start debianX</code></pre>
315<p>Test that it comes up normally.</p>
316<h1 id="scenario-planned-master-failover-node-maintenance"><a href="#scenario-planned-master-failover-node-maintenance"><span class="header-section-number">4</span> Scenario: Planned master failover (node maintenance)</a></h1>
<p>Let's imagine that we need to temporarily service the cluster master (in this case, hostA). It's rather easy. Decide first which of the other nodes will become master.</p>
318<p>Read about <code>master-failover</code>: <code>man gnt-cluster</code>, find the MASTER-FAILOVER section.</p>
319<p>Then, ON THE NODE YOU PICKED, run this command:</p>
320<pre><code># gnt-cluster master-failover</code></pre>
321<p>If everything goes well, after 5-10 seconds, the node you ran this command on is now the new master.</p>
322<p>Test this! For example, if hostB is your new master, run these commands on it:</p>
323<p>Verify that the cluster IP is now on this host:</p>
324<pre><code># ifconfig br-lan:0</code></pre>
325<p>Notice that the IP address in br-lan:0 is that of the cluster master.</p>
<p>This means that the next time you log in over SSH using the cluster IP, you will be logged in to hostB.</p>
327<p>Check which node is the master (remember, you need to run this on the master).</p>
328<pre><code># gnt-cluster getmaster
329hostB.ws.nsrc.org</code></pre>
330<p>All good!</p>
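<p>If you want a script to tell you whether the node you are logged in to is the master, you can compare the name printed by <code>gnt-cluster getmaster</code> with the node's own name. This is only a sketch: <code>is_master</code> is a hypothetical helper name, and it takes both names as plain strings so that it can be tried without a cluster.</p>

```shell
# Report whether the master name (first argument) matches this node's
# name (second argument). "is_master" is a hypothetical helper.
is_master() {
    if [ "$1" = "$2" ]; then
        echo yes
    else
        echo no
    fi
}

# Values taken from the example above: hostB is the new master.
is_master "hostB.ws.nsrc.org" "hostB.ws.nsrc.org"
is_master "hostB.ws.nsrc.org" "hostA.ws.nsrc.org"
```

<p>On a real node you might call it as <code>is_master "$(gnt-cluster getmaster)" "$(hostname -f)"</code>; this assumes that <code>hostname -f</code> returns the same fully qualified name that Ganeti uses for the node.</p>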
331<h1 id="scenario-loss-of-master-node"><a href="#scenario-loss-of-master-node"><span class="header-section-number">5</span> Scenario: Loss of Master Node</a></h1>
332<p>Let's imagine a slightly more critical scenario: the crash of the master node.</p>
333<p>Let's shut down the master node!</p>
<p>On hostB (it's now our master node, remember?):</p>
335<pre><code># halt -p</code></pre>
<p>The node is now down. VMs still running on the other nodes are unaffected, but you are not able to make any changes (stop, start, modify, add VMs, change cluster configuration, etc.).</p>
337<h2 id="promoting-slave"><a href="#promoting-slave"><span class="header-section-number">5.1</span> Promoting slave</a></h2>
338<p>Let's assume that hostB is not coming back right now, and we need to promote a master.</p>
339<p>You will first need to decide which of the remaining nodes will become the master. Let's pick hostA.</p>
340<p>To promote the slave:</p>
341<ul>
342<li><p>Log on to the node that will become master (hostA):</p></li>
343<li><p>Run the following command:</p></li>
344</ul>
345<pre><code># gnt-cluster master-failover</code></pre>
346<p>Note here that you will NOT be asked to confirm the operation!</p>
347<p>If you have 3 or more nodes in the cluster, the operation should be as smooth as in the previous section.</p>
348<p>On the other hand, if you only had 2 nodes in your cluster, you would have to specify <code>--no-voting</code> as an option. This is because, if one node is down, there is only one node left in the cluster, and no election can take place.</p>
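<p>The rule described in the two paragraphs above can be written down as a small helper that only prints the command to use; it does not run anything. <code>failover_cmd</code> is a hypothetical name, and its argument is the total number of nodes in the cluster, including the one that is down.</p>

```shell
# Print the master-failover command suggested by the text above for a
# cluster of the given size. "failover_cmd" is a hypothetical helper.
failover_cmd() {
    if [ "$1" -ge 3 ]; then
        # Enough nodes remain for the vote to take place.
        echo "gnt-cluster master-failover"
    else
        # Two-node cluster with one node down: no election is possible.
        echo "gnt-cluster master-failover --no-voting"
    fi
}

failover_cmd 3
failover_cmd 2
```

<p>Printing the command instead of running it keeps the decision visible, which matters here because the failover does not ask for confirmation.</p>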
349<p>At this point, the chosen node (hostA) is now master. You can verify this using the <code>gnt-cluster getmaster</code> command.</p>
350<p>From this point, recovering downed machines is similar to what we did in the first scenario. But to be on the safe side:</p>
351<ul>
352<li><p>Restart hostB, and log in to it as root</p></li>
353<li><p>Try and run <code>gnt-instance list</code></p></li>
354</ul>
355<p>XXX error here - split brain</p>
356<p>Normally, even though hostA was down while the promotion of hostB happened, the <code>ganeti-masterd</code> daemon running on hostA was informed, on startup, that hostA was no longer master. The above command should therefore fail with:</p>
357<pre><code>This is not the master node, please connect to node 'hostB.ws.nsrc.org' and
358rerun the command</code></pre>
359<p>Which means that hostA is well aware that hostB is the master now.</p>
360<p>Once you have done this, you may find that hostB and hostA have different versions of the cluster database. Type the following on hostB:</p>
361<pre><code># gnt-cluster verify
362...
363Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/config.data found with 2 different checksums (variant 1 on hostB.ws.nsrc.org, hostC.ws.nsrc.org; variant 2 on hostA.ws.nsrc.org)
364Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/ssconf_master_node found with 2 different checksums (variant 1 on hostB.ws.nsrc.org, hostC.ws.nsrc.org; variant 2 on hostA.ws.nsrc.org)</code></pre>
365<p>You can fix this by:</p>
366<pre><code># gnt-cluster redist-conf</code></pre>
367<p>which pushes out the config from the current master to all the other nodes.</p>
368<p>Re-run <code>gnt-cluster verify</code> to check everything is OK again.</p>
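<p>To see at a glance which files are out of step, you can pull the file names out of the checksum errors. This is a sketch that assumes the message wording shown in the sample output above; <code>mismatched_files</code> is a hypothetical helper name.</p>

```shell
# Print the path of every file that gnt-cluster verify reports with
# differing checksums. Reads the verify output on standard input.
# "mismatched_files" is a hypothetical helper, not part of Ganeti.
mismatched_files() {
    sed -n 's/.*ERROR: cluster: File \([^ ]*\) found with .* different checksums.*/\1/p'
}

# Sample lines copied from the verify output shown above.
sample='Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/config.data found with 2 different checksums (variant 1 on hostB.ws.nsrc.org, hostC.ws.nsrc.org; variant 2 on hostA.ws.nsrc.org)
Sat Jan 18 16:11:12 2014   - ERROR: cluster: File /var/lib/ganeti/ssconf_master_node found with 2 different checksums (variant 1 on hostB.ws.nsrc.org, hostC.ws.nsrc.org; variant 2 on hostA.ws.nsrc.org)'

printf '%s\n' "$sample" | mismatched_files
```

<p>After a successful <code>gnt-cluster redist-conf</code>, piping a fresh <code>gnt-cluster verify</code> through the helper should print nothing.</p>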
<p>Then, to make hostA take over the master role again, log in to hostA and run:</p>
370<pre><code># gnt-cluster master-failover</code></pre>
371</body>
372</html>