Ganeti Node OOB Management Framework

Created: 2010-Nov-04
Status: Implemented
Ganeti-Version: 2.4.0

Objective

Extend Ganeti with Out of Band (OOB) Cluster Node Management Capabilities.

Background

Ganeti currently has no support for Out of Band management of the nodes in a cluster. It relies on the OS running on the nodes and therefore has limited options when the OS is not responding. The command gnt-node powercycle can be issued to attempt a reboot of a node that crashed, but there is no means to power a node off and power it back on. Supporting this is useful in the following situations:

  • Emergency Power Off: During emergencies, time is critical and manual tasks just add latency which can be avoided through automation. If a server room overheats, halting the OS on the nodes is not enough. The nodes need to be powered off cleanly to prevent damage to equipment.

  • Repairs: In most cases, repairing a node means that the node has to be powered off.

  • Crashes: Software bugs may crash a node. Having an OS independent way to power-cycle a node helps to recover the node without human intervention.

Overview

Ganeti will be extended with OOB capabilities through adding a new Cluster Parameter (--oob-program), a new Node Property (--oob-program), a new Node State (powered) and support in gnt-node for invoking an External Helper Command which executes the actual OOB command (gnt-node <command> nodename ...). The supported commands are: power on, power off, power cycle, power status and health.

Note

The new Node State (powered) is a State of Record (SoR), not a State of World (SoW). The maximum execution time of the External Helper Command will be limited to 60s to prevent the cluster from getting locked for an undefined amount of time.

Detailed Design

New gnt-cluster Parameter

Program: gnt-cluster
Command: modify|init
Parameters: --oob-program
Options: --oob-program: executable OOB program (absolute path)

New gnt-cluster epo Command

Program: gnt-cluster
Command: epo
Parameter: --on --force --groups --all
Options: --on: By default epo turns off, with --on it tries to get the
cluster back online
--force: To force the operation without asking for confirmation
--groups: To operate on groups instead of nodes
--all: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a whole cluster or part of it. It takes care of all steps needed to get the cluster into a sane state to turn off the nodes.

With --on it does the reverse and tries to bring the rest of the cluster back to life.

Note

The master node is not able to shut itself down cleanly. Therefore, this command will not do all the work on single-node clusters. On multi-node clusters the command tries to find another master or, if that is not possible, prepares everything to the point where the user has to shut down the master node manually. The same applies to the single-node cluster configuration.

New gnt-node Property

Program: gnt-node
Command: modify|add
Parameters: --oob-program
Options: --oob-program: executable OOB program (absolute path)

Note

If --oob-program is set to ! then the node has no OOB capabilities. Otherwise, the node inherits the node group value or, failing that, the cluster-wide value. That is, nodes have to opt out of OOB capabilities.
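
The inheritance rule can be illustrated with a short sketch. This is not Ganeti's implementation: the function name and the three-argument shape are hypothetical, and treating ! at the node group or cluster level as an opt-out as well is an assumption, since the text only describes ! for nodes.

```python
from typing import Optional

OPT_OUT = "!"  # value from the design: the node has no OOB capabilities


def resolve_oob_program(node_value: Optional[str],
                        group_value: Optional[str],
                        cluster_value: Optional[str]) -> Optional[str]:
    """Return the effective OOB program path, or None if there is none.

    Hypothetical helper: the first value that is set wins, checked in the
    order node, node group, cluster.
    """
    for value in (node_value, group_value, cluster_value):
        if value:
            # Assumption: "!" opts out at whichever level it is found first.
            return None if value == OPT_OUT else value
    return None  # nothing configured anywhere
```

For example, resolve_oob_program(None, None, "/usr/bin/oob") yields the cluster-wide path, while resolve_oob_program("!", None, "/usr/bin/oob") yields None.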

Addition to gnt-cluster verify

Program: gnt-cluster
Command: verify
Parameter: None
Option: None
Additional Checks:
  1. Existence and executable flag of the OOB program on all Master Candidates, if the cluster parameter --oob-program is set or at least one node has the property --oob-program set. The OOB helper is only invoked on the master.

  2. Check whether the node state powered matches the actual power state of the machine, for those nodes where --oob-program is set.
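
As a rough illustration of the two checks, the following sketch uses only the Python standard library. The function names and message texts are hypothetical and do not reflect the actual gnt-cluster verify code.

```python
import os


def check_oob_program(path):
    """Return a list of problems found with the OOB program at `path`."""
    problems = []
    if not os.path.isabs(path):
        problems.append("OOB program path %s is not absolute" % path)
    elif not os.path.isfile(path):
        problems.append("OOB program %s does not exist" % path)
    elif not os.access(path, os.X_OK):
        problems.append("OOB program %s is not executable" % path)
    return problems


def powered_mismatch(recorded, actual):
    """True if the recorded state (SoR) disagrees with the helper (SoW).

    `actual` is None when the helper could not determine the power state;
    that case is not reported as a mismatch in this sketch.
    """
    return actual is not None and recorded != actual
```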

New Node State

Ganeti supports the following two boolean states related to the nodes:

drained

The cluster still communicates with drained nodes but excludes them from allocation operations

offline

if offline, the cluster does not communicate with offline nodes; useful for nodes that are not reachable in order to avoid delays

This list will be extended with the following boolean state:

powered

If not powered, the cluster does not communicate with the node. If the node property --oob-program is not set, the state powered is not displayed.

Additionally, the meaning of the offline state is modified as follows:

offline

if offline, the cluster does not communicate with offline nodes (with the exception of OOB commands for nodes where --oob-program is set); useful for nodes that are not reachable in order to avoid delays

The corresponding command extensions are:

Program: gnt-node
Command: info
Parameter: [ nodename … ]
Option: None

Additional Output (SoR, omitted if node property --oob-program is not set): powered: [True|False]

Program: gnt-node
Command: modify
Parameter: nodename
Option: [ --powered=yes|no ]
Reasoning: sometimes you will need to sync the SoR with the SoW manually
Caveat: --powered can only be modified if --oob-program is set for
the node in question

New gnt-node commands: power [on|off|cycle|status]

Program: gnt-node
Command: power [on|off|cycle|status]
Parameters: [ nodename … ]
Options: None
Caveats:
  • If no nodenames are passed to power [on|off|cycle], the user will be prompted with "Do you really want to power [on|off|cycle] the following nodes: <display list of OOB capable nodes in the cluster>? (y/n)"

  • For power-status, nodename is optional; if omitted, the power status of all OOB capable nodes in the cluster is listed (SoW).

  • The user is warned and has to confirm with yes when trying to power [off|cycle] a node with running instances.

Error Handling

Exception                               Error Message
OOB program return code != 0            OOB program execution failed ($ERROR_MSG)
OOB program execution time exceeds 60s  OOB program execution timeout exceeded, OOB program execution aborted
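
A minimal sketch of how a caller might enforce the 60s limit and map the two failure cases to the error messages above. It uses modern Python 3 (subprocess.run with a timeout) rather than Ganeti's own process utilities; OobError and run_oob_helper are hypothetical names.

```python
import subprocess

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Raised when the OOB helper fails or times out (hypothetical class)."""


def run_oob_helper(program, command, node, timeout=OOB_TIMEOUT):
    """Run `program command node` and return its stdout as text."""
    try:
        result = subprocess.run([program, command, node],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted")
    if result.returncode != 0:
        # The helper reports its error message on stderr.
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    return result.stdout
```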

Node State Changes

State before execution  Command      State after execution  Comment
powered: False          power off    powered: False         FYI: IPMI will complain if you try to power off a machine that is already powered off
powered: False          power cycle  powered: False         FYI: IPMI will complain if you try to cycle a machine that is already powered off
powered: False          power on     powered: True
powered: True           power off    powered: False
powered: True           power cycle  powered: True
powered: True           power on     powered: True          FYI: IPMI will complain if you try to power on a machine that is already powered on

Note

  • If the command fails, the Node State remains unchanged.

  • We will not prevent the user from trying to power off a node that is already powered off, since the powered state represents the SoR only and not the SoW. This can however create problems when the cluster administrator wants to bring the SoR in sync with the SoW without actually having to touch the node(s). For this case, we allow direct modification of the powered state through the gnt-node modify --powered=[yes|no] command as long as the node has OOB capabilities (i.e. --oob-program is set).

  • All node power state changes will be logged
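
The state changes listed under Node State Changes, together with the rule that a failed command leaves the state unchanged, can be written as a small lookup. The names below are illustrative only.

```python
# (powered before, command) -> powered after, as listed under
# "Node State Changes".
TRANSITIONS = {
    (False, "off"): False,
    (False, "cycle"): False,
    (False, "on"): True,
    (True, "off"): False,
    (True, "cycle"): True,
    (True, "on"): True,
}


def next_powered_state(powered, command, succeeded):
    """Return the recorded (SoR) powered state after running `command`."""
    if not succeeded:
        return powered  # a failed command leaves the node state unchanged
    return TRANSITIONS[(powered, command)]
```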

Node Power Status Listing (SoW)

Program: gnt-node
Command: power-status
Parameters: [ nodename … ]

Example output (represents SoW):

gnt-node oob power-status
Node                      Power Status
node1.example.com         on
node2.example.com         off
node3.example.com         on
node4.example.com         unknown

Note

  • We use unknown in case the Helper Program could not determine the power state.

  • If no nodenames are provided, we will list the power state of all nodes which are not opted out from OOB management.

  • Only nodes which are not opted out from OOB management will be listed. Invoking the command on a node that does not meet this condition will result in an error message “Node X does not support OOB commands”.
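
A sketch of how the listing could be rendered from the helper results, with None standing for a power state the Helper Program could not determine. The function name and the exact column spacing are assumptions.

```python
def format_power_status(statuses):
    """Render {node name: True | False | None} as a two-column listing."""
    labels = {True: "on", False: "off", None: "unknown"}
    # Column width: longest node name (or the header) plus two spaces.
    width = max(len(name) for name in list(statuses) + ["Node"]) + 2
    lines = ["Node".ljust(width) + "Power Status"]
    for name in sorted(statuses):
        lines.append(name.ljust(width) + labels[statuses[name]])
    return "\n".join(lines)
```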

Node Power Status Listing (SoR)

Program: gnt-node
Command: info
Parameter: [ nodename … ]
Option: None

Example output (represents SoR):

gnt-node info node1.example.com
Node name: node1.example.com
  primary ip: 192.168.1.1
  secondary ip: 192.168.2.1
  master candidate: True
  drained: False
  offline: False
  powered: True
  primary for instances:
    - inst1.example.com
    - inst2.example.com
    - inst3.example.com
  secondary for instances:
    - inst4.example.com
    - inst5.example.com
    - inst6.example.com
    - inst7.example.com

Note

Only nodes which are not opted out from OOB management will report the powered state.

New gnt-node oob subcommand: health

Program: gnt-node
Command: health
Parameters: [ nodename … ]
Options: None
Example: /usr/bin/oob health node5.example.com

Caveats:

  • If no nodename(s) are provided, we will report the health of all nodes in the cluster which have --oob-program set.

  • Only nodes which are not opted out from OOB management will report their health. Invoking the command on a node that does not meet this condition will result in an error message “Node does not support OOB commands”.

For error handling, see the Error Handling section.

OOB Program (Helper Program) Parameters, Return Codes and Data Format

Program: executable OOB program (absolute path)
Parameters: command nodename
Command: [power-{on|off|cycle|status}|health]
Options: None
Example: /usr/bin/oob power-on node1.example.com
Caveat: maximum runtime is limited to 60s

Return Codes

Return code  Meaning
0            Command succeeded
1            Command failed
others       Unsupported/undefined

Error messages are passed from the helper program to Ganeti through stderr(3) (return code == 1). On stdout(3), the helper program will send data back to Ganeti (return code == 0). The format of the data is JSON.
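
To make the calling convention concrete, here is a minimal skeleton of a Helper Program. It is a sketch under stated assumptions: query_power is a hypothetical stand-in for a real BMC or IPMI query, the health data is a placeholder, and the power-on, power-off and power-cycle branches do nothing.

```python
import json
import sys

COMMANDS = ("power-on", "power-off", "power-cycle", "power-status", "health")


def query_power(node):
    """Hypothetical stand-in for a real hardware query; always reports on."""
    return True


def main(argv, stdout=sys.stdout, stderr=sys.stderr):
    """Handle `argv = [program, command, nodename]`; return the exit code."""
    if len(argv) != 3 or argv[1] not in COMMANDS:
        stderr.write("usage: oob {%s} nodename\n" % "|".join(COMMANDS))
        return 1
    command, node = argv[1], argv[2]
    try:
        if command == "power-status":
            json.dump({"powered": query_power(node)}, stdout)
        elif command == "health":
            json.dump([["Ambient Temp", "OK"]], stdout)  # placeholder data
        # power-on, power-off, power-cycle: no output is expected
    except Exception as err:
        stderr.write("%s\n" % err)  # error message goes to stderr
        return 1
    return 0

# A real helper script would end with: sys.exit(main(sys.argv))
```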

Command       Expected output
power-on      None
power-off     None
power-cycle   None
power-status  { "powered": true|false }
health        [[item, status],
               [item, status],
               ...]
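
On the Ganeti side, the expected output per command could be decoded and validated roughly as follows. parse_oob_output is a hypothetical name, and rejecting malformed data with ValueError is a design choice of this sketch, not something the design prescribes.

```python
import json

POWER_COMMANDS = ("power-on", "power-off", "power-cycle")
HEALTH_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def parse_oob_output(command, stdout):
    """Decode helper stdout for `command`; raise ValueError if malformed."""
    if command in POWER_COMMANDS:
        return None  # no output is expected for these commands
    data = json.loads(stdout)  # JSONDecodeError is a ValueError subclass
    if command == "power-status":
        if not (isinstance(data, dict)
                and isinstance(data.get("powered"), bool)):
            raise ValueError('expected {"powered": true|false}')
        return data["powered"]
    if command == "health":
        if not isinstance(data, list):
            raise ValueError("expected a list of [item, status] pairs")
        for entry in data:
            if not (isinstance(entry, list) and len(entry) == 2
                    and all(isinstance(x, str) for x in entry)
                    and entry[1] in HEALTH_STATUSES):
                raise ValueError("bad health entry: %r" % (entry,))
        return [(item, status) for item, status in data]
    raise ValueError("unknown command: %s" % command)
```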

Data Format

For the health output, the fields are:

Field   Meaning
item    String identifier of the item we are querying the health of, examples:
          • Ambient Temp
          • PS Redundancy
          • FAN 1 RPM
status  String; can take one of the following four values:
          • OK
          • WARNING
          • CRITICAL
          • UNKNOWN

Note

  • The item output list is defined by the Helper Program. It is up to the author of the Helper Program to decide which items should be monitored and what each corresponding return status is.

  • Ganeti will currently not take any actions based on the item status. It will however create log entries for items with status WARNING or CRITICAL for each run of the gnt-node oob health nodename command. Automatic actions (regular monitoring of the item status) are considered a new service and will be treated in a separate design document.

Logging

The gnt-node power-[on|off] (power state changes) commands will create log entries following current Ganeti logging practices. In addition, health items with status WARNING or CRITICAL will be logged for each run of gnt-node health.
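
The health logging rule can be sketched with Python's standard logging module. The logger name "oob", the message format and the mapping of status to log level are assumptions, not Ganeti's actual logging code.

```python
import logging

VALID_STATUSES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
# Assumption: each reported status maps to the log level of the same name.
LOG_LEVELS = {"WARNING": logging.WARNING, "CRITICAL": logging.CRITICAL}


def log_health(node, items, logger=None):
    """Log WARNING and CRITICAL items; return the (item, status) pairs logged."""
    logger = logger or logging.getLogger("oob")
    logged = []
    for item, status in items:
        if status not in VALID_STATUSES:
            raise ValueError("invalid health status: %r" % (status,))
        level = LOG_LEVELS.get(status)
        if level is not None:  # OK and UNKNOWN are not logged
            logger.log(level, "Node %s: health item %r is %s",
                       node, item, status)
            logged.append((item, status))
    return logged
```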