STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access.

For example, if a node's network interface is down but the node still has the filesystem mounted, you can't simply mount the filesystem on another node. Using STONITH, you can make sure the node is really offline and then safely let another node access the data.

STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.

In the following examples, I'll create 3 IBM RSA STONITH agents for nodeA, nodeB, and nodeC, so that each node has a fencing device that the other nodes can use to bring it down when needed.

Available STONITH (Fencing) Agents

# pcs stonith list
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
...
fence_rsa - Fence agent for IBM RSA
...

You can also add a filter to the end of the command, for example:

# pcs stonith list rsa
fence_rsa - Fence agent for IBM RSA

In the following examples, all fencing devices use fence_rsa.

Setting Up Properties for STONITH

# pcs property set no-quorum-policy=ignore
# pcs property set stonith-enabled=true
# pcs property set stonith-action=poweroff # default is reboot

Note: Setting the STONITH action to off is not always a good option. In this example, the resource is a filesystem, and the filesystem device has redundant access. If a resource access fault caused the node to be fenced, it is better to leave the node off for further investigation than to reboot it in the hope of fixing the problem.
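
The three property commands above can be wrapped in a small script. This is only a sketch: DRY_RUN and the run helper are assumptions of this example, not pcs features, and the script defaults to printing the commands so that nothing changes until you opt in.

```shell
#!/bin/sh
# Sketch: apply the three cluster properties from this section.
# DRY_RUN is a hypothetical switch of this script, not a pcs option.
# It defaults to 1, so the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

set_stonith_properties() {
    # $1 is the stonith action; this walkthrough uses poweroff.
    action="${1:-poweroff}"
    run pcs property set no-quorum-policy=ignore
    run pcs property set stonith-enabled=true
    run pcs property set "stonith-action=$action"
}

set_stonith_properties poweroff
```

Run it once as-is to review the commands, then with DRY_RUN=0 on a cluster node to apply them.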

Creating a Fencing Device

# pcs stonith create stonith-rsa-nodeA fence_rsa action=off ipaddr="nodeA_rsa" login=<user> passwd=<pass> pcmk_host_list=nodeA secure=true
# pcs stonith show
 stonith-rsa-nodeA    (stonith:fence_rsa):    Stopped

Displaying Fencing Devices

Repeat the same steps for nodeB and nodeC to get 3 fence devices. The STONITH resources then start on their own.

# pcs stonith show
 stonith-rsa-nodeA    (stonith:fence_rsa):    Started
 stonith-rsa-nodeB    (stonith:fence_rsa):    Started
 stonith-rsa-nodeC    (stonith:fence_rsa):    Started
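
Repeating the create step for each node can be sketched as a loop. The script below only prints the commands. RSA_USER and RSA_PASS are hypothetical placeholders for the real credentials, and the <node>_rsa address pattern is assumed from the nodeA example.

```shell
#!/bin/sh
# Dry-run sketch: print one "pcs stonith create" command per node.
# RSA_USER and RSA_PASS are hypothetical placeholders for the real
# credentials; the <node>_rsa address pattern follows the nodeA example.
RSA_USER="${RSA_USER:-user}"
RSA_PASS="${RSA_PASS:-pass}"

print_fence_create() {
    node="$1"
    printf 'pcs stonith create stonith-rsa-%s fence_rsa action=off ipaddr="%s_rsa" login=%s passwd=%s pcmk_host_list=%s secure=true\n' \
        "$node" "$node" "$RSA_USER" "$RSA_PASS" "$node"
}

for node in nodeA nodeB nodeC; do
    print_fence_create "$node"
done
```

Review the printed commands, then paste them into a root shell on a cluster node.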

Managing Nodes with Fence Devices

# pcs stonith fence nodeC    
Node: nodeC fenced
# pcs stonith confirm nodeC
Node: nodeC confirmed fenced

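By default, the fence action powers the node off and then on again. If you only want to take the node offline, use the --off option.
Note: the confirm command does not power the node off by itself. It tells the cluster that the node is already powered off, so use it only after you have manually verified that the node is down and has no access to shared resources; otherwise data corruption can occur.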
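
The choice between the default reboot and --off can be sketched as a small helper. It only builds the command line and fences nothing. fence_cmd is a hypothetical name used for this example.

```shell
#!/bin/sh
# Sketch: build the fence command for a node without running it.
# Mode "reboot" is the default behaviour (off, then on); mode "off"
# adds --off so the node stays down.
fence_cmd() {
    node="$1"
    mode="${2:-reboot}"
    case "$mode" in
        reboot) echo "pcs stonith fence $node" ;;
        off)    echo "pcs stonith fence $node --off" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

fence_cmd nodeC        # command that reboots the node
fence_cmd nodeC off    # command that powers it off and leaves it off
```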

Modifying Fencing Devices

You may have noticed that many options were used when creating the fence device. Actually, all of them can be modified and updated.

pcs stonith update stonith_id [stonith_device_options]
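
As a sketch of how the synopsis is filled in, the helper below builds an update command from a device id and name=value options. It prints the command and sends nothing to the cluster. print_fence_update is a hypothetical name, and the option values are purely illustrative.

```shell
#!/bin/sh
# Dry-run sketch: build a "pcs stonith update" command from a device id
# and one or more name=value options. Nothing is sent to the cluster.
print_fence_update() {
    if [ "$#" -lt 2 ]; then
        echo "usage: print_fence_update stonith_id name=value..." >&2
        return 1
    fi
    id="$1"
    shift
    for opt in "$@"; do
        case "$opt" in
            ?*=*) ;;   # looks like name=value
            *) echo "not a name=value option: $opt" >&2; return 1 ;;
        esac
    done
    echo "pcs stonith update $id $*"
}

# Illustrative values only.
print_fence_update stonith-rsa-nodeA power_wait=5 login_timeout=10
```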

Displaying Device-Specific Fencing Options

To see which options a fence agent accepts, you can use either of the following ways.

You can find its options through the agent's command-line help:

# /usr/sbin/fence_rsa -h

For more detail on how to check and debug a fence device from the command line, see Fence agent for ibm rsa.

Or you can list the fence agent options with pcs:

# pcs stonith describe fence_rsa 
Stonith options for: fence_rsa
  action (required): Fencing Action
  ipaddr (required): IP Address or Hostname
  login (required): Login Name
  passwd: Login password or passphrase
  passwd_script: Script to retrieve password
  cmd_prompt: Force command prompt
  secure: SSH connection
  identity_file: Identity file for ssh
  ipport: TCP port to use for connection with device
  ssh_options: SSH options to use
  verbose: Verbose mode
  debug: Write debug information to given file
  version: Display version information and exit
  help: Display help and exit
  power_timeout: Test X seconds for status change after ON/OFF
  shell_timeout: Wait X seconds for cmd prompt after issuing command
  login_timeout: Wait X seconds for cmd prompt after login
  power_wait: Wait X seconds after issuing ON/OFF
  delay: Wait X seconds before fencing is started
  retry_on: Count of attempts to retry power on
  stonith-timeout: How long to wait for the STONITH action to complete per a stonith device.
  priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
  pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names.
  pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
  pcmk_host_check: How to determine which machines are controlled by the device.

Deleting Fencing Devices

pcs stonith delete stonith_id

Configuring Fencing Levels

pcs stonith level add level node devices
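
A sketch of the level syntax in use: the helper below prints the command for one node and joins several devices with commas, which is how pcs expects the device list. print_level_add is a hypothetical name, and stonith-pdu-nodeA is a hypothetical second device that does not exist in this setup.

```shell
#!/bin/sh
# Dry-run sketch: build "pcs stonith level add" for one node.
# Devices given as separate arguments are joined with commas.
print_level_add() {
    if [ "$#" -lt 3 ]; then
        echo "usage: print_level_add level node device..." >&2
        return 1
    fi
    level="$1"
    node="$2"
    shift 2
    devices=$(IFS=,; echo "$*")
    echo "pcs stonith level add $level $node $devices"
}

print_level_add 1 nodeA stonith-rsa-nodeA
# stonith-pdu-nodeA is a hypothetical second device, used as a fallback.
print_level_add 2 nodeA stonith-pdu-nodeA
```

The idea is that the cluster tries level 1 first and only moves on to level 2 if level 1 fails.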

Additional Fencing Configuration Options

Field | Type | Default | Description
pcmk_host_argument | string | port | An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific, parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters.
pcmk_reboot_action | string | reboot | An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the reboot action.
pcmk_reboot_timeout | time | 60s | Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for reboot actions.
pcmk_reboot_retries | integer | 2 | The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up.
pcmk_off_action | string | off | An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the off action.
pcmk_off_timeout | time | 60s | Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for off actions.
pcmk_off_retries | integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up.
pcmk_list_action | string | list | An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the list action.
pcmk_list_timeout | time | 60s | Specify an alternate timeout to use for list actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions.
pcmk_list_retries | integer | 2 | The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up.
pcmk_monitor_action | string | monitor | An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the monitor action.
pcmk_monitor_timeout | time | 60s | Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for monitor actions.
pcmk_monitor_retries | integer | 2 | The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up.
pcmk_status_action | string | status | An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the status action.
pcmk_status_timeout | time | 60s | Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for status actions.
pcmk_status_retries | integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up.
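
Apart from pcmk_host_argument, the options in the table follow one naming pattern per action: pcmk_<action>_action, pcmk_<action>_timeout, and pcmk_<action>_retries. The sketch below derives those names and then prints an example update command. pcmk_option_names is a hypothetical helper, and the 120s and 3 values are illustrative, not recommendations.

```shell
#!/bin/sh
# Sketch: derive the three per-action option names from the table
# (pcmk_<action>_action, pcmk_<action>_timeout, pcmk_<action>_retries).
pcmk_option_names() {
    action="$1"
    case "$action" in
        reboot|off|list|monitor|status)
            for suffix in action timeout retries; do
                echo "pcmk_${action}_${suffix}"
            done
            ;;
        *)
            echo "no per-action options listed for: $action" >&2
            return 1
            ;;
    esac
}

pcmk_option_names reboot

# Illustrative values only: give reboot more time and one extra retry.
echo "pcs stonith update stonith-rsa-nodeA pcmk_reboot_timeout=120s pcmk_reboot_retries=3"
```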