The EXPRESSCLUSTER X Hardware Feature Guide is intended for system administrators. This guide covers the functions for linking with specific hardware in detail.
This guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.
This guide is intended for system administrators. The guide covers topics such as how to operate EXPRESSCLUSTER, the function of each module, and troubleshooting. The guide is a complement to the Installation and Configuration Guide.
This guide is intended for system administrators who want to build, operate, and maintain EXPRESSCLUSTER-based cluster systems. The guide describes maintenance-related topics for EXPRESSCLUSTER.
EXPRESSCLUSTER X Hardware Feature Guide
This guide is intended for administrators and for system engineers who want to build EXPRESSCLUSTER-based cluster systems. The guide describes features to work with specific hardware, serving as a supplement to the Installation and Configuration Guide.
2.1. Overview of the server management infrastructure
This server management infrastructure is included in Enterprise Linux with Dependable Support. This software provides the following functions:
Recording information about failures detected by the expanded device driver
Linking with EXPRESSCLUSTER X to perform a failover when the expanded device driver detects a fatal system failure
For details, see the manual for Enterprise Linux with Dependable Support.
2.2. Overview of linkage between the server management infrastructure and EXPRESSCLUSTER
With the function for linking with the server management infrastructure, EXPRESSCLUSTER does not perform any monitoring itself. Instead, EXPRESSCLUSTER receives the messages that the driver module sends on its own initiative and passively performs a failover or other processing in response.
The following shows an overview:
Fig. 2.1 Overview of the linkage between server management infrastructure and EXPRESSCLUSTER
If a fatal system error occurs, the expanded device driver included in Enterprise Linux with Dependable Support (hereafter referred to as the expanded driver) sends a message to EXPRESSCLUSTER through the server management infrastructure. After receiving such a message, EXPRESSCLUSTER performs the following operations.
EXPRESSCLUSTER makes the status of the corresponding external link monitor (mrw) abnormal. The administrator can visually determine that an error was detected by checking the status using the Cluster WebUI or an EXPRESSCLUSTER command.
When a failure occurs, EXPRESSCLUSTER fails over operations or shuts down the OS according to the specified action.
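The passive flow described above can be sketched in Python as follows. This is only an illustrative model, not EXPRESSCLUSTER code: the class `ExternalLinkMonitor`, the `Status` values, the `on_message` method, and the monitor name `mrw1` are all hypothetical names chosen for this sketch, and the configured action is represented by a plain callback.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class ExternalLinkMonitor:
    """Hypothetical stand-in for an external link monitor (mrw) resource."""
    name: str
    action: Callable[[str], None]            # configured action, e.g. a failover
    status: Status = Status.NORMAL
    received: List[str] = field(default_factory=list)

    def on_message(self, message: str) -> None:
        # The monitor never polls a device; it only reacts to a delivered message.
        self.received.append(message)
        self.status = Status.ABNORMAL        # first: the status becomes abnormal
        self.action(message)                 # second: the specified action runs


# Record the action instead of really failing over (assumption for the sketch).
performed = []
monitor = ExternalLinkMonitor(
    "mrw1", action=lambda msg: performed.append(("failover", msg))
)
monitor.on_message("fatal error reported by the expanded driver")
print(monitor.status.name, performed)
```

The point of the model is that nothing happens until `on_message` is called, which mirrors the guide's statement that the linkage is driven by messages from the driver rather than by monitoring performed by EXPRESSCLUSTER.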
2.3. Setup of the function to link with the server management infrastructure
For details about resources other than the external link monitor resource, see the EXPRESSCLUSTER manuals below.
To use the function for linking with the server management infrastructure, the external link monitor resources must be registered with the cluster. To create configuration information, register the necessary external link monitor resources as described in the manual. For the external link monitor resources, see "External link monitor resources".
Uploading EXPRESSCLUSTER configuration information
The external link monitor resources monitor error messages reported from outside. This section only covers the part associated with linkage with the server management infrastructure. For other cases, see "Monitor resource details" in the Reference Guide.
An external link monitor resource cannot execute any script before the final action if it is linked with the server management infrastructure.
Do not use the clprexec command, because EXPRESSCLUSTER manages the status of an external link monitor resource if it is linked with the server management infrastructure.
When a keyword is specified for an external link monitor resource, the monitor reports an error and performs the error correction action if an error is detected in the device specified as the monitor target.
When no device is specified as the keyword, the monitor reports an error and performs the error correction action if an error is detected in any device that matches the Category.
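The keyword rule can be illustrated with a small Python sketch. It is a reading of the two rules above, not the product's implementation: the function name `is_monitor_error`, its parameters, and the device names `eth0` and `eth1` are hypothetical. The sketch also assumes that the keyword is compared to the device name by exact match and that the message category must match the monitor's Category even when a keyword is set.

```python
from typing import Optional


def is_monitor_error(monitor_category: str,
                     monitor_keyword: Optional[str],
                     msg_category: str,
                     msg_device: str) -> bool:
    """Return True if a reported error should put the monitor into an error state."""
    if msg_category != monitor_category:
        return False                       # message belongs to another Category
    if not monitor_keyword:
        return True                        # no keyword: any device in the Category
    return msg_device == monitor_keyword   # keyword set: only the specified device


# Keyword set to a hypothetical device "eth0": only that device triggers an error.
print(is_monitor_error("NIC", "eth0", "NIC", "eth0"))
print(is_monitor_error("NIC", "eth0", "NIC", "eth1"))
# No keyword: any NIC error triggers an error, but an FC error does not.
print(is_monitor_error("NIC", None, "NIC", "eth1"))
print(is_monitor_error("NIC", None, "FC", "eth1"))
```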
2.4.2. Category by an external link monitor resource
An external link monitor resource receives the following message types when it is linked with the server management infrastructure.
NIC
Monitors the error messages of network interface cards.
FC
Monitors the error messages of Fibre Channel.
HA/SS
Monitors the error messages of the EXPRESSCLUSTER X HA/StorageSaver.
HA/AM
Monitors the error messages of the EXPRESSCLUSTER X HA/ApplicationMonitor.
HA/RS
Monitors the error messages of the EXPRESSCLUSTER X HA/ResourceSaver.
Specify the recovery target and the action upon detecting an error. For external link monitor resources, select "Restart the recovery target", "Executing failover to the recovery target", or "Execute the final action" as the action to take when an error is detected. However, if the recovery target is inactive, the recovery action is not performed.
Recovery Action
Select the action to take when a monitor error is detected.
Executing recovery script
Let the recovery script run upon the detection of a monitor error.
Restart the recovery target
Restart the group or group resource selected as the recovery target when a monitor error is detected.
Executing failover to the recovery target
Perform a failover for the group selected as the recovery target or the group to which the group resource selected as the recovery target belongs when a monitor error is detected.
Execute the final action
Execute the selected final action when a monitor error is detected.
Execute Failover to outside the Server Group
Configurable only for external link monitor resources. Specify whether to fail over to a server group other than the active server group upon the reception of an error message.
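The selection logic for the recovery action can be sketched in Python as follows. This is a simplified model under stated assumptions, not EXPRESSCLUSTER behavior as implemented: `RecoveryAction`, `plan_recovery`, and the returned strings are hypothetical, only the three actions named as selectable for external link monitor resources are modeled, and the sketch assumes that an inactive recovery target suppresses every one of them.

```python
from enum import Enum
from typing import Optional


class RecoveryAction(Enum):
    RESTART_TARGET = "Restart the recovery target"
    FAILOVER = "Executing failover to the recovery target"
    FINAL_ACTION = "Execute the final action"


def plan_recovery(action: RecoveryAction,
                  target: str,
                  target_active: bool,
                  outside_server_group: bool = False) -> Optional[str]:
    """Describe what would run on a monitor error, or None if nothing runs."""
    if not target_active:
        return None  # an inactive recovery target means no recovery action
    if action is RecoveryAction.RESTART_TARGET:
        return f"restart {target}"
    if action is RecoveryAction.FAILOVER:
        # Models the "Execute Failover to outside the Server Group" setting.
        destination = ("a server group other than the active one"
                       if outside_server_group
                       else "the default failover destination")
        return f"fail over {target} to {destination}"
    return "execute the selected final action"


# "failover1" is a hypothetical group name used only for this example.
print(plan_recovery(RecoveryAction.FAILOVER, "failover1", True, outside_server_group=True))
print(plan_recovery(RecoveryAction.RESTART_TARGET, "failover1", False))
```

Returning a description instead of performing the action keeps the sketch side-effect free, which makes the effect of each setting easy to compare.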