1. Preface

1.1. Who Should Use This Guide

The EXPRESSCLUSTER X Hardware Feature Guide is intended for system administrators. It provides detailed information on the functions for linking with specific hardware.

The guide provides supplemental information to the EXPRESSCLUSTER X Installation and Configuration Guide.

For information on building and operating clusters, refer to the Installation and Configuration Guide.

1.2. How This Guide is Organized

1.3. EXPRESSCLUSTER X Documentation Set

The EXPRESSCLUSTER manuals consist of the following five guides. The title and purpose of each guide are described below.

EXPRESSCLUSTER X Getting Started Guide

This guide is intended for all users. The guide covers topics such as product overview, system requirements, and known problems.

EXPRESSCLUSTER X Installation and Configuration Guide

This guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.

EXPRESSCLUSTER X Reference Guide

This guide is intended for system administrators. The guide covers topics such as how to operate EXPRESSCLUSTER, the function of each module, and troubleshooting. The guide complements the Installation and Configuration Guide.

EXPRESSCLUSTER X Maintenance Guide

This guide is intended for administrators and for system administrators who want to build, operate, and maintain EXPRESSCLUSTER-based cluster systems. The guide describes maintenance-related topics for EXPRESSCLUSTER.

EXPRESSCLUSTER X Hardware Feature Guide

This guide is intended for administrators and for system engineers who want to build EXPRESSCLUSTER-based cluster systems. The guide describes features to work with specific hardware, serving as a supplement to the Installation and Configuration Guide.

1.4. Conventions

In this guide, Note, Important, and See also are used as follows:

Note

Used when the information given is important, but not related to data loss or damage to the system or machine.

Important

Used when the information given is necessary to avoid data loss or damage to the system or machine.

See also

Used to indicate where the referenced information can be found.

The following conventions are used in this guide.

Convention: Bold
Usage: Indicates graphical objects, such as fields, list boxes, menu selections, buttons, labels, and icons.
Example: In User Name, type your name. On the File menu, click Open Database.

Convention: Square brackets within the command line
Usage: Indicates that the value specified inside the square brackets can be omitted.
Example: clpstat -s [-h host_name]

Convention: #
Usage: Prompt indicating that a Linux user has logged on as the root user.
Example: # clpcl -s -a

Convention: Monospace
Usage: Indicates path names, commands, system output (messages, prompts, etc.), directories, file names, functions, and parameters.
Example: /Linux/5.0/en/server/

Convention: bold
Usage: Indicates the value that a user actually enters from a command line.
Example: Enter the following:
# clpcl -s -a

Convention: italic
Usage: Indicates that users should replace the italicized part with values that they are actually working with.
Example: clpstat -s [-h host_name]

In the figures of this guide, the EXPRESSCLUSTER X icon represents EXPRESSCLUSTER.

1.5. Contacting NEC

For the latest product information, visit our website below:

https://www.nec.com/global/prod/expresscluster/

2. Linkage with Server Management Infrastructure

This chapter provides an overview of the server management infrastructure included in Enterprise Linux with Dependable Support.

2.1. Overview of the server management infrastructure

This server management infrastructure is included in Enterprise Linux with Dependable Support. This software provides the following functions:

  • Recording information about failures detected by the expanded device driver

  • Linking with EXPRESSCLUSTER X to perform a failover when the expanded device driver detects a fatal system failure

For details, see the manual for Enterprise Linux with Dependable Support.

2.2. Overview of linkage between the server management infrastructure and EXPRESSCLUSTER

When linked with the server management infrastructure, EXPRESSCLUSTER does not perform the monitoring itself. Instead, it receives messages sent spontaneously by the driver module and, in response, passively performs a failover or other processing.

The following shows an overview:

Server1, where an error has occurred; Server2, operating normally; and the Management PC

Fig. 2.1 Overview of the linkage between server management infrastructure and EXPRESSCLUSTER

If a fatal system error occurs, the expanded device driver included in Enterprise Linux with Dependable Support (hereafter referred to as the expanded driver) sends a message to EXPRESSCLUSTER through the server management infrastructure. After receiving such a message, EXPRESSCLUSTER performs the following operations.

  • EXPRESSCLUSTER sets the status of the corresponding message receive monitor (mrw) to abnormal. The administrator can see that an error was detected by checking the status with the Cluster WebUI or an EXPRESSCLUSTER command.

  • EXPRESSCLUSTER fails over operations or shuts down the OS, according to the specified action.
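
The passive, message-driven behavior described above can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not EXPRESSCLUSTER code: the class `MessageReceiveMonitor`, the `Status` values, and the `on_error` callback are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class MessageReceiveMonitor:
    """Hypothetical stand-in for a message receive monitor (mrw)."""
    name: str
    # Configured action, for example a failover or an OS shutdown.
    on_error: Callable[[str], None]
    status: Status = Status.NORMAL
    received: List[str] = field(default_factory=list)

    def receive(self, message: str) -> None:
        # The monitor never polls the hardware. It only reacts to a message
        # pushed to it (here, a plain method call stands in for delivery
        # through the server management infrastructure).
        self.status = Status.ABNORMAL
        self.received.append(message)
        self.on_error(message)


# Record the action instead of really failing over.
actions = []
mrw = MessageReceiveMonitor(
    "mrw1", on_error=lambda msg: actions.append(("failover", msg))
)
mrw.receive("fatal error reported by expanded driver")
```

After the call to `receive`, the monitor status is abnormal and the configured action has run once; nothing happens until a message arrives.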

2.4. Message receive monitor resources

The message receive monitor resources monitor error messages reported from outside. This section only covers the part associated with linkage with the server management infrastructure. For other cases, see "Monitor resource details" in the Reference Guide.

2.4.1. Notes on message receive monitor resources

  • A message receive monitor resource cannot execute any script before the final action if it is linked with the server management infrastructure.

  • Do not use the clprexec command, because EXPRESSCLUSTER itself manages the status of a message receive monitor resource that is linked with the server management infrastructure.

  • If a keyword is specified for a message receive monitor resource and an error is detected in the device specified as the monitor target, the monitor resource detects an error and the recovery action is performed.

  • If no keyword is specified for a message receive monitor resource and an error is detected in any device that matches the Category, the monitor resource detects an error and the recovery action is performed.
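
The keyword and Category rules in the notes above can be expressed as a small decision function. This is a hedged sketch, not the product's implementation: the function name `should_trigger`, the exact string comparison, and the device names used in the usage note are assumptions made for illustration.

```python
from typing import Optional


def should_trigger(monitor_category: str,
                   monitor_keyword: Optional[str],
                   msg_category: str,
                   msg_device: str) -> bool:
    """Return True if a reported error should put the monitor into an error state.

    Assumption: categories and device names are compared as exact strings.
    """
    if msg_category != monitor_category:
        return False
    if not monitor_keyword:
        # No keyword: an error in any device of the category matches.
        return True
    # Keyword set: only the device named as the monitor target matches.
    return msg_device == monitor_keyword
```

For example, with Category NIC and no keyword, an error on any network interface would match, whereas with a keyword of `eth1` (a hypothetical device name) an error reported for `eth0` would be ignored.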

2.4.2. Category by a message receive monitor resource

A message receive monitor resource receives the following message types when it is linked with the server management infrastructure.

  1. NIC
    Monitors the error messages of network interface cards.
  2. FC
    Monitors the error messages of Fibre Channel.
  3. HA/SS
    Monitors the error messages of the EXPRESSCLUSTER X HA/StorageSaver.
  4. HA/AM
    Monitors the error messages of the EXPRESSCLUSTER X HA/ApplicationMonitor.
  5. HA/RS
    Monitors the error messages of the EXPRESSCLUSTER X HA/ResourceSaver.
  6. SPS
    Monitors the error messages of the SPS.

2.4.3. Monitor (special) tab

For more information on the Info tab and the Monitor (common) tab, see "Monitor resource details" in the Reference Guide.

Category (within 32 bytes)

Specify a category.
Be sure to select a default character string from the list box.

Keyword (within 1023 bytes)

Specify a monitor target.
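
Because the limits above are given in bytes rather than characters, a value containing multibyte characters can exceed a limit with fewer characters than expected. The following Python sketch shows one way to check a value before entering it. The helper name `fits` and the choice of UTF-8 are assumptions; this guide does not state which encoding is used when the length is counted.

```python
CATEGORY_MAX_BYTES = 32    # "Category (within 32 bytes)"
KEYWORD_MAX_BYTES = 1023   # "Keyword (within 1023 bytes)"


def fits(value: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """Check the encoded length, which can exceed the character count."""
    return len(value.encode(encoding)) <= max_bytes
```

For instance, `fits("NIC", CATEGORY_MAX_BYTES)` holds, while a 17-character string of two-byte characters would not fit in 32 bytes under UTF-8.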

2.4.4. Recovery Action tab

For more information on the Info tab and the Monitor (common) tab, see "Monitor resource details" in the Reference Guide.

Specify the recovery target and the action upon detecting an error. For message receive monitor resources, select "Restart the recovery target", "Executing failover to the recovery target", or "Execute the final action" as the action to take when an error is detected. However, if the recovery target is inactive, the recovery action is not performed.

Recovery Action

Select the action to take when a monitor error is detected.

  • Executing recovery script
    Run the recovery script when a monitor error is detected.
  • Restart the recovery target
    Restart the group or group resource selected as the recovery target when a monitor error is detected.
  • Executing failover to the recovery target
    Perform a failover for the group selected as the recovery target or the group to which the group resource selected as the recovery target belongs when a monitor error is detected.
  • Execute the final action
    Execute the selected final action when a monitor error is detected.
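
The four options above, together with the rule that no recovery action is performed while the recovery target is inactive, can be summarized in a short Python sketch. The enum and function names are hypothetical and serve only to illustrate the selection logic described in this section.

```python
from enum import Enum
from typing import Optional


class RecoveryAction(Enum):
    RECOVERY_SCRIPT = "Executing recovery script"
    RESTART = "Restart the recovery target"
    FAILOVER = "Executing failover to the recovery target"
    FINAL_ACTION = "Execute the final action"


def action_to_perform(selected: RecoveryAction,
                      target_active: bool) -> Optional[RecoveryAction]:
    """Return the action to run, or None when the recovery target is inactive."""
    if not target_active:
        # Per this section: an inactive recovery target means no recovery action.
        return None
    return selected
```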

Execute Failover to outside the Server Group

Configurable only for message receive monitor resources. Specify whether to fail over to a server group other than the active server group upon the reception of an error message.

Execute Script before Recovery Action

Specify whether to execute a script before performing the operation selected as the recovery action upon error detection. This setting is disabled when linking with the server management infrastructure.

For details about the settings other than the above, see "Recovery Action tab" in "Monitor resource properties" in "Monitor resource details" in the Reference Guide.