The Installation and Configuration Guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.
Getting Started Guide
This guide is intended for all users. The guide covers topics such as product overview, system requirements, and known problems.
Installation and Configuration Guide
This guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.
Reference Guide
This guide is intended for system administrators. The guide covers topics such as how to operate EXPRESSCLUSTER, the functions of each module, and troubleshooting. The guide is a supplement to the Installation and Configuration Guide.
Maintenance Guide
This guide is intended for administrators and for system administrators who want to build, operate, and maintain EXPRESSCLUSTER-based cluster systems. The guide describes maintenance-related topics for EXPRESSCLUSTER.
2.1. Steps from configuring a cluster system to installing EXPRESSCLUSTER
Before you set up a cluster system that uses EXPRESSCLUSTER, you should carefully plan the cluster system with due consideration for factors such as hardware requirements, software to be used, and the way the system is used. When you have built the cluster, check to see if the cluster system is successfully set up before you start its operation.
This guide explains how to create a cluster system with EXPRESSCLUSTER through step-by-step instructions. Read each chapter while actually performing the procedures to install the cluster system. The following are the steps you take from designing the cluster system to operating EXPRESSCLUSTER:
See also
Refer to the "Reference Guide" as needed when operating EXPRESSCLUSTER by following the procedures introduced in this guide. See the "Getting Started Guide" for the latest information, including system requirements and release information.
Before installing EXPRESSCLUSTER, create the hardware configuration, the cluster system configuration and the information on the cluster system configuration.
Based on the failover group information determined in Step 2, create the cluster configuration data by using the Cluster WebUI, and then configure the cluster.
Check if the cluster system has been created successfully.
Conduct a dummy test, parameter tuning and operational simulation required to be done before operating the cluster system. The procedures to uninstall and reinstall are also explained in this section.
EXPRESSCLUSTER is software that enhances the availability and expandability of systems through a redundant (clustered) system configuration. When an error occurs on the active server, the application services running on it are automatically taken over by the standby server.
The following can be achieved by installing a cluster system that uses EXPRESSCLUSTER.
High availability
Downtime is minimized by automatically failing over applications and services to a "healthy" server when one of the servers that constitute the cluster stops.
High expandability
Both Windows and Linux support large scale cluster configurations having up to 32 servers.
EXPRESSCLUSTER X consists of the following two modules:
EXPRESSCLUSTER Server
This is the main module of EXPRESSCLUSTER and provides all of the server's high availability functions. Install this module on each server that constitutes the cluster.
Cluster WebUI
This is a tool to create the configuration data of EXPRESSCLUSTER and to manage EXPRESSCLUSTER operations. The Cluster WebUI is installed in EXPRESSCLUSTER Server, but it is distinguished from the EXPRESSCLUSTER Server because the Cluster WebUI is operated through a Web browser on the management PC.
You need to determine an appropriate hardware configuration to install a cluster system that uses EXPRESSCLUSTER. The configuration examples of EXPRESSCLUSTER are shown below.
There are three types of system configurations: shared disk type, mirror disk type and hybrid disk type.
Shared disk type
When the shared disk type configuration is used, application data is stored on a shared disk that is physically connected to servers, by which access to the same data after failover is ensured.
You can configure settings so that, while one server is using a specific area of the shared disk, the other servers are blocked from accessing that area.
The shared disk type is suitable for systems that write a large volume of data, such as database servers, because data write performance does not decrease.
Mirror disk type
When the mirror disk type configuration is used, application data is mirrored between disks of two servers, by which access to the same data after failover is ensured.
When data is written on the active server, the data also needs to be written on the standby server. As a result, the writing performance will decrease.
However, the cost of the system can be reduced because no external disk such as a shared disk is necessary, and the cluster can be achieved only by disks on servers.
When configuring a remote cluster by placing the standby server in a remote site for disaster control, a shared disk cannot be used. Thus the mirror disk type is used.
Hybrid type
This configuration is a combination of the shared disk type and the mirror disk type. The data on the shared disk is mirrored to a third server, which prevents the shared disk from being a single point of failure.
Data writing performance, operational topology and precautions of the mirror disk type apply to the hybrid type.
The following are configuration examples of the shared disk type, the mirror disk type, and the hybrid type. Use these examples to design and set up your system.
2.3.2. Example 1: Configuration using a shared disk with 2 nodes
This is the most commonly used system configuration:
Different models can be used for servers. However, the shared disk should have the same drive letter on both servers.
Use a cable to connect the servers directly for the interconnect. As in the 3-node configuration, a dedicated HUB can also be used.
Client 1, which exists on the same LAN as that of the cluster servers, can access them through a floating IP address.
Client 2, which exists on a remote LAN, can also access the cluster servers through a floating IP address.
Using floating IP addresses does not require the router to be configured for them.
Fig. 2.4 Example of a configuration using a shared disk with two nodes
2.3.3. Example 2: Configuration using mirror disks with 2 nodes
Different models can be used for servers. However, the mirror disks should have the same drive letter on both servers.
A direct cable connection between the servers is recommended for the interconnect; a HUB can also be used.
On cluster servers (Servers 1 and 2), the same drive letter needs to be specified.
For this configuration, different models can be used.
However, their partitions for mirroring must be exactly the same size in bytes. This may not be possible if the disk geometries differ.
For connecting the interconnect cable, direct connection between the servers is recommended, but connection via a hub is also fine.
Client 1, which exists on the same LAN as that of the cluster servers, can access them through a floating IP address.
Client 2, which exists on a remote LAN, can also access the cluster servers through a floating IP address.
Using floating IP addresses does not require the router to be configured for them.
Fig. 2.5 Example of a configuration using mirror disks with two nodes
2.3.4. Example 3: Configuration using mirror partitions on the disks for OS with 2 nodes
A mirroring partition can be created on the disk used for the OS.
On Servers 1 and 2, the same drive letter needs to be specified.
For this configuration, different models can be used.
However, their partitions for mirroring must be exactly the same size in bytes. This may not be possible if the disk geometries differ.
The partition for mirroring can be created on the same disk as that for the OS on each of the servers.
Client 1, which exists on the same LAN as that of the cluster servers, can access them through a floating IP address.
Client 2, which exists on a remote LAN, can also access the cluster servers through a floating IP address.
Using floating IP addresses does not require the router to be configured for them.
Fig. 2.6 Example of a configuration with two nodes, making the mirroring area coexist with the OS area
2.3.5. Example 4: Configuring a remote cluster by using asynchronous mirror disks with 2 nodes
On Servers 1 and 2, the same drive letter needs to be specified.
For this configuration, different models can be used.
However, their partitions for mirroring must be exactly the same size in bytes. This may not be possible if the disk geometries differ.
A client can access the cluster servers through a virtual IP (VIP) address. Using a VIP address requires a router that can propagate host routes by using RIP.
Configuring a cluster between servers at remote sites over a WAN, as shown below, is a solution for disaster control.
Using asynchronous mirror disks can limit the decrease in disk performance caused by network delay. However, data updated immediately before a failover may be lost.
It is necessary to secure enough communication bandwidth for the traffic amount of updated information on mirror disks. Insufficient bandwidth can cause delay of communication with a business operation client or interruption of mirroring.
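As a rough illustration of the bandwidth requirement above, the following Python sketch converts an average mirror update rate into the WAN bandwidth it would need. The update rate must be measured on your own system, and the 1.5x headroom factor for write bursts and protocol overhead is an assumption for illustration, not a value defined by EXPRESSCLUSTER.

```python
def required_mbps(update_mb_per_sec: float, headroom: float = 1.5) -> float:
    """Estimate the WAN bandwidth (Mbit/s) needed for a mirror update rate.

    update_mb_per_sec: average data written to the mirror disk, in MB/s.
    headroom: hypothetical safety factor for write bursts and overhead.
    """
    return update_mb_per_sec * 8 * headroom  # bytes -> bits, plus headroom


def bandwidth_sufficient(update_mb_per_sec: float, link_mbps: float,
                         headroom: float = 1.5) -> bool:
    """Return True if the link can carry the estimated mirror traffic."""
    return link_mbps >= required_mbps(update_mb_per_sec, headroom)


if __name__ == "__main__":
    print(required_mbps(5))               # 60.0
    print(bandwidth_sufficient(5, 100))   # True
    print(bandwidth_sufficient(5, 50))    # False
```

An average rate can hide peaks such as nightly batch jobs, so it is safer to size the link for the busiest period rather than the daily mean.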
Use a Dynamic DNS resource or a virtual IP resource to switch the server that clients connect to.
Fig. 2.7 Example of configuring a remote cluster by using asynchronous mirror disks with two nodes
2.3.6. Example 5: Configuration using a shared disk with 3 nodes
As in the 2-node configuration, connect the servers to a shared disk. The shared disk should have the same drive letter on all servers.
Interconnect LAN cables are connected to the interconnect hub, which is not connected to any other server or client.
Fig. 2.8 Example of a configuration using a shared disk with three nodes
2.3.7. Example 6: Configuration using both mirror disks and a shared disk with 3 nodes
On Servers 1 and 2, the same drive letter needs to be specified.
For this configuration, different models can be used.
However, their partitions for mirroring must be exactly the same size in bytes. This may not be possible if the disk geometries differ.
It is possible to use both mirror disks and a shared disk in one cluster. In this example, the system is configured with three nodes: one for the shared disk type, one for the mirror disk type, and one for standby.
It is not necessary to connect the shared disk to a server on which no business application that uses the shared disk runs. However, the shared disk needs to have the same drive letter on all servers connected to it.
Install a dedicated HUB for interconnection.
Fig. 2.9 Example of a configuration using both mirror disks and a shared disk with three nodes
2.3.8. Example 7: Configuration using the hybrid type with 3 nodes
This is a configuration with three nodes that consists of two nodes connected to the shared disk and one node having a disk to be mirrored.
The servers do not need to be the same model.
Install a dedicated HUB for the interconnect and the mirror disk connect LAN.
Use as fast a HUB as possible.
Fig. 2.10 Example of a configuration of the hybrid type with three nodes
Interconnect LAN cables are connected to the interconnect hub, which is not connected to any other server or client.
2.3.9. Example 8: Configuration using BMC-related functions with 2 nodes
This is an example of 2-node cluster configuration for using the forced stop function of a physical machine.
When using BMC-related functions, connect the interconnect LAN and BMC management LAN via a dedicated HUB.
Use as fast a HUB as is available.
Fig. 2.11 Example of a configuration for using BMC-related functions with two nodes
Interconnect LAN and BMC LAN cables are connected to the hub, which is not connected to any other server or client.
2.4. Checking system requirements for each EXPRESSCLUSTER module
EXPRESSCLUSTER X consists of two modules: EXPRESSCLUSTER Server (main module) and Cluster WebUI. Check configuration and operation requirements of each machine where these modules will be used. For details about the operating environments, see "Installation requirements for EXPRESSCLUSTER" in the "Getting Started Guide".
Determine the hardware configuration based on the applications to be clustered and how the cluster system will be configured. Read "3. Configuring a cluster system" before you determine the hardware configuration.
2.6.1. Shared disk settings (Required for shared disk)
Set up the shared disk by following the steps below:
Important
When you continue using the data on the shared disk (in the cases such as reinstalling the server), do not create partitions or a file system. If you create partitions or a file system, data on the shared disks will be deleted.
Note
The partition to be allocated as described below cannot be used by mounting it on an NTFS folder.
Allocate a partition for disk heartbeat.
Allocate a partition on the shared disk to be used by the DISK network partition resolution resources in EXPRESSCLUSTER. On one of the servers connected to the shared disk, create the partition with the OS "Disk Management" function in the same way as an ordinary partition, assign a drive letter, and leave it as a RAW partition without formatting. Then, on each of the other servers that use the same shared disk, assign the same drive letter from Disk Management. Because the partition already exists, do not create or format it again on those servers.
Note
A disk heartbeat partition should be 17 MB (17,825,792 bytes) or larger. Leave the disk heartbeat partitions as RAW partition without formatting.
Allocate a cluster partition if you are using the hybrid disk type.
On the shared disk that is to be mirrored by the hybrid disk resource, create a partition used for managing the status of the hybrid disk. Follow the same procedure as for the disk heartbeat partition.
Important
A cluster partition should be 1024MiB or larger. Leave the cluster partition as RAW partition without formatting.
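The minimum sizes above are binary units: 17 MB here means 17 × 1024 × 1024 = 17,825,792 bytes, and 1024 MiB is 1,073,741,824 bytes. The Python sketch below checks a partition size in bytes against these minimums. The dictionary keys are hypothetical labels, and the sizes are assumed to be read from the OS (for example, from the partition properties in Disk Management).

```python
MIB = 1024 * 1024

# Minimum sizes stated in this guide (keys are hypothetical labels).
MINIMUM_BYTES = {
    "disk_heartbeat": 17 * MIB,   # 17,825,792 bytes
    "cluster": 1024 * MIB,        # 1,073,741,824 bytes
}


def partition_large_enough(kind: str, size_bytes: int) -> bool:
    """Return True if a RAW partition of this kind meets the minimum size."""
    return size_bytes >= MINIMUM_BYTES[kind]


if __name__ == "__main__":
    print(partition_large_enough("disk_heartbeat", 17_825_792))  # True
    print(partition_large_enough("cluster", 1000 * MIB))         # False
```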
Allocate a switchable partition for disk resources or a data partition for hybrid disk resources on the shared disk.
Create a switchable partition for disk resources or a data partition for hybrid disk resources on the shared disk. On one of the servers in the cluster that uses the shared disk, create the partition with the OS "Disk Management" function, assign a drive letter, and format it with NTFS.
Assign the same drive letter on the other servers connected to the shared disk. Because the partition has already been created, you do not need to create or format it again.
Access control for the shared disk does not take effect until the cluster setup has completed. Until then, do not start multiple servers connected to the shared disk at the same time; otherwise, files or folders stored on the shared disk may be corrupted. In particular, after formatting the partition for disk resources, do not start multiple servers connected to the shared disk at once until the server on which EXPRESSCLUSTER is installed has been rebooted.
Important
Do not start multiple servers connected with the shared disk simultaneously. The data on the shared disk may be corrupted.
2.6.2. Mirror partition settings (Required for mirror disks)
Set up the partitions for mirror disk resources by following the steps below. These steps are also required for a local disk (a disk connected to only one of the servers) that is mirrored with the shared disk in the hybrid configuration.
Note
When you cluster a single server and continue using the data on its existing partitions, do not re-create the partitions. If you re-create them, the data on those partitions will be deleted.
Note
The partition to be allocated as described below cannot be used by mounting it on an NTFS folder.
Allocate cluster partitions.
Create the cluster partitions that the mirror disk resources/hybrid disk resources use to manage their status. Create a cluster partition on every server in the cluster that uses mirror resources. Use the OS "Disk Management" function to create the partition, assign a drive letter, and leave it as a RAW partition without formatting.
Note
The cluster partition should be 1024MiB or larger. Leave the cluster partition as a RAW partition without formatting.
Allocate data partitions
Create the data partitions for mirroring by mirror disk resources/hybrid disk resources. For mirror disk resources, create the data partitions on the two servers on which disk mirroring is performed.
Use the OS "Disk Management" function to format the partitions with NTFS and assign a drive letter.
Note
When the partitions (drives) to be mirrored already exist (for example, when reinstalling EXPRESSCLUSTER), you do not need to create them again. If data to be mirrored already exists on the partitions, re-creating or formatting the partitions deletes that data.
The system drive, a drive with a page file, and the drive on which EXPRESSCLUSTER is installed cannot be used as partitions for mirror disk resources. The data partitions on both servers must be exactly the same size in bytes. If the disk geometries differ between the servers, it may not be possible to create partitions of exactly the same size. Check the partition sizes with the clpvolsz command and adjust them as needed. The same drive letter must be assigned to the partitions on both servers.
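Because the data partitions must match to the byte, it can help to compare the sizes from each server before configuring the mirror disk resource. The Python sketch below assumes you have collected each server's data partition size in bytes (for example, from clpvolsz) and entered it by hand; it does not call or parse clpvolsz, and the server names are placeholders.

```python
def check_same_size(sizes_by_server: dict[str, int]) -> tuple[bool, int]:
    """Compare data partition sizes (in bytes) across servers.

    Returns (all_equal, difference), where difference is the gap in bytes
    between the largest and the smallest partition.
    """
    if len(sizes_by_server) < 2:
        raise ValueError("need sizes from at least two servers")
    largest = max(sizes_by_server.values())
    smallest = min(sizes_by_server.values())
    return largest == smallest, largest - smallest


if __name__ == "__main__":
    ok, diff = check_same_size({"server1": 10_737_418_240,
                                "server2": 10_736_369_664})
    print(ok, diff)  # False 1048576
```

A nonzero difference means the partitions still need to be adjusted as described above before mirroring can be configured.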
2.6.3. Adjustment of time for EXPRESSCLUSTER services to start up (Required)
Make the time from powering on each server in the cluster system to the startup of the EXPRESSCLUSTER services longer than both of the following:
The time from power-on of the shared disk until it becomes available.
Heartbeat timeout time (30 seconds by default.)
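The requirement above amounts to a simple lower bound: the delay before the EXPRESSCLUSTER services start must be longer than the larger of the two measured times. The Python sketch below expresses that rule. The inputs are your own measurements, and the 30-second default reflects the default heartbeat timeout mentioned above.

```python
def startup_delay_lower_bound(disk_ready_sec: float,
                              heartbeat_timeout_sec: float = 30.0) -> float:
    """The startup delay must exceed both periods, so the larger one governs."""
    return max(disk_ready_sec, heartbeat_timeout_sec)


def delay_is_sufficient(configured_delay_sec: float, disk_ready_sec: float,
                        heartbeat_timeout_sec: float = 30.0) -> bool:
    """Use a strict comparison, matching 'longer than' in the text."""
    return configured_delay_sec > startup_delay_lower_bound(
        disk_ready_sec, heartbeat_timeout_sec)


if __name__ == "__main__":
    print(startup_delay_lower_bound(45))   # 45
    print(delay_is_sufficient(60, 45))     # True
    print(delay_is_sufficient(45, 45))     # False
```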
Adjustment of the startup time is necessary to prevent the following problems:
When the cluster system is started by powering on the shared disk and the servers, the shared disk may not finish starting up before EXPRESSCLUSTER starts (that is, EXPRESSCLUSTER completes its startup without recognizing the shared disk). This causes disk resources to fail to activate.
If a server reboots and its EXPRESSCLUSTER services start again within the heartbeat timeout, the other server assumes that the heartbeat has continued. As a result, the failover that the server restart should trigger does not take place.
Therefore, after measuring the above two time periods, adjust the startup time in either of the following ways:
It is recommended to synchronize the clocks of all the servers in the cluster regularly, for example daily, by using a protocol such as NTP.
Note
When the server clocks are not synchronized, the system time of the server as seen by a client may change at a failover or group move, which can cause applications in the system to fail. In addition, log timestamps differ between servers, which delays failure analysis when an error occurs.
Note
If the date or time setting on the OS is changed while a System monitor resource or a Process resource monitor resource is operating, the System monitor resource or the Process resource monitor resource may not operate normally.
EXPRESSCLUSTER does not support the OS power saving functions (for example, standby or hibernation). Make sure to turn them off.
2.6.8. Setup of BMC and ipmiutil (Required for using the forced stop function of a physical machine)
For using the forced stop function of a physical machine, configure the Baseboard Management Controller (BMC) of the servers to enable the communication between IP addresses of LAN ports for managing BMC and IP addresses used by the OS. These functions are not available when BMC is not installed on the server or when the network for managing BMC is disabled. For information on how to configure the BMC, refer to the manuals of your server.
These functions control the BMC firmware in the servers by using IPMI Management Utilities (ipmiutil), which is provided as open source under the BSD license. ipmiutil must be installed on the servers to use these functions.
As of January 2018, ipmiutil can be obtained from the Website below.
EXPRESSCLUSTER uses the hwreset or ireset command and the alarms or ialarms command of ipmiutil. To execute these commands without specifying a path, add the folder containing the ipmiutil executables to the system environment variable PATH, or copy the executables to a folder that is already included in PATH (for example, the bin folder in the EXPRESSCLUSTER installation folder).
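To confirm that the ipmiutil commands can be found without a path, you can look them up the way a command prompt would. The Python sketch below uses shutil.which, which searches PATH (and honors PATHEXT on Windows). It only sees the PATH of the process that runs it, so after changing the system environment variable, run it from a newly opened command prompt; whether the EXPRESSCLUSTER services see the same PATH is something to verify separately.

```python
import shutil
from typing import Callable, Optional

# EXPRESSCLUSTER uses one command from each pair (see the text above).
COMMAND_PAIRS = [("hwreset", "ireset"), ("alarms", "ialarms")]


def missing_ipmiutil_commands(
        which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the command pairs for which neither command is on PATH."""
    missing = []
    for pair in COMMAND_PAIRS:
        if not any(which(cmd) for cmd in pair):
            missing.append(" or ".join(pair))
    return missing


if __name__ == "__main__":
    gaps = missing_ipmiutil_commands()
    print("OK" if not gaps else "Not found on PATH: " + ", ".join(gaps))
```

The which parameter exists only so the check can be exercised without ipmiutil installed.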
Because EXPRESSCLUSTER does not use the function that requires the IPMI driver, it is not necessary to install the IPMI driver.
To control the BMC over the LAN with the above commands, an IPMI account with Administrator privilege is required in the BMC of each server. On NEC Express5800/100 series servers, use User ID 4 or later when adding or changing the account, because User IDs 3 and earlier are reserved by other tools. Use a tool that complies with the IPMI standard, such as IPMITool, to check and change the account configuration.
2.6.9. Setup of rsh or ssh (required for using the network warning light feature)
To use the network warning light feature, set up either of the commands supported by the warning light vendor: the ssh command or a command equivalent to rsh.
This chapter provides information required to configure a cluster including requirements of applications to be duplicated, cluster topology, and explanation on resources constituting a cluster.
EXPRESSCLUSTER supports multiple cluster topologies: a uni-directional standby cluster system, in which one server acts as the active server and the other as the standby server, and a multi-directional standby cluster system, in which the servers act as both active and standby servers for different operations.
Uni-directional standby cluster system
In this operation, only one application runs on an entire cluster system. There is no performance deterioration even when a failover occurs. However, resources in a standby server will be wasted.
Multi-directional standby cluster system with the same application
In this operation, the same application runs on more than one server simultaneously in a cluster system. Applications used in this system must support multi-directional standby operations.
Fig. 3.3 Multi-directional standby cluster system with the same application
Multi-directional standby cluster system with different applications
In this operation, different applications run on different servers and stand by for each other. Resources are not wasted during normal operation; however, after a failover, two applications run on one server and system performance deteriorates.
Fig. 3.4 Multi-directional standby cluster system with different applications
3.2.1. Failover in uni-directional standby cluster
On a uni-directional standby cluster system, the number of groups for an operation service is limited to one as described in the diagrams below:
1. Server 1 runs Application A.
Application A can be run on only one server in the same cluster.
Fig. 3.9 Uni-directional standby cluster with mirror disks (1): in normal operation
2. Server 1 crashes due to an error.
Fig. 3.10 Uni-directional standby cluster with mirror disks (2): when the server crashes
3. Application A fails over from Server 1 to Server 2.
Fig. 3.11 Uni-directional standby cluster with mirror disks (3): during a failover
4. To resume the application, data is recovered from Server 2's mirror disk.
Fig. 3.12 Uni-directional standby cluster with mirror disks (4): during data recovery
5. After Server 1 is restored, Application A can be returned from Server 2 to Server 1 by moving the group.
Fig. 3.13 Uni-directional standby cluster with mirror disks (5): After the server is restored
3.2.2. Failover in multi-directional standby cluster
On a multi-directional standby cluster system, different applications run on the servers. If a failover occurs on one server, multiple applications run on the other server. As a result, the failover destination server carries a heavier load than during normal operation, and performance decreases.
When you determine applications to be duplicated, study candidate applications taking what is described below into account to see whether or not they should be clustered in your EXPRESSCLUSTER cluster system.
If an application was updating a file when an error occurred, the file update may be incomplete when the standby server accesses that file after the failover.
The same problem can happen on a non-clustered (single) server if it goes down and is then rebooted. In principle, applications should be able to handle this kind of error. A cluster system should allow recovery from this kind of error without human intervention (for example, from a script).
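One common way to make file updates recoverable without human intervention, shown here as a general technique rather than an EXPRESSCLUSTER feature, is to write to a temporary file in the same folder and then replace the original in a single rename, so that a failover interrupts at worst the temporary copy. The Python sketch below does this with os.replace and adds a cleanup function that a start script could call to discard leftovers; the temporary file naming scheme is an assumption. Python documents os.replace as atomic on POSIX systems; on Windows it performs a single replace operation but is not documented as atomic, so treat this as a mitigation.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write data so that readers see either the old or the new content."""
    # Create the temporary file in the same folder (same volume) as the target.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent),
                                    prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temporary file, then re-raise the error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def cleanup_stale_temp_files(path: Path) -> int:
    """Delete temporary files left by an interrupted write (run at startup)."""
    removed = 0
    for leftover in path.parent.glob(path.name + ".*.tmp"):
        leftover.unlink()
        removed += 1
    return removed
```

Application-specific recovery (for example, replaying a journal) may still be needed; this pattern only ensures each file is either fully old or fully new.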
When EXPRESSCLUSTER stops or moves a group for an application (including online failback), it unmounts the file system used by the application group. Therefore, the application must be exited so that it stops accessing all files on the shared disk or mirror disk.
Typically, you issue an exit command to applications in their stop scripts; however, take care if the exit command returns before the application has actually terminated (that is, if it completes asynchronously).
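When the exit command returns before the application has actually ended, the stop script can wait for termination instead of assuming it. The Python sketch below polls a condition with an upper time limit, as an alternative to a fixed sleep. The condition you pass in is up to you; is_app_running in the usage note is a hypothetical placeholder you would implement for your application, for example by checking for its process.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_sec: float,
               interval_sec: float = 1.0) -> bool:
    """Poll condition until it returns True or timeout_sec elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_sec)
```

For example, after issuing the exit command, a stop script could call wait_until(lambda: not is_app_running(), timeout_sec=60) and report an error if it returns False, so that a hung application is not silently left accessing the disk.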
EXPRESSCLUSTER can pass the following types of data between servers:
Data in the switchable partition on the disk resource, or data in the data partition on the mirror disk resource/hybrid disk resource.
The value of a registry key synchronized by a registry synchronous resource
Application data should be divided into the data to be shared among servers and the data specific to the server, and these two types of data should be saved separately.
| Data type | Example | Where to store |
|---|---|---|
| Data to be shared among servers | User data, etc. | Switchable partition of the disk resource, or data partition of the mirror disk resource/hybrid disk resource |
| Data specific to a server | Programs, configuration data | Local disk of each server |
3.3.1.4. Note 4: Multiple application service groups
When you run the same application service in multi-directional standby operation, you must assume that, after a failure reduces the number of running servers (degeneration), multiple application groups of the same application will run on one server.
Applications must be able to take over the resources passed to them by one of the following methods, so that a single server can run multiple application groups.
The figures below apply to both shared disk and mirror disk configurations.
Fig. 3.23 Application running normally on each server in a multi-directional standby cluster
Starting up multiple instances
This method invokes a new process.
More than one application should co-exist and run.
3.3.1.5. Note 5: Mutual interference and compatibility with applications
Sometimes mutual interference between applications and EXPRESSCLUSTER functions or the operating system functions required to use EXPRESSCLUSTER functions prevents applications or EXPRESSCLUSTER from working properly.
Access control of a shared disk and mirror disk
Access to the switchable partitions managed by a disk resource, or to the data partitions mirrored by a mirror disk resource/hybrid disk resource, is restricted while that resource is inactive: the partitions cannot be read or written. If an application accesses such a partition while its resource is inactive (that is, while the partition is not accessible to users or applications), an I/O error occurs.
Generally, when EXPRESSCLUSTER starts an application, you can assume that the switchable partition or data partition the application needs to access is already accessible.
Multi-home environment and transfer of IP addresses
In general, one server has multiple IP addresses in a cluster system, and the IP address configuration of each server changes dynamically because floating IP addresses and virtual IP addresses move between servers. If an application used in the system does not support such a multi-home environment, the system can malfunction. For example, an attempt to acquire the IP address of the local server may return the interconnect LAN address, which differs from the address used for communicating with clients. For applications that depend on a server's IP address, specify the IP address to be used explicitly.
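As a sketch of specifying the address explicitly, the Python example below listens only on an address taken from configuration, such as the floating IP address, rather than one discovered at runtime. The value 192.0.2.10 is a placeholder documentation address, not a real setting. Binding to a floating IP address only succeeds while that address is active on the server, so the application has to be started after the address is activated.

```python
import socket

# Hypothetical configured value: the floating IP address clients use.
SERVICE_ADDRESS = "192.0.2.10"


def open_listener(bind_address: str, port: int) -> socket.socket:
    """Listen on an explicitly configured address, not one found at runtime.

    Avoids lookups such as socket.gethostbyname(socket.gethostname()),
    which may return the interconnect address on a multi-homed server.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((bind_address, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Loopback is used so the demo runs anywhere; in production pass
    # SERVICE_ADDRESS and the service's real port instead.
    listener = open_listener("127.0.0.1", 0)
    print("listening on", listener.getsockname())
    listener.close()
```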
Access to shared disks or mirror disks from applications
Applications that coexist with an application group are not notified when that group stops. Therefore, if such an application is accessing a switchable partition or data partition used by the application group when the group stops, disk isolation will fail.
Some applications, such as system monitoring services, periodically access all disk partitions. To use such an application in a cluster environment, it must provide a function that lets you specify which partitions to monitor.
What you need to consider depends on which standby cluster system is selected for an application. The notes for each cluster system are listed below. The numbers correspond to the numbers of the notes (1 through 5) described above:
Note for uni-directional standby [Active-Standby]: 1, 2, 3, and 5
Note for multi-directional standby [Active-Active]: 1, 2, 3, 4, and 5
Note for co-existing behaviors: 5
(Applications co-exist and run. The cluster system does not fail over the applications.)
3.3.3. Solutions to the problems relevant to the notes
| Problem | Solution | Note to refer to |
|---|---|---|
| When an error occurs while a data file is being updated, the application does not work properly on the standby server. | Modify the program, or add or modify scripts so that a process runs at failover to recover the files that were being updated. | Note 1: Data recovery after an error |
| The application keeps accessing the shared disk or mirror disk for a certain period of time even after it is stopped. | Execute the sleep command in the stop script. | Note 2: Application termination |
| The same application cannot be started more than once on one server. | In multi-directional standby operation, restart the application at failover and pass the shared data to it. | Note 4: Multiple application service groups |
A failover group (hereafter referred to as a group) is a set of resources required to perform an independent operation service in a cluster system. Failover takes place in units of groups. Each group has its own group name, group resources, and attributes.
Resources in each group are handled as a unit of the group. For example, if a failover occurs in group1, which has disk resource1 and floating IP resource1, disk resource1 and floating IP resource1 fail over together (disk resource1 never fails over alone). Likewise, a resource is never included in more than one group.
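The grouping rules above (a group fails over as a unit, and a resource belongs to only one group) can be illustrated with a small model. The Python sketch below is a toy data structure written for illustration, not EXPRESSCLUSTER's internal implementation or API; the group, resource, and server names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverGroup:
    name: str
    resources: list[str] = field(default_factory=list)
    current_server: str = ""


class Cluster:
    def __init__(self) -> None:
        self.groups: dict[str, FailoverGroup] = {}
        self._owner: dict[str, str] = {}  # resource name -> owning group

    def add_group(self, group: FailoverGroup) -> None:
        """Register a group; a resource may not belong to two groups."""
        for res in group.resources:
            if res in self._owner:
                raise ValueError(f"{res} already belongs to {self._owner[res]}")
        for res in group.resources:
            self._owner[res] = group.name
        self.groups[group.name] = group

    def failover(self, group_name: str, target_server: str) -> list[str]:
        """Move the whole group; every resource in it moves together."""
        group = self.groups[group_name]
        group.current_server = target_server
        return list(group.resources)


if __name__ == "__main__":
    cluster = Cluster()
    cluster.add_group(FailoverGroup("group1", ["disk1", "fip1"], "server1"))
    print(cluster.failover("group1", "server2"))  # ['disk1', 'fip1']
```

Note that the model has no operation for moving a single resource: failover is only defined on a group.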
For a failover to occur in a cluster system, a group that works as a unit of failover must be created. A group consists of group resources. To create an optimal cluster, you must understand which group resources to add to the group you create, and have a clear picture of your operation.
A kernel mode module uses a LAN to monitor whether or not servers are active.
Witness heartbeat resource (3) (witnesshb): A module uses the Witness server to monitor whether or not servers are active.
For the interconnect with the highest priority, configure kernel mode LAN heartbeat resources that can communicate between all servers.
Configuring at least two kernel mode LAN heartbeat resources is recommended unless it is difficult to add a network to an environment such as the cloud or a remote cluster.
It is recommended to register both an interconnect-dedicated LAN and a public LAN as LAN heartbeat resources.
Network partitioning refers to the status where all communication channels have problems and the network between servers is partitioned.
In a cluster system that is not equipped with solutions for network partitioning, a failure on a communication channel cannot be distinguished from a server failure. This can lead to data corruption caused by multiple servers accessing the same resource. EXPRESSCLUSTER, on the other hand, distinguishes a server failure from network partitioning when the heartbeat from a server is lost. If the loss of heartbeat is determined to be caused by a server failure, the system performs a failover by activating each resource and restarting applications on a server that is running normally. If the loss of heartbeat is determined to be caused by network partitioning, emergency shutdown is executed, because protecting data takes priority over continuity of operation. Network partitions can be resolved by the following methods:
PING method
A device that is always active and can receive and respond to the ping command (hereafter referred to as a ping device) is required.
More than one ping device can be specified.
When the heartbeat from the other server is lost but the ping device is responding to the ping command, it is determined that the server without heartbeat has failed, and a failover takes place. If there is no response to the ping command, it is determined that the local server is isolated from the network due to network partitioning, and emergency shutdown takes place. This allows a server that can communicate with clients to continue operation even if network partitioning occurs.
If the ping device fails so that no server receives a response to the ping command, and this state continues until the heartbeat is lost, network partitions cannot be resolved. If the heartbeat is lost in this state, a failover takes place on all servers. Because of this, using this method in a cluster with a shared disk can cause data corruption due to access to a resource from multiple servers.
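The PING method decision described above, as seen from one server, can be sketched as a small function. This is a simplified Python model, not EXPRESSCLUSTER code; the function name, its boolean inputs, and the reading of "responds" as "at least one configured ping device answers" are assumptions for illustration.

```python
def resolve_ping(heartbeat_lost: bool,
                 ping_device_responds: bool,
                 ping_down_before_heartbeat_loss: bool) -> str:
    """Action taken by one server under the PING method (simplified sketch)."""
    if not heartbeat_lost:
        return "continue"
    if ping_down_before_heartbeat_loss:
        # The ping device stopped answering on every server before the heartbeat
        # was lost, so the partition cannot be resolved: every server fails over.
        return "failover"
    if ping_device_responds:
        # This server still reaches the network, so the silent peer is treated as failed.
        return "failover"
    # No reply: this server concludes that it is the isolated side.
    return "emergency_shutdown"


print(resolve_ping(True, True, False))   # failover
print(resolve_ping(True, False, False))  # emergency_shutdown
```

The second branch is the case that makes this method risky with a shared disk: both sides return "failover".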
HTTP method
A Web server that is always active is required.
When the heartbeat from the other server is lost, but there is a response to an HTTP HEAD request, it is determined that the server without heartbeat has failed and a failover takes place. If there is no response to an HTTP HEAD request, it is determined that the local server is isolated from the network due to network partitioning, and an emergency shutdown takes place. This will allow a server that can communicate with clients to continue operation even if network partitioning occurs.
If the Web server fails and there is still no response to an HTTP HEAD request when the heartbeat is lost, network partitions cannot be resolved. If the heartbeat is lost in this state, emergency shutdown occurs on all servers.
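The HTTP method can be sketched in Python with the standard `urllib.request` module: send an HTTP HEAD request and map the result to an action. This is an illustrative model only. Treating any HTTP status (including an error status such as 404) as "a response", bypassing proxies, and the function names are assumptions, not documented EXPRESSCLUSTER behavior.

```python
import urllib.error
import urllib.request


def http_head_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if the Web server returns any HTTP response to a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    # Empty ProxyHandler: talk to the Web server directly, ignoring proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status is still an answer from the Web server.
        return True
    except OSError:
        # Connection refused, name resolution failure, or timeout: no response.
        return False


def resolve_http(heartbeat_lost: bool, web_server_responds: bool) -> str:
    """Action taken by one server under the HTTP method (simplified sketch)."""
    if not heartbeat_lost:
        return "continue"
    return "failover" if web_server_responds else "emergency_shutdown"
```

Note that if the Web server itself is down, every server gets `False` and returns "emergency_shutdown", which matches the failure case described above.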
DISK method
Available to a cluster that uses a shared disk.
A dedicated disk partition (disk heartbeat partition) is required on the shared disk.
Network partitioning is determined by periodically writing data to the shared disk and determining when the other server was last alive.
If the heartbeat from the other server is lost while there is a failure in the shared disk or in the path to the shared disk (such as the SCSI bus), network partition resolution fails and failover does not take place. In this case, emergency shutdown takes place on the servers that are working properly.
If failures occur on all network channels while the shared disk is working properly, a network partition is detected. Failover then takes place on the master server and on any server that can communicate with the master server. Emergency shutdown takes place on the remaining servers.
Compared to the other methods, the time needed to resolve network partitions is longer in the shared disk method because the delay of the disk I/O must be taken into account. The time is about twice as long as the heartbeat time-out and disk I/O wait time.
If I/O to the shared disk takes longer than the disk I/O wait time, network partition resolution may time out, and failover may not take place.
Note
The DISK method cannot be used if VERITAS Storage Foundation is used.
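The DISK method rules above can be summarized in a simplified Python sketch. It assumes, for illustration, that "the other server is still alive" means its last write to the disk heartbeat partition is within a timeout; the function names, parameters, and that freshness rule are hypothetical and not EXPRESSCLUSTER's actual algorithm.

```python
def peer_alive_on_disk(peer_last_write: float, now: float, timeout: float) -> bool:
    """Assumed rule: the peer is alive if it wrote to the disk heartbeat area recently."""
    return now - peer_last_write <= timeout


def resolve_disk(heartbeat_lost: bool, disk_ok: bool, peer_alive: bool,
                 is_master: bool, reaches_master: bool) -> str:
    """Action taken by one server under the DISK method (simplified sketch)."""
    if not heartbeat_lost:
        return "continue"
    if not disk_ok:
        # Shared disk or its path failed: resolution fails, no failover.
        return "emergency_shutdown"
    if not peer_alive:
        # The peer stopped writing as well: treated as a server failure.
        return "failover"
    # The peer still writes to the disk, so this is a network partition:
    # the master server and servers that can reach it survive.
    if is_master or reaches_master:
        return "failover"
    return "emergency_shutdown"
```

Because the peer's liveness is judged from periodic disk writes, this method inherently waits on disk I/O, which is why it takes longer than the PING or HTTP methods.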
PING + DISK method
This method combines the PING method and the DISK method.
This method requires a device that can always receive the ping command and return a response (a ping device). You can specify more than one ping device. This method also requires the dedicated disk partition (disk heartbeat partition) on the shared disk.
This method usually works in the same way as the PING method. However, if the ping device fails so that no server receives a response to the ping command, and this state continues until the heartbeat is lost, the method switches to the DISK method. If the servers using the NP resolution resources of the PING method and those using the NP resolution resources of the DISK method do not match (for example, when the PING method resources are used by all servers but the DISK method resources are used only by the servers connected to a shared disk), the two types of resources work independently. In that case, the DISK method also works, regardless of the state of the ping device.
If the heartbeat from the other server is lost while there is a failure in the shared disk and/or a path to the shared disk, emergency shutdown takes place even if there is response to the ping command.
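How the PING + DISK method chooses between its two halves can be sketched as follows. This simplified Python model is illustrative only: the function name and inputs are hypothetical, and `disk_method_survives` stands in for the outcome of a DISK method evaluation, which is not modeled here.

```python
def resolve_ping_disk(heartbeat_lost: bool, disk_path_ok: bool,
                      ping_down_before_heartbeat_loss: bool,
                      ping_responds: bool, disk_method_survives: bool) -> str:
    """Action taken by one server under the PING + DISK method (simplified sketch)."""
    if not heartbeat_lost:
        return "continue"
    if not disk_path_ok:
        # Shared disk or its path failed: shut down even if the ping device answers.
        return "emergency_shutdown"
    if ping_down_before_heartbeat_loss:
        # The ping device failed first: fall back to the DISK method's verdict.
        return "failover" if disk_method_survives else "emergency_shutdown"
    # Normal case: behave like the PING method.
    return "failover" if ping_responds else "emergency_shutdown"
```

The fallback branch is what removes the PING method's "all servers fail over" weakness when the ping device is the component that failed.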
Majority method
This method can be used in a cluster with three or more nodes.
This method prevents data corruption caused by the Split Brain syndrome by shutting down a server that can no longer communicate with the majority of the servers in the entire cluster because of network failure. When communication with exactly half of the servers in the entire cluster is failing, emergency shutdown takes place in a server that cannot communicate with the master server.
When more than half of the servers are down, the remaining servers that are running properly also go down.
If all servers are isolated due to a hub error, all servers go down.
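The majority rule above can be written as a short Python check. This is a sketch, not EXPRESSCLUSTER code: it assumes that the count of reachable servers includes the local server, and that the master server counts as "reaching the master" itself.

```python
def resolve_majority(total_nodes: int, reachable_including_self: int,
                     reaches_master: bool) -> str:
    """Outcome for one server under the majority method (simplified sketch)."""
    if total_nodes < 3:
        raise ValueError("the majority method needs 3 or more nodes")
    if 2 * reachable_including_self > total_nodes:
        return "survive"
    if 2 * reachable_including_self == total_nodes:
        # Exactly half: only the side that includes the master server survives.
        return "survive" if reaches_master else "emergency_shutdown"
    return "emergency_shutdown"


print(resolve_majority(4, 2, True))   # survive
print(resolve_majority(4, 2, False))  # emergency_shutdown
```

An isolated server (for example, after a hub error) can reach only itself, so it always falls into the last branch, which is why all servers go down in that case.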
Not solving the network partition
This method can be selected in a cluster that does not use any disk resources (shared disks).
If a failure occurs on all network channels between servers in a cluster, all servers fail over.
The following are the recommended methods to resolve the network partition:
The PING + DISK method is recommended for a cluster with three or more nodes that uses a shared disk. For a hybrid configuration, use the PING + DISK method for the servers connected to the shared disk, and use only the PING method for the servers not connected to the shared disk.
The PING method is recommended for a cluster with three or more nodes but without a shared disk.
The DISK method or the PING + DISK method is recommended for a cluster that uses a shared disk with two nodes.
The PING method is recommended for a cluster with two nodes but without a shared disk.
The HTTP method is recommended for a cluster that uses the Witness heartbeat resource but does not use a shared disk.
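The recommendations above can be expressed as a lookup function. This Python sketch is illustrative only: the function name is hypothetical, the per-server choice for hybrid configurations is not modeled, and the combination "Witness heartbeat with a shared disk" is not covered by the recommendations, so the sketch simply falls back to the shared-disk rules there (an assumption).

```python
from typing import List


def recommended_np_methods(nodes: int, shared_disk: bool,
                           witness_heartbeat: bool = False) -> List[str]:
    """Recommended network partition resolution method(s) (simplified sketch)."""
    if witness_heartbeat and not shared_disk:
        return ["HTTP"]
    if shared_disk:
        # Three or more nodes: PING + DISK; two nodes: DISK or PING + DISK.
        return ["PING + DISK"] if nodes >= 3 else ["DISK", "PING + DISK"]
    # No shared disk, with two nodes or with three or more nodes.
    return ["PING"]


print(recommended_np_methods(2, shared_disk=True))  # ['DISK', 'PING + DISK']
```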
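The network partition resolution methods compare as follows:

DISK
  Number of nodes: No limit
  Required hardware: Shared disk
  Circumstance where failover cannot be performed: Disk error
  When all network channels are disconnected: The master server survives
  Circumstance where both servers fail over: None
  Time required to resolve network partition: Time calculated from the heartbeat timeout and disk I/O wait time is needed

PING
  Number of nodes: No limit
  Required hardware: Device to receive the ping command and return a response
  Circumstance where failover cannot be performed: None
  When all network channels are disconnected: A server that receives a response to the ping command survives
  Circumstance where both servers fail over: All networks are disconnected after the ping command has timed out the specified number of times in a row
  Time required to resolve network partition: 0

HTTP
  Number of nodes: No limit
  Required hardware: Web server
  Circumstance where failover cannot be performed: Web server failure
  When all network channels are disconnected: A server that can communicate with the Web server survives
  Circumstance where both servers fail over: None
  Time required to resolve network partition: 0

PING + DISK
  Number of nodes: No limit
  Required hardware: Device to receive the ping command and return a response; shared disk
  Circumstance where failover cannot be performed: None
  When all network channels are disconnected: A server that receives a response to the ping command survives
  Circumstance where both servers fail over: None
  Time required to resolve network partition: 0

Majority
  Number of nodes: 3 or more
  Required hardware: None
  Circumstance where failover cannot be performed: The majority of servers go down
  When all network channels are disconnected: A server that can communicate with the majority of servers survives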
4.1. Steps from Installing EXPRESSCLUSTER to creating a cluster
The following describes the steps from installing EXPRESSCLUSTER, license registration, cluster system creation, to verifying the cluster system status.
Install the EXPRESSCLUSTER Server, which is the core EXPRESSCLUSTER module, on each server that constitutes the cluster. License registration is also performed when the Server is installed. (See "4.2. Installing the EXPRESSCLUSTER Server.")
Reboot the server.
Create the cluster configuration data using Cluster WebUI.
Install the EXPRESSCLUSTER Server, which is an EXPRESSCLUSTER module, on each server machine constituting a cluster system.
License registration is required when installing the Server. Make sure you have the required license file or license sheet.
The EXPRESSCLUSTER Server consists of the following system services:
Each entry below lists: service display name (service name): description; startup type; usual service status.
EXPRESSCLUSTER (clpstartup): EXPRESSCLUSTER; Automatic; Running
EXPRESSCLUSTER API (clprstd): Control of the EXPRESSCLUSTER RESTful API; Automatic; Stopped
EXPRESSCLUSTER Disk Agent (clpdiskagent): Shared disk, mirror disk, and hybrid disk control; Manual; Running
EXPRESSCLUSTER Event (clpevent): Event log output; Automatic; Running
EXPRESSCLUSTER Information Base (clpibsv): Cluster information management; Automatic; Running
EXPRESSCLUSTER Java Resource Agent (clpjra): Java Resource Agent; Manual; Stopped
EXPRESSCLUSTER Manager (clpwebmgr): WebManager Server; Automatic; Running
EXPRESSCLUSTER Node Manager (clpnm): Control of heartbeat and network partition resolution; Automatic; Running
EXPRESSCLUSTER Server (clppm): EXPRESSCLUSTER Server; Automatic; Running
EXPRESSCLUSTER System Resource Agent (clpsra): System Resource Agent; Manual; Running
EXPRESSCLUSTER Transaction (clptrnsv): Communication process; Automatic; Running
EXPRESSCLUSTER Web Alert (clpwebalt): Alert synchronization; Automatic; Running
Note
The status of EXPRESSCLUSTER Java Resource Agent will be "Running" when JVM monitor resource is set.
Note
The status of EXPRESSCLUSTER System Resource Agent will be "Running" when the system monitor resource or the process resource monitor resource is set, or when Collect the System Resource Information is checked on the Monitor tab in Cluster Properties.
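After installation, the services can be compared with the usual statuses in the list above. The following Python sketch is a hypothetical helper, not an EXPRESSCLUSTER tool: it runs the Windows `sc query <service name>` command, reads the STATE line (assuming the English-language output format), and reports services whose state differs from the list. As the notes say, clpjra and clpsra may legitimately differ depending on the cluster configuration.

```python
import re
import subprocess
from typing import Dict, Optional, Tuple

# Usual status per the service list above (service name -> state shown by sc).
EXPECTED_STATE: Dict[str, str] = {
    "clpstartup": "RUNNING", "clprstd": "STOPPED", "clpdiskagent": "RUNNING",
    "clpevent": "RUNNING", "clpibsv": "RUNNING", "clpjra": "STOPPED",
    "clpwebmgr": "RUNNING", "clpnm": "RUNNING", "clppm": "RUNNING",
    "clpsra": "RUNNING", "clptrnsv": "RUNNING", "clpwebalt": "RUNNING",
}

# Matches a line such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_output: str) -> Optional[str]:
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_state(service: str) -> Optional[str]:
    # Windows only: runs "sc query <service>" and parses its output.
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


def deviations(observed: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each service whose state differs to (expected, observed)."""
    return {name: (want, observed.get(name))
            for name, want in EXPECTED_STATE.items()
            if observed.get(name) != want}
```

On a cluster server, `deviations({name: query_state(name) for name in EXPECTED_STATE})` would list the services to look at.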
4.2.1. Installing the EXPRESSCLUSTER Server for the first time
Install EXPRESSCLUSTER X on all servers that constitute the cluster by following the procedure below.
Important
When a shared disk is used, make sure that the OS is not running on more than one of the servers connected to the shared disk at the same time until EXPRESSCLUSTER has been installed. Otherwise, data on the shared disk may be corrupted.
Note
Install the EXPRESSCLUSTER Server using an Administrator account.
Note
When the EXPRESSCLUSTER Server is installed, the Windows media sense function (which deactivates an IP address when a cable is disconnected and the link goes down) is disabled.
Note
If the Windows SNMP Service has already been installed, the SNMP linkage function will be automatically set up when the EXPRESSCLUSTER Server is installed. If, however, the Windows SNMP Service has not yet been installed, the SNMP linkage function will not be set up.
Insert the installation DVD-ROM into the DVD-ROM drive.
After the menu window is displayed, select EXPRESSCLUSTER for Windows.
Note
If the menu window does not open automatically, double-click the menu.exe in the root folder of the DVD-ROM.
Select EXPRESSCLUSTER X 5.3 for Windows.
The NEC EXPRESSCLUSTER Setup window is displayed. Click Next.
The Choose Destination Location dialog box is displayed. When changing the install destination, click Browse to select a directory.
In the Ready to Install the Program window, click Install to start installing.
After the installation is completed, click Next without changing the default value in Port Number.
Note
The port number configured here needs to be configured again when creating the cluster configuration data. For details on port number, refer to "Parameter details" in the "Reference Guide".
In Filter Settings of Shared Disk, right-click SCSI controller or HBA connected to a shared disk, and click Filtering. Click Next.
Important
When a shared disk is used, configure filtering settings for the SCSI controller or HBA to be connected to the shared disk. If the shared disk is connected without filtering settings, data on the shared disk may be corrupted. When the disk path is duplicated, the filter must be configured for all the HBAs physically connected to the shared disk, even if the shared disk appears to be connected to only one HBA.
Important
The filtering settings of the shared disk configured above are temporary. After rebooting the OS, create the cluster configuration data on Cluster WebUI. When you apply the settings and the following message is displayed, make sure to select Yes:
"Do you want to apply the HBA filtering settings to the cluster configuration data?"
Important
When using mirror disk resources, do not configure filtering for the SCSI controller or HBA to which the internal disk used as the mirroring target is connected. If the filter is enabled for mirror disk resources, the mirror disk resources fail to start. However, filtering settings are required when mirroring is to be configured using shared disks.
The window that shows the completion of setting is displayed. Click Yes.
License Manager is displayed. Click Register to register the license. For detailed information on the registration procedure, refer to "5. Registering the license" in this guide.
Click Finish to close the License Manager dialog box.
The Complete InstallShield Wizard dialog box is displayed. Select Restarting and click Finish. The server will be rebooted.
Note
When a shared disk is used, it cannot be accessed after the OS reboots, because access to it is restricted.
4.2.2. Installing the EXPRESSCLUSTER Server in Silent Mode
In silent mode, the EXPRESSCLUSTER Server is installed automatically, without displaying any dialog box that prompts the user for a response while the installer is running. This installation method is useful when the installation folder and installation options are the same for all server machines. It not only saves the user effort but also prevents installation mistakes caused by incorrect input.
Install the EXPRESSCLUSTER Server in all servers configuring the cluster by following the procedure below.
Note
Installation in silent mode is not available for a shared disk configuration. For a shared disk configuration, install the EXPRESSCLUSTER Server by referring to "Installing the EXPRESSCLUSTER Server for the first time."
Note
Install the EXPRESSCLUSTER Server using an Administrator account.
Note
When the EXPRESSCLUSTER Server is installed, the Windows media sense function (which deactivates an IP address when a cable is disconnected and the link goes down) is disabled.
Note
If the Windows SNMP Service has already been installed, the SNMP linkage function will be automatically set up when the EXPRESSCLUSTER Server is installed. If, however, the Windows SNMP Service has not yet been installed, the SNMP linkage function will not be set up.
If you want to change the installation folder (default: C:\Program Files\EXPRESSCLUSTER), create a response file in advance by following the procedure below.
Copy the response file from the installation DVD-ROM to any accessible location in the server.
Copy the following file in the installation DVD-ROM.
Execute the following command from the command prompt to start setup.
# "<Path of silent-install.bat>silent-install.bat" -i <Path of response file>
* <Path of silent-install.bat>: Windows\5.3\common\server\x64\silent-install.bat on the installation DVD-ROM.
* To install the EXPRESSCLUSTER Server in the default directory (C:\Program Files\EXPRESSCLUSTER), omit <Path of response file>.
Restart the server.
Execute the following command from the command prompt to register the license.
# "<Installation folder>\bin\clplcnsc.exe" -i <Path of license file>
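When the same silent installation is repeated on several servers, the two command lines above can be assembled by a script. This Python sketch only builds argument lists; the DVD drive letter, response file name, and license file path in the example are hypothetical. It also assumes that omitting the response file means omitting the `-i` switch as well, which should be checked against the Reference Guide.

```python
import ntpath
from typing import List, Optional

SILENT_BAT_REL = r"Windows\5.3\common\server\x64\silent-install.bat"
DEFAULT_INSTALL_DIR = r"C:\Program Files\EXPRESSCLUSTER"


def silent_install_cmd(dvd_root: str, response_file: Optional[str] = None) -> List[str]:
    """Argument list for silent-install.bat; a .bat file is run through cmd.exe."""
    cmd = ["cmd.exe", "/c", ntpath.join(dvd_root, SILENT_BAT_REL)]
    if response_file is not None:
        # Assumption: for the default folder, "-i <response file>" is left out entirely.
        cmd += ["-i", response_file]
    return cmd


def license_register_cmd(license_file: str,
                         install_dir: str = DEFAULT_INSTALL_DIR) -> List[str]:
    """Argument list for "<Installation folder>\\bin\\clplcnsc.exe -i <license file>"."""
    return [ntpath.join(install_dir, "bin", "clplcnsc.exe"), "-i", license_file]


print(silent_install_cmd("D:\\"))
```

On each server, the first list would be passed to `subprocess.run(..., check=True)`, the server restarted, and then the second list run. Using `ntpath` keeps the Windows path handling correct even when the script is prepared on another OS.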
4.2.3. Upgrading EXPRESSCLUSTER Server from the previous version
Before starting the upgrade, read the following notes.
The upgrade procedure described in this section is valid for EXPRESSCLUSTER X 3.3 for Windows (internal version 11.35) or later.
In EXPRESSCLUSTER X 4.2 for Windows or later, port numbers for EXPRESSCLUSTER have been added. If you upgrade from EXPRESSCLUSTER X 4.1 for Windows or earlier, make necessary ports accessible beforehand.
If mirror disk resources or hybrid disk resources are set, the cluster partitions require 1024 MiB or more of space. In addition, a full copy of the mirror disk resources or hybrid disk resources must be executed.
EXPRESSCLUSTER Server must be upgraded with the account having the Administrator's privilege.
See also
For the procedure of updating between the different versions of the same major version, refer to the "Update Procedure Manual".
The following procedures explain how to upgrade from EXPRESSCLUSTER X 3.3 or 4.x to EXPRESSCLUSTER X 5.3.
Before upgrading, confirm that the servers in the cluster and all the resources are in normal status by using Cluster WebUI, WebManager or the command.
Install EXPRESSCLUSTER X 5.3 on the server from which the old version of the EXPRESSCLUSTER Server was uninstalled in step 5, and then register the license as necessary. For details about how to install the EXPRESSCLUSTER Server, refer to "4.2. Installing the EXPRESSCLUSTER Server" in "4. Installing EXPRESSCLUSTER" in this guide.
Shut down the server on which EXPRESSCLUSTER X 5.3 was installed in step 6.
Perform steps 5 to 7 on each server.
Start all the servers.
In any of the servers with EXPRESSCLUSTER installed as above, execute the command for converting cluster configuration data.
Move to the work directory (such as C:\tmp) in which the conversion command is to be executed.
To the moved work directory, copy and deploy the cluster configuration data backed up in step 2.
Deploy clp.conf and the scripts directory.
Note
If backed up on Cluster WebUI, the cluster configuration data is zipped.
Unzip the file, and clp.conf and the scripts directory will be extracted.
Execute the following command to convert the cluster configuration data:
# clpcfconv.bat -i .
Under the work directory, zip the cluster configuration data (clp.conf) and the scripts directory.
Note
Create the zip file so that when unzipped, the clp.conf file and scripts directory are created.
Open the config mode of Cluster WebUI, and click Import.
Import the cluster configuration data zipped in step 10.
Manually update items of the cluster configuration data if necessary.
If you upgrade from EXPRESSCLUSTER X 3.3 and are using mirror disk resources or hybrid disk resources, perform the following steps:
Allocate a cluster partition of 1024 MiB or larger.
If the drive letter of the cluster partition is different from the configuration, modify the configuration. For the groups to which mirror disk resources or hybrid disk resources belong, if Startup Attribute is set to Auto Startup on the Attribute tab of Group Properties, change it to Manual Startup.
In addition, select Cluster Properties -> the Extension tab, then change the setting of Failover Count Method to Cluster, which is the same unit as before the upgrade.
If the forced stop function or the forced stop script is used, perform the following steps:
Set the Type of Forced Stop on the Fencing tab of Cluster Properties.
If the forced stop script is used: Select Custom.
If the forced stop script is not used: With EXPRESSCLUSTER operated on physical machines, select BMC; with EXPRESSCLUSTER operated on virtual machines, select vCenter.
Click Properties to display the properties window for the forced stop resource, and set each parameter.
Click Apply the Configuration File of the Cluster WebUI to apply the configuration data.
When the message "The disk information in the cluster configuration data differs from that in the server. Do you want the inconsistency to be automatically corrected?" appears, select Yes.
If the fixed-term license is used, run the following command.
clplcnsc --distribute
Open the operation mode of Cluster WebUI, and start the cluster.
If you upgrade from EXPRESSCLUSTER X 3.3 and are using mirror disk resources or hybrid disk resources, perform the following steps:
From the mirror disk list, execute a full copy assuming that the server with the latest data is the copy source.
Start the group and confirm that each resource starts normally.
If Startup Attribute was changed from Auto Startup to Manual Startup in step 13, use the config mode of Cluster WebUI to change this to Auto Startup. Then, click Apply the Configuration File to apply the cluster configuration data to the cluster.
This completes the procedure for upgrading the EXPRESSCLUSTER Server. Check that the servers are operating normally as a cluster by using the clpstat command or Cluster WebUI.
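Step 10 of the upgrade procedure requires a zip file that, when unzipped, yields clp.conf and the scripts directory at the top level. One way to build it is with Python's standard `zipfile` module; the function name and the example paths below are hypothetical, and the sketch only assumes the work directory layout described in steps 8 and 9.

```python
import os
import zipfile


def zip_cluster_config(work_dir: str, zip_path: str) -> None:
    """Zip clp.conf and the scripts directory so they unzip at the archive root."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(os.path.join(work_dir, "clp.conf"), arcname="clp.conf")
        scripts_dir = os.path.join(work_dir, "scripts")
        for root, _dirs, files in os.walk(scripts_dir):
            # Store directory entries too, so empty script folders are kept.
            zf.write(root, arcname=os.path.relpath(root, work_dir))
            for name in files:
                full_path = os.path.join(root, name)
                # Archive names are relative to work_dir, e.g. "scripts/failover1/start.bat".
                zf.write(full_path, arcname=os.path.relpath(full_path, work_dir))
```

For example, `zip_cluster_config(r"C:\tmp", r"C:\tmp\config.zip")` run after `clpcfconv.bat -i .` would produce an archive suitable for Import in the config mode of Cluster WebUI. Using paths relative to the work directory is what keeps clp.conf at the archive root rather than under a `C:\tmp` folder.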
4.2.4. Setting up the SNMP linkage function manually
Note
If you are using only the SNMP trap transmission function, you do not need to perform this procedure.
To handle information acquisition requests via SNMP, the Windows SNMP Service must be installed and the SNMP linkage function must be registered separately.
If the Windows SNMP Service has already been installed, the SNMP linkage function will be automatically registered when the EXPRESSCLUSTER Server is installed. If, however, the Windows SNMP Service has not been installed, the SNMP linkage function will not be registered.
When the Windows SNMP Service has not been installed, follow the procedure below to manually register the SNMP linkage function.
Note
Use an Administrator account to perform the registration.
Install the Windows SNMP Service.
Stop the Windows SNMP Service.
Register the SNMP linkage function of EXPRESSCLUSTER with the Windows SNMP Service.
For the following EXPRESSCLUSTER node licenses, register the license on each cluster server.
Main Products
EXPRESSCLUSTER X 5.3 for Windows VM
EXPRESSCLUSTER X SingleServerSafe 5.3 for Windows VM
EXPRESSCLUSTER X SingleServerSafe for Windows VM Upgrade
Optional Products
EXPRESSCLUSTER X Replicator 5.3 for Windows
EXPRESSCLUSTER X Replicator DR 5.3 for Windows
EXPRESSCLUSTER X Replicator DR 5.3 Upgrade for Windows
EXPRESSCLUSTER X Database Agent 5.3 for Windows
EXPRESSCLUSTER X Internet Server Agent 5.3 for Windows
EXPRESSCLUSTER X Application Server Agent 5.3 for Windows
EXPRESSCLUSTER X Java Resource Agent 5.3 for Windows
EXPRESSCLUSTER X System Resource Agent 5.3 for Windows
EXPRESSCLUSTER X Alert Service 5.3 for Windows
Note
If the licenses for optional products have not been installed, the resources and monitor resources corresponding to those licenses are not shown in the list on the Cluster WebUI.
There are two ways to register a license: specifying the license file, or entering the information on the license sheet.
After the CPU license is registered on the master server, Cluster WebUI on the master server must be used to edit and apply the cluster configuration data as described in "6. Creating the cluster configuration data".
5.1.4. Registering the license by entering the license information
The following describes how to register the license by entering the license information.
Before you register the license, make sure that:
EXPRESSCLUSTER CPU license
You have the license sheet you officially obtained from the sales agent. The values on this license sheet are used for registration.
You have the administrator privileges to log in to the server intended to be used as the master server in the cluster.
EXPRESSCLUSTER node license
You have the license sheet you officially obtained from the sales agent. The number of license sheets you need is as many as the number of servers on which the product will be used. The values on this license sheet are used for registration.
You have the administrator privileges to log in to the server on which you intend to use the product.
On the Start menu, click License Manager of the EXPRESSCLUSTER Server.
In the License Manager dialog box, click Register.
In the window to select a license method, select Register with License Information.
In the Product selection dialog box, select the product category, and click Next.
In the License Key Entry dialog box, enter the serial number and license key of the license sheet. Click Next.
Confirm what you have entered on the License Registration Confirmation dialog box. Click Next.
Make sure that the pop-up message "The license was registered." is displayed. If the license registration fails, start again from step 2.
5.1.5. Registering the license by specifying the license file
The following describes how to register the license by specifying the license file.
Before you register the license, check that:
EXPRESSCLUSTER CPU license
You have the administrator privileges to log in to the server intended to be used as the master server in the cluster.
The license file is located on the server intended to be used as the master server in the cluster.
EXPRESSCLUSTER node license
You have the administrator privileges to log in to the server on which you intend to use the product.
The license file is located on the server, among those constituting the cluster system, on which you intend to use the product.
On the Start menu, click License Manager of the EXPRESSCLUSTER Server.
In the License Manager dialog box, click Register.
In the window for selecting a license method, select Register with License File.
In the License File Specification dialog box, select the license file to be registered and then click Open.
The message confirming registration of the license is displayed. Click OK.
EXPRESSCLUSTER licenses can be registered during installation, and can also be added or deleted after installation.
Use the fixed term license to operate the cluster system which you intend to construct for a limited period of time.
This license takes effect on the date it is registered and remains valid for a fixed period of time.
In preparation for the expiration, the license for the same product can be registered multiple times. Extra licenses are saved and a new license will take effect when the current license expires.
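The behavior of spare fixed term licenses (the first takes effect on its registration date; each spare takes effect when the previous one expires) can be modeled in a few lines. This Python sketch is a hypothetical model for planning purposes only: validity expressed in days, the class name, and the assumption that every spare is registered before the current license expires are illustrative, not taken from the product.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class FixedTermLicenses:
    """Hypothetical model of the fixed term licenses registered for one product."""
    validity_days: List[int] = field(default_factory=list)  # in registration order
    first_registered: Optional[date] = None

    def register(self, days: int, on: date) -> None:
        # The first license takes effect on its registration date;
        # later ones are saved as spares.
        if self.first_registered is None:
            self.first_registered = on
        self.validity_days.append(days)

    def active_index(self, today: date) -> Optional[int]:
        """Index of the license in effect on `today`, or None if none is."""
        if self.first_registered is None or today < self.first_registered:
            return None
        start = self.first_registered
        for i, days in enumerate(self.validity_days):
            end = start + timedelta(days=days)
            if today < end:
                return i
            start = end  # the next spare takes effect when this one expires
        return None


licenses = FixedTermLicenses()
licenses.register(30, date(2024, 1, 1))
licenses.register(30, date(2024, 1, 10))  # spare, registered before expiration
print(licenses.active_index(date(2024, 1, 31)))  # 1
```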
The fixed term license applies to the EXPRESSCLUSTER X 5.3 for Windows and optional products as shown below. Among servers that constitute the cluster, use the master server to register the fixed term license.
Main Products
EXPRESSCLUSTER X 5.3 for Windows VM
Optional Products
EXPRESSCLUSTER X Replicator 5.3 for Windows
EXPRESSCLUSTER X Replicator DR 5.3 for Windows
EXPRESSCLUSTER X Database Agent 5.3 for Windows
EXPRESSCLUSTER X Internet Server Agent 5.3 for Windows
EXPRESSCLUSTER X Application Server Agent 5.3 for Windows
EXPRESSCLUSTER X Java Resource Agent 5.3 for Windows
EXPRESSCLUSTER X System Resource Agent 5.3 for Windows
EXPRESSCLUSTER X Alert Service 5.3 for Windows
Note
If the licenses for optional products have not been installed, the resources and monitor resources corresponding to those licenses are not shown in the list on the Cluster WebUI.
A license is registered by specifying the license file.
Notes on using the fixed term license are as follows:
The fixed term license cannot be registered to several of the servers constituting the cluster to operate them.
After the license is registered on the master server, Cluster WebUI on the master server must be used to edit and apply the cluster configuration data as described in "6. Creating the cluster configuration data".
The number of the fixed term license must be larger than the number of the servers constituting the cluster.
After starting the operation of the cluster, additional fixed term license must be registered in the master server.
Once a fixed term license has been enabled, it cannot be registered again after the license or server is removed or the server is replaced, even if it is still within its validity period.
5.3.2. Registering the fixed term license by specifying the license file
The following describes how you register a fixed term license.
Before you register the license, check that:
You have the administrator privileges to log in to the server intended to be used as the master server in the cluster.
The license files for all the products you intend to use are stored on the server that will be the master server of the cluster system.
Follow the steps below to register the license files for all the products to be used. If you have two or more license files for the same product in preparation for expiration, register the extra license files in the same way.
On the Start menu, click License Manager of the EXPRESSCLUSTER Server.
In the License Manager dialog box, click Register.
In the window for selecting a license method, select Register with License File.
In the License File Specification dialog box, select the license file to be registered and then click Open.
The message confirming registration of the license is displayed. Click OK.
Click Finish to close the License Manager.
Note
If the cluster has not yet been created, the license is in the "inactive" state. This is not a problem, because the license is activated once the cluster has been created.
5.4. Referring and/or deleting the fixed term license
5.4.1. How to refer to and/or delete the registered fixed term license
In EXPRESSCLUSTER, data that contains information on how a cluster system is configured is called "cluster configuration data." This data is created using the Cluster WebUI. This chapter provides information on how to start the Cluster WebUI and the procedures to create the cluster configuration data using the Cluster WebUI with a sample cluster configuration.
Creating the cluster configuration data is performed by using the config mode of Cluster WebUI, the function for creating and modifying cluster configuration data.
Start the Cluster WebUI accessed from the management PC and create the cluster configuration data. The cluster configuration data will be applied in the cluster system by the Cluster WebUI.
Access to the Cluster WebUI is required to create the cluster configuration data. This section describes the overview of the Cluster WebUI and how to create cluster configuration data.
The Cluster WebUI is a function for setting up the cluster, monitoring its status, starting up or stopping servers and groups, and collecting cluster operation logs through a Web browser. The overview of the Cluster WebUI is shown in the following figures.
This figure shows two servers with EXPRESSCLUSTER installed. You can display the Cluster WebUI screen, by using a Web browser on the Management PC to access one of the servers.
For this access, specify the management group's floating IP (FIP) address or virtual IP (VIP) address.
Specify the floating IP address or virtual IP address for accessing Cluster WebUI as the URL when connecting from a Web browser on the management PC. These addresses are registered as resources of the management group. When the management group does not exist, you can specify the address of one of the servers configuring the cluster (the fixed address allocated to that server) to connect the management PC to that server. In this case, the Cluster WebUI cannot acquire the status of the cluster if the server being connected to is not working.
Before you create the cluster configuration data using Cluster generation wizard, check values you are going to enter. Write down the values to see whether your cluster is efficiently configured and there is no missing information.
As shown in the below, this chapter uses a typical cluster configuration with two nodes and the hybrid disk configuration with three nodes.
When a shared disk with two nodes is used:
Fig. 6.2 Example of a 2-node cluster with a shared disk
FIP1: 10.0.0.11 (accessed by Cluster WebUI clients)
FIP2: 10.0.0.12 (accessed by operation clients)
NIC1-1: 192.168.0.1
NIC1-2: 10.0.0.1
NIC2-1: 192.168.0.2
NIC2-2: 10.0.0.2
Shared disk
Drive letter of the disk heartbeat partition: E (file system: RAW)
Drive letter of the switchable partition: F (file system: NTFS)
When mirror disks with two nodes are used:
Fig. 6.3 Example of a 2-node cluster with mirror disks
FIP1: 10.0.0.11 (accessed by Cluster WebUI clients)
FIP2: 10.0.0.12 (accessed by operation clients)
NIC1-1: 192.168.0.1
NIC1-2: 10.0.0.1
NIC2-1: 192.168.0.2
NIC2-2: 10.0.0.2
Drive letter of the cluster partition: E (file system: RAW)
Drive letter of the data partition: F (file system: NTFS)
When mirror disk resources are used on two nodes built at remote sites:
This configuration is an example for a layer-2 WAN, on which the same network address can be used between the locations.
Fig. 6.4 Example of a 2-node cluster with a remote configuration using mirror disk resources
FIP1: 10.0.0.11 (accessed by Cluster WebUI clients)
FIP2: 10.0.0.12 (accessed by operation clients)
NIC1: 10.0.0.1
NIC2: 10.0.0.2
Drive letter of the cluster partition: E (file system: RAW)
Drive letter of the data partition: F (file system: NTFS)
When hybrid disks with three nodes are used:
Fig. 6.5 Example of a 3-node cluster with hybrid disks
FIP1: 10.0.0.11 (accessed by Cluster WebUI clients)
FIP2: 10.0.0.12 (accessed by operation clients)
NIC1-1: 192.168.0.1
NIC1-2: 10.0.0.1
NIC2-1: 192.168.0.2
NIC2-2: 10.0.0.2
NIC3-1: 192.168.0.3
NIC3-2: 10.0.0.3
Shared disk
Drive letter of the partition for heartbeat: E (file system: RAW)
Drive letter of the cluster partition: F (file system: RAW)
Drive letter of the data partition: G (file system: NTFS)
Disk
Drive letter of the cluster partition: F (file system: RAW)
Drive letter of the data partition: G (file system: NTFS)
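The sample addresses follow a simple plan: interconnect NICs use 192.168.0.x, while public NICs and floating IP addresses use 10.0.0.x. Before entering these values in the wizard, you can sanity-check such a plan with a short script. The following is a minimal Python sketch that uses the Fig. 6.2 values and assumes a /24 prefix for the public network (the figures do not state a netmask); `check_floating_ips` is a hypothetical helper, not part of EXPRESSCLUSTER.

```python
import ipaddress

# Sample values from Fig. 6.2; the /24 prefix length is an assumption.
PUBLIC_NET = ipaddress.ip_network("10.0.0.0/24")
NIC_ADDRESSES = {
    "NIC1-1": "192.168.0.1", "NIC1-2": "10.0.0.1",
    "NIC2-1": "192.168.0.2", "NIC2-2": "10.0.0.2",
}
FLOATING_IPS = {"FIP1": "10.0.0.11", "FIP2": "10.0.0.12"}


def check_floating_ips(fips, nics, public_net):
    """Return a list of problems found in the floating IP plan."""
    problems = []
    fixed = {ipaddress.ip_address(a) for a in nics.values()}
    seen = set()
    for name, addr in fips.items():
        ip = ipaddress.ip_address(addr)
        if ip not in public_net:
            problems.append(f"{name} {ip} is not on {public_net}")
        if ip in fixed:
            problems.append(f"{name} {ip} collides with a NIC address")
        if ip in seen:
            problems.append(f"{name} {ip} is used by another floating IP")
        seen.add(ip)
    return problems


print(check_floating_ips(FLOATING_IPS, NIC_ADDRESSES, PUBLIC_NET))  # []
```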
The following table lists sample values of the cluster configuration data for the cluster systems shown above. The following sections give step-by-step instructions for creating the cluster configuration data with these values. When you actually set the values, you may need to modify them for the cluster you intend to create. For information on how to determine the values, refer to the "Reference Guide".
Third monitor resource (automatically created after the hybrid disk resource is created)
Type: Hybrid disk monitor
Monitor resource name: hdw1
Hybrid disk resource: hd1
Recovery target: failover1
Final action: None
Fourth monitor resource (automatically created after the hybrid disk resource is created)
Type: Hybrid disk TUR monitor
Monitor resource name: hdtw1
Hybrid disk resource: hd1
Recovery target: failover1
Final action: None
Fifth monitor resource (automatically created after the ManagementIP resource is created)
Type: Floating IP monitor
Monitor resource name: fipw1
Monitor target: ManagementIP
Recovery target: ManagementIP
Sixth monitor resource (automatically created after the fip1 resource is created)
Type: Floating IP monitor
Monitor resource name: fipw2
Monitor target: fip1
Recovery target: fip1
Seventh monitor resource
Type: IP monitor
Monitor resource name: ipw1
Monitor IP address: 10.0.0.254 (gateway)
Recovery target: All Groups
Eighth monitor resource (automatically created after the application resource is created, when the application resource is of the resident type)
Type: Application monitor
Monitor resource name: appliw1
Target resource: appli1
Recovery target: appli1
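Several monitor resources in the table are created automatically when their target group resource is added. The following minimal Python sketch models only the pairings stated in the table (hybrid disk resource to hybrid disk monitor and hybrid disk TUR monitor, floating IP resource to floating IP monitor, resident application resource to application monitor). The class and function names are hypothetical, and the sketch does not reflect how the Cluster WebUI implements this internally.

```python
from dataclasses import dataclass

# Pairings taken from the sample table above; illustrative only.
AUTO_MONITOR_TYPES = {
    "hybrid disk": ["Hybrid disk monitor", "Hybrid disk TUR monitor"],
    "floating ip": ["Floating IP monitor"],
}


@dataclass
class GroupResource:
    name: str
    kind: str
    resident: bool = False  # only meaningful for application resources


def auto_created_monitors(resources):
    """Return (monitor type, target resource) pairs created automatically."""
    monitors = []
    for res in resources:
        for mon_type in AUTO_MONITOR_TYPES.get(res.kind, []):
            monitors.append((mon_type, res.name))
        if res.kind == "application" and res.resident:
            monitors.append(("Application monitor", res.name))
    return monitors


sample = [
    GroupResource("hd1", "hybrid disk"),
    GroupResource("ManagementIP", "floating ip"),
    GroupResource("fip1", "floating ip"),
    GroupResource("appli1", "application", resident=True),
]
for monitor in auto_created_monitors(sample):
    print(monitor)
```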
6.4. Procedure for creating the cluster configuration data
Creating the cluster configuration data involves creating a cluster, group resources, and monitor resources. Use the Cluster generation wizard to create new configuration data. The procedure is described below.
Note
The created cluster configuration data can be modified later by using the rename function or properties view function.
On the Cluster window of the Cluster generation wizard, click the Language field and select the language used by the OS.
Note
Only one language can be used in one cluster. If the servers in a cluster run OSs with different languages, specify "English."
Enter the cluster name in the Cluster Name box.
In the Management IP Address box, enter the floating IP address (10.0.0.11) used to connect to the Cluster WebUI. Click Next.
The Basic Settings window for servers is displayed. The server (server1) whose IP address was specified in the URL when starting the Cluster WebUI is already registered in the list.
Add the second and subsequent servers to the cluster.
In Server Definitions, click Add.
The Add Server dialog box is displayed. Enter the server name, FQDN name, or IP address of the second server, and then click OK. The second server (server2) is added to the Server Definitions.
For the hybrid disk configuration, add the third server (server3) in the same way.
For the hybrid disk configuration, follow the procedure in "1-3 Create a server group."
Set up the network configuration between the servers in the cluster.
Add or delete communication routes by clicking Add or Delete, and then click a cell in each server column to select or enter the IP address. For a communication route to which some servers are not connected, leave the cells for the unconnected servers blank.
For a communication route used for heartbeat transmission (interconnect), click a cell in the Type column, and then select Kernel Mode. For a route used only for data mirroring communication of mirror disk resources or hybrid disk resources, and not for heartbeats, select Mirror Communication Only.
At least one communication route must be specified for the interconnect. Specify as many communication routes for the interconnect as possible.
If multiple interconnects are set up, the communication route for which the Priority column contains the smallest number is used preferentially for internal communication between the servers in the cluster. To change the priority, change the order of communication routes by selecting arrows.
When using Witness heartbeat, click a cell in the Type column, and select Witness. Next, click Properties, and enter the address of Witness server for Target Host. Then enter the port number for Service Port. For servers that do not use Witness heartbeat, click the cells of those servers, and select Do Not Use.
For a communication route used for data mirroring communication for mirror disk resources or hybrid disk resources, click a cell of the MDC column, and then select the mirror disk connect name (mdc1 to mdc16) assigned to the communication route. Select Do Not Use for communication routes not used for data mirroring communication.
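The interconnect priority rule can be expressed as a small selection function. This is a minimal Python sketch under stated assumptions: each route is a hypothetical record with a priority number, a type, and an availability flag, and an unavailable route is simply skipped. It prefers the usable Kernel Mode route with the smallest priority number and ignores Mirror Communication Only routes, since those do not carry heartbeats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """Hypothetical model of one row in the interconnect list."""
    priority: int          # smaller number = used preferentially
    kind: str              # "Kernel Mode", "Mirror Communication Only", ...
    available: bool = True


def preferred_interconnect(routes) -> Optional[Route]:
    """Return the usable heartbeat route with the smallest priority number."""
    heartbeat_routes = [r for r in routes if r.kind == "Kernel Mode" and r.available]
    return min(heartbeat_routes, key=lambda r: r.priority, default=None)


routes = [
    Route(1, "Kernel Mode"),
    Route(2, "Kernel Mode"),
    Route(3, "Mirror Communication Only"),
]
print(preferred_interconnect(routes).priority)  # 1
routes[0].available = False
print(preferred_interconnect(routes).priority)  # 2
```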
To use NP resolution in the DISK mode, click Add and add a row to NP Resolution List, click Type and select DISK, and then, click the cell of each server and select the disk drive to be used for the partition for disk heartbeat. If there are any servers that are not connected to the shared disk, make the cells of the servers blank.
For the setup example in this chapter, add a DISK mode row, click the cell of each server, and then select the E: drive to use the shared disk. To use a hybrid disk, add a DISK mode row, click the cells of server1 and server2, and then select the E: drive. Leave the server3 cell blank.
To use NP resolution in the PING mode, click Add to add a row to NP Resolution List, click the Type cell and select Ping, click the Target cell, and enter the IP addresses of the ping target devices (such as a gateway). When multiple IP addresses separated by commas are entered, the server is regarded as isolated from the network only if none of them responds to ping.
If the PING mode is used only on some servers, set the cell of the server not to be used to Do Not Use.
For the setup example in this chapter, a row for the PING mode is added and 192.168.0.254 is specified for Target.
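The PING mode decision described above can be sketched in Python as follows. The sketch only models the rule (isolated when no target replies); collecting the ping replies is left to the caller, and 192.168.0.253 is a hypothetical second target added for illustration.

```python
def parse_targets(cell):
    """Split a comma-separated Target cell into individual addresses."""
    return [t.strip() for t in cell.split(",") if t.strip()]


def is_isolated(ping_results):
    """Return True when none of the ping targets answered.

    ping_results maps each target address to True (reply received)
    or False (no reply); sending the pings is outside this sketch.
    """
    return not any(ping_results.values())


targets = parse_targets("192.168.0.254, 192.168.0.253")
print(targets)  # ['192.168.0.254', '192.168.0.253']
print(is_isolated({targets[0]: False, targets[1]: True}))   # False
print(is_isolated({targets[0]: False, targets[1]: False}))  # True
```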
To use NP resolution in the HTTP mode, add a row to NP Resolution List by clicking Add, click the cell in Type column, and select HTTP. Then click Properties, enter the address of the Web server in Target Host, and enter the port number in Service Port. If the HTTP mode is used only on some servers, set the cells of the servers not to be used to Do Not Use.
For the setup example in this chapter, the HTTP mode is not used.
To use the majority method for NP resolution, click Add and add a row to NP Resolution List, click the cell of Type column, and then select Majority.
For the setup example in this chapter, the majority method is not used.
Set up a group that works as a unit of failover at the time an error occurs.
Click Add in the Group List to open the Group Definition dialog box.
For the setup example in this chapter, select the Use Server Group Settings check box if you use a hybrid disk. Enter the group name (failover1) in the Name box, and click Next.
Specify the servers on which the failover group can start. For the setup example in this chapter, to use the shared disk or the mirror disk, either select the Failover is possible at all servers check box, or add server1 and then server2 from Available Servers to Servers that can run the Group. To use the hybrid disk, add svg1 and then svg2 from Available Server Groups to Server Groups that can run the Group.
Click Next.
Specify each attribute value of the failover group. Because all the default values are used for the setup example in this chapter, click Next.
The Group Resource List is displayed.
6.4.2.2. Add a group resource (Floating IP resource)
Add a group resource, a configuration element of the group, to the failover group you have created in Step 2-1.
Click Add in the Group Resource List.
The Resource Definition of Group | failover1 dialog box is displayed. In this dialog box, select the group resource type Floating IP resource in the Type box, and enter the group resource name fip1 in the Name box. Click Next.
The Dependent Resources page is displayed. Specify nothing. Click Next.
The Recovery Operation at Activation Failure Detection and Recovery Operation at Deactivation Failure Detection pages are displayed. Click Next.
Enter the IP address (10.0.0.12) in the IP Address box. Click Finish.
The floating IP resource is added to Group Resource List.
6.4.2.3. Add a group resource (Disk resource/Mirror disk resource/Hybrid disk resource)
When using a shared disk
Add a shared disk as a group resource.
Click Add in Group Resource List.
The Resource Definition of Group | failover1 dialog box is displayed. In the Resource Definition of Group | failover1 dialog box, select the group resource type disk resource in the Type box, and enter the group resource name sd1 in the Name box. Click Next.
The Dependent Resources page is displayed. Specify nothing. Click Next.
The Recovery Operation at Activation Failure Detection and Recovery Operation at Deactivation Failure Detection pages are displayed. Click Next.
Select server1 in the Servers that can run the Group. Click Add.
The Selection of partition dialog box is displayed. Select the partition F:. Click OK.
Important
For disk resource partition, specify an unformatted partition on the shared disk that is connected to the filtering-configured HBA.
Do not specify the disk heartbeat partition, or the cluster partition or data partition of a mirror disk resource, as the disk resource partition. Otherwise, data on the shared disk may be corrupted.
Similarly, add server2 to Servers that can run the Group, and click Finish.
The disk resource is added to Group Resource List.
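The restriction in the Important note above can be checked before you click Finish. The following is a minimal Python sketch, assuming you have listed the drive letters already used for disk heartbeat partitions and for mirror disk cluster/data partitions; the function names are hypothetical.

```python
def _drive(letter):
    """Normalize "e", "E", or "E:" to "E"."""
    return letter.rstrip(":").upper()


def check_disk_resource_partition(drive, disk_heartbeat_drives, mirror_drives):
    """Return an error message if the drive is already used, else None."""
    d = _drive(drive)
    if d in {_drive(x) for x in disk_heartbeat_drives}:
        return f"{d}: is a disk heartbeat partition"
    if d in {_drive(x) for x in mirror_drives}:
        return f"{d}: is a mirror disk cluster or data partition"
    return None


# Sample values from Fig. 6.2: E: is the disk heartbeat partition.
print(check_disk_resource_partition("F:", ["E:"], []))  # None
print(check_disk_resource_partition("E:", ["E:"], []))  # E: is a disk heartbeat partition
```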
When using a mirror disk
Add a mirror disk as a group resource.
Click Add in Group Resource List.
The Resource Definition of Group | failover1 dialog box is displayed. In the Resource Definition of Group | failover1 dialog box, select the group resource type mirror disk resource in the Type box, and enter the group resource name md1 in the Name box. Click Next.
The Dependent Resources page is displayed. Specify nothing. Click Next.
The Recovery Operation at Activation Failure Detection and Recovery Operation at Deactivation Failure Detection pages are displayed. Click Next.
Select server1 in the Servers that can run the Group. Click Add.
The Selection of partition dialog box is displayed. In the Selection of Partition dialog box, click Connect, and then, select a data partition F: and cluster partition E:. Click OK.
Important
Specify different partitions for data partition and cluster partition. If the same partition is specified, data on the mirror disk may be corrupted. Make sure not to specify a partition on the shared disk for the data partition and cluster partition of mirror disk resource.
Similarly, add server2 to Servers that can run the Group, and click Finish.
The mirror disk resource is added to Group Resource List.
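Similarly, the mirror disk rules in the Important note (distinct data and cluster partitions, neither on the shared disk) can be expressed as a check. The following is a minimal Python sketch with a hypothetical helper; drive letters are compared case-insensitively, with or without the trailing colon.

```python
def check_mirror_partitions(data, cluster, shared_disk_drives):
    """Return a list of problems with a mirror disk partition pair."""
    def norm(letter):
        return letter.rstrip(":").upper()

    d, c = norm(data), norm(cluster)
    shared = {norm(x) for x in shared_disk_drives}
    problems = []
    if d == c:
        problems.append("data and cluster partitions must be different")
    for label, drive in (("data", d), ("cluster", c)):
        if drive in shared:
            problems.append(f"{label} partition {drive}: is on the shared disk")
    return problems


print(check_mirror_partitions("F:", "E:", []))  # [] (Fig. 6.3 values)
print(check_mirror_partitions("F:", "F:", []))  # ['data and cluster partitions must be different']
```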
When using a hybrid disk
Add a hybrid disk as a group resource.
Click Add in Group Resource List.
The Resource Definition of Group | failover1 dialog box is displayed. In the Resource Definition of Group | failover1 dialog box, select the group resource type hybrid disk resource in the Type box, and enter the group resource name hd1 in the Name box. Click Next.
The Dependent Resources page is displayed. Specify nothing. Click Next.
The Recovery Operation at Activation Failure Detection and Recovery Operation at Deactivation Failure Detection pages are displayed. Click Next.
Enter the drive letter (G:) of the data partition for mirroring in the Data Partition Drive Letter box, and the drive letter (F:) of the cluster partition in the Cluster Partition Drive Letter box.
Important
Specify different partitions for data partition and cluster partition. If the same partition is specified, data on the mirror disk may be corrupted.
Click Obtain information. The GUID information of data and cluster partitions on each server is displayed. Click Finish.
The hybrid disk resource is added to Group Resource List.
6.4.2.4. Add a group resource (Application resource)
Add an application resource that can start and stop the application.
Click Add in Group Resource List.
The Resource Definition of Group | failover1 dialog box is displayed. In the Resource Definition of Group | failover1 dialog box, select the group resource type Application resource in the Type box, and enter the group resource name appli1 in the Name box. Click Next.
The Dependent Resources page is displayed. Specify nothing. Click Next.
The Recovery Operation at Activation Failure Detection and Recovery Operation at Deactivation Failure Detection pages are displayed. Click Next.
Select Resident for Resident Type. Specify the path of the executable file in Start Path.
Note
For Start Path and Stop Path, specify either the absolute path of the executable file or the name of an executable file that can be found through the path set in an environment variable. Do not specify a relative path; otherwise, the application resource may fail to start.
Click Finish.
The application resource is added to Group Resource List.
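The Start Path rule in the note above (an absolute path, or a bare executable name that can be found on the PATH, never a relative path) can be approximated with Python's `os.path.isabs` and `shutil.which`. This is a sketch of the rule, not EXPRESSCLUSTER's own path resolution, which may differ; the executable names are hypothetical. The temporary-directory part of the example is written for POSIX systems, because on Windows `shutil.which` also applies PATHEXT.

```python
import os
import shutil
import stat
import tempfile


def classify_start_path(path, search_path=None):
    """Classify a Start Path / Stop Path value.

    Returns "absolute" or "found on PATH". Raises ValueError for a relative
    path or for a bare name that cannot be found. search_path overrides the
    PATH environment variable (None means use the current PATH).
    """
    if os.path.isabs(path):
        return "absolute"
    if os.path.dirname(path) == "" and shutil.which(path, path=search_path):
        return "found on PATH"
    raise ValueError(f"{path!r} is a relative path or is not found on PATH")


print(classify_start_path(os.path.abspath("start.exe")))  # absolute

try:
    classify_start_path("bin/start.exe")  # relative path: rejected
except ValueError as err:
    print(err)

# POSIX example: a bare name that resolves through a search path.
with tempfile.TemporaryDirectory() as tmp:
    exe = os.path.join(tmp, "myapp")
    with open(exe, "w"):
        pass
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)
    print(classify_start_path("myapp", search_path=tmp))  # found on PATH
```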
Add a monitor resource that monitors a specified target to the cluster.
6.4.3.1. Add a monitor resource (Disk RW monitor resource)
Add a disk RW monitor resource to monitor the local disk.
Click Next in Group List.
The Monitor Resource List is displayed. In the Monitor Resource List, click Add. Select the monitor resource type disk RW monitor in the Type box, and enter the monitor resource name diskw1 in the Name box. Click Next.
Enter the monitor settings. Select Always in the Monitor Timing box. Click Next.
Set the file name (C:/check.txt) and the I/O size (2000000). Select Generate an Intentional Stop Error for Action on Stall and Recover for Action When Diskfull Is Detected, and click Next. For File Name, specify a file on the partition where the OS is installed.
Select Execute only the final action in the Recovery Action box.
Select Generate an Intentional Stop Error in the Final Action box, and click Finish.
The disk RW monitor resource diskw1 is added to the Monitor Resource List.
Note
If you specify a file on the local disk as the monitoring target of the disk RW monitor resource, the resource monitors the local disk. In this case, select Generate an Intentional Stop Error for the Final Action.
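To illustrate what the File Name and I/O Size settings describe, the following Python sketch writes I/O Size bytes to the target file, flushes them to disk, reads them back, and reports the elapsed time. It assumes that a disk RW check amounts to such a write/read cycle; it is not EXPRESSCLUSTER's implementation and does not reproduce stall detection, diskfull handling, or final actions.

```python
import os
import tempfile
import time


def disk_rw_check(path, io_size):
    """Write io_size bytes to path, flush them to disk, and read them back.

    Returns the elapsed time in seconds; raises OSError on a mismatch.
    """
    payload = os.urandom(io_size)
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data to the device, not just the cache
    with open(path, "rb") as f:
        matches = f.read() == payload
    elapsed = time.monotonic() - start
    if not matches:
        raise OSError(f"read-back mismatch on {path}")
    return elapsed


# A temporary file stands in for C:/check.txt; 2000000 is the sample I/O size.
with tempfile.TemporaryDirectory() as tmp:
    seconds = disk_rw_check(os.path.join(tmp, "check.txt"), 2_000_000)
    print(f"write/read cycle took {seconds:.3f} s")
```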
6.4.3.2. Add a monitor resource (IP monitor resource)
Add an IP monitor resource to monitor an IP address.
Click Add in the Monitor Resource List dialog box. Select the monitor resource type ip monitor in the Type box, and enter the monitor resource name ipw1 in the Name box. Click Next.
Enter the monitor settings. Change nothing from the default values. Click Next.
Click Add under IP Addresses. Enter the IP address to be monitored (10.0.0.254) in the IP Address box, and click OK.
Note
For monitoring target of the IP monitor resource, specify the IP address of a device (for example, gateway) that is assumed to be always active on the public LAN.
The IP address you have entered is set in the IP Addresses. Click Next.
Specify the recovery target. Click Browse.
Click All Groups in the tree view and click OK. All Groups is set in the Recovery Target.
Click Finish.
The IP monitor resource ipw1 is added to the Monitor Resource List.
When you click Finish after creating a monitor resource, the following popup message appears:
Clicking No disables automatic group startup, recovery on the activation/deactivation failure of a group resource, recovery on the failure of a monitor resource, and failover on a server crash. To start a cluster for the first time after creating the cluster configuration data, it is recommended to disable the automatic start and the recovery and to check the cluster configuration data for errors.
To disable the cluster operation, go to Cluster properties -> Extension tab -> Disable cluster operation.
Note
Disabling the recovery on the failure of a monitor resource is not applied to the function of detecting the stall of the disk RW monitor resource.
Creating the cluster configuration data is now complete. Proceed to the section "6.7. Starting a cluster".
The created configuration data can be saved in a directory on your PC or in external media.
Follow the procedures below to save the cluster configuration.
Click Export in the config mode of Cluster WebUI.
Select a location to save the data and save it.
Note
A zip file containing one file (clp.conf) and one directory (scripts) is saved. If any of these are missing, the command to create a cluster does not run successfully. Make sure to treat these two as a set. When new configuration data is edited, clp.conf.bak is created in addition to these two.
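Because the archive must contain both clp.conf and the scripts directory, you may want to check an exported file before using it. The following is a minimal Python sketch using the standard `zipfile` module; it assumes clp.conf and scripts may appear at any directory level inside the archive, since the exact layout is not described here, and `missing_export_members` is a hypothetical helper.

```python
import os
import tempfile
import zipfile


def missing_export_members(zip_path):
    """Return the required members (clp.conf, scripts) missing from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    missing = []
    if not any(n.split("/")[-1] == "clp.conf" for n in names):
        missing.append("clp.conf")
    # A "scripts" directory shows up as a path component of some entry.
    if not any("scripts" in n.split("/")[:-1] for n in names):
        missing.append("scripts")
    return missing


# Example with a hypothetical archive built on the fly.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "config.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("clp.conf", "")
        zf.writestr("scripts/failover1/appli1/start.bat", "")
    print(missing_export_members(archive))  # []
```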
Note
If you specified a port number other than the default in Port Number when installing EXPRESSCLUSTER, click Cluster Properties, click the Port No. tab, and specify the same values for WebManager HTTP Port Number and Disk Agent Port Number that were specified at installation before saving the cluster configuration data.
Before applying the cluster configuration data created on Cluster WebUI to the cluster servers, the cluster configuration data can be checked.
In the config mode of Cluster WebUI, click Cluster Configuration Information Check.
After the check is completed, the results are displayed in another window. It may take time for the check to be completed, depending on the settings for the created cluster configuration data.
Details of what is checked are as follows:
Cluster Properties
Check item
Description
Ping check on pingnp
Checks whether ping can reach the ping target for network partition resolution.
Target check for pingnp
This check ensures that the ping target for network partitioning resolution does not overlap with the IP addresses of cluster servers.
Partition presence check for disknp
Checks whether the specified partition exists in the system.
Port No. tab : port number check
Checks whether the range of automatically assigned communication port numbers managed by the OS does not overlap with that used by EXPRESSCLUSTER.
Extension tab: 1st check item for the log storage destination path of settings of log storage period
Check if the specified path is outside the EXPRESSCLUSTER installation path.
Extension tab: 2nd check item for the log storage destination path of settings of log storage period
Check if the specified path exists.
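Two of the Cluster Properties checks are simple comparisons. The following Python sketch illustrates them: a ping target for network partition resolution must not be a cluster server address, and the port range used by EXPRESSCLUSTER must not overlap the range of ports the OS assigns automatically. The server addresses are the sample values from this chapter, 49152 to 65535 is the usual default Windows dynamic port range, and the range 50000 to 50010 is a placeholder, not an EXPRESSCLUSTER default.

```python
import ipaddress


def pingnp_targets_on_servers(targets, server_ips):
    """Return ping targets that are also addresses of cluster servers."""
    servers = {ipaddress.ip_address(s) for s in server_ips}
    return [t for t in targets if ipaddress.ip_address(t) in servers]


def port_ranges_overlap(a, b):
    """True if inclusive port ranges a=(low, high) and b=(low, high) share a port."""
    return a[0] <= b[1] and b[0] <= a[1]


SERVER_IPS = ["192.168.0.1", "10.0.0.1", "192.168.0.2", "10.0.0.2"]
print(pingnp_targets_on_servers(["192.168.0.254"], SERVER_IPS))  # []
print(port_ranges_overlap((49152, 65535), (50000, 50010)))  # True
```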
Group Resources
Check item
Description
Ping check on fip
Pings the specified IP address to check that it is not already in use on the same network.
Ping check on vip
Pings the specified IP address to check that it is not already in use on the same network.
Partition presence check for sd
Checks whether the specified partition exists in the system.
Partition presence check for md
Checks whether the specified partition exists in the system.
Partition presence check for hd
Checks whether the specified partition exists in the system.
Cluster partition size check for md
Checks whether the specified cluster partition is large enough.
Cluster partition size check for hd
Checks whether the specified cluster partition is large enough.
Port number check for azurepp
Checks whether the range of automatically assigned communication port numbers managed by the OS does not overlap with that used by EXPRESSCLUSTER.
Port Number Usage Check for lbpp
This check ensures that the configured port number is not in use by other applications.
Heartbeat Resources
Check item
Description
Ping check on khb
Checks whether the IP address specified as a heartbeat resource can be used, by pinging the IP address.
Others
Check item
Description
AWSCLI command execution check
Checks whether the AWS CLI can be run.
OS start time check
Checks whether the time for the OS startup is set longer than the heartbeat timeout.
Unrecommended settings check
Check item
Description
Recovery action check for deactivation failure
Checks whether any setting other than No operation is set for the final action on the deactivation failure of each group resource.
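The OS start time check and the unrecommended settings check can be sketched the same way. In this minimal Python sketch, the time values and the "other action" label are placeholders; only "No operation" is taken from the table above, and resources whose final action on deactivation failure is No operation are the ones the check flags.

```python
def os_start_time_ok(os_start_time_sec, heartbeat_timeout_sec):
    """OS start time check: passes when the OS start time exceeds the heartbeat timeout."""
    return os_start_time_sec > heartbeat_timeout_sec


def flag_no_operation(final_actions):
    """Return resources whose final action on deactivation failure is No operation."""
    return sorted(name for name, action in final_actions.items()
                  if action == "No operation")


print(os_start_time_ok(60, 30))  # True (placeholder values)
print(flag_no_operation({"fip1": "other action (placeholder)",
                         "appli1": "No operation"}))  # ['appli1']
```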
After creating or modifying the cluster configuration data, apply it to the servers that constitute the cluster to create the cluster system.
After creation and modification of the cluster configuration data are completed, create a cluster in the following procedures.
Click Apply the Configuration File in the config mode of Cluster WebUI.
A popup message asking "Do you want to perform the operations?" is displayed. Click OK.
When the upload ends successfully, a popup message saying "The application finished successfully." is displayed. Click OK.
If the upload fails, perform the operations by following the displayed message.
Select Operation Mode from the drop-down menu on the Cluster WebUI toolbar to switch to the operation mode.
In the Status tab of the Cluster WebUI, click Start Cluster.
Confirm that the cluster system starts and that the cluster status is displayed in the Cluster WebUI. If the cluster system does not start normally, take action according to the error message.
For how to operate and check the Cluster WebUI, see the online manual from the button on the upper right of the screen.
Note
If you specified a port number other than the default in Port Number when installing EXPRESSCLUSTER, click Cluster Properties, click the Port No. tab, and specify the same values for WebManager HTTP Port Number and Disk Agent Port Number that were specified at installation before saving the cluster configuration data.
Note
If you configure the firewall with the clpfwctrl command, you must also run the command after applying the cluster configuration data.
7.1. Verifying the status using the Cluster WebUI
This chapter describes how to verify the cluster system by using the Cluster WebUI. The Cluster WebUI is installed together with the EXPRESSCLUSTER Server, so it does not need to be installed separately. The chapter first gives an overview of the Cluster WebUI and then describes how to verify a cluster by accessing it.
Follow the steps below to verify the operation of the cluster after creating the cluster and connecting to the Cluster WebUI.
See also
For how to operate Cluster WebUI, see the online manual. If any error is detected while checking the status, troubleshoot the error referring to "Troubleshooting" in the "Reference Guide".
Check heartbeat resources
Check on the Cluster WebUI that each server has been rebooted and that the heartbeat resource status of each server is normal. Check that no alert or error is recorded in the alert view of the Cluster WebUI.
Check monitor resources
Verify that the status of each monitor resource is normal on the Cluster WebUI.
Start up a group
Start a group.
Check on the Cluster WebUI that the group has been started and that group resources included in the group have been started.
Check that no alert or error is recorded in the alert view of the Cluster WebUI.
Check a disk resource and mirror disk resources/hybrid disk resource
Check that you can access the switchable partition or data partition on the server where a disk resource, mirror disk resource, or hybrid disk resource is active. Check that you cannot access the switchable partition or data partition on servers where none of these resources is active.
Check a floating IP resource
Check that you can ping a floating IP address while the floating IP is active.
Check an application resource
Check that an application is working on the server where an application resource is active.
Check a service resource
Check that a service is working on the server where a service resource is active.
Stop a group
Stop a group.
Verify on the Cluster WebUI that the group has been stopped and that each group resource included in the group has been stopped. Verify that no alert or error is recorded in the alert view of the Cluster WebUI.
Start a group
Start a group.
Verify on the Cluster WebUI that the group has been started.
Move a group
Move a group to another server.
Check on the Cluster WebUI that the group has been started on the destination server.
Verify that each group resource has been started successfully and that no alert or error is recorded in the alert view of the Cluster WebUI.
Move the group to every server in the failover policy and repeat the checks above on each one.
Perform failover
Shut down the server where a group is active.
After the heartbeat timeout, check that the group has failed over. Verify on the Cluster WebUI that the status of the group becomes activated on the failover destination server.
Perform failback
When the automatic failback is set, start the server that you shut down for checking failover. Verify that the group fails back to the original server after it is started. Check on the Cluster WebUI that the status of group becomes activated on the failback destination server.
Note
For groups that include mirror disk resource or hybrid disk resource, auto failback cannot be set because mirror recovery is required.
Check the alert option
When the alert option is set, check that an alert mail is sent when a failover occurs.
Shut down the cluster
Shut down the cluster. Verify that all servers in the cluster are successfully shut down. Then restart all servers and check that they start successfully. Also check that no alert or error is recorded in the Alert logs of the Cluster WebUI.
Check that the status of each server is activated by using the clpstat command.
Verify that the heartbeat resource status of each server is normal.
Check monitor resources
Verify that the status of each monitor resource is normal by using the clpstat command.
Start groups
Start the groups with the clpgrp command.
Verify that the status of groups is activated by using the clpstat command.
Check a disk resource/mirror disk resource/hybrid disk resource
Check that you can access the switchable partition or data partition on the server where a disk resource, mirror disk resource, or hybrid disk resource is active. Check that you cannot access the switchable partition or data partition on servers where none of these resources is active.
Check a floating IP resource
Verify that you can ping a floating IP address while the floating IP resource is active.
Check an application resource
Verify that an application is working on the server where the application resource is active.
Check a service resource
Verify that a service is working on the server where the service resource is active.
Stop a group
Stop a group by using the clpgrp command. Check that the group is stopped by using the clpstat command.
Start a group
Start a group by using the clpgrp command. Check that the group is activated by using the clpstat command.
Move a group
Move a group to another server by using the clpgrp command.
Verify that the status of the group is activated by using the clpstat command.
Move the group to all servers in the failover policy and verify that the status changes to activated on each server.
Perform failover
Shut down a server where a group is active.
After the heartbeat timeout, use the clpstat command to check that the group has failed over and that its status is activated on the failover destination server.
Perform failback (When it is set)
When the automatic failback is set, start the server which you shut down in the previous step, "11. Perform failover." Verify that the group fails back to the original server after it is started using the clpstat command. Verify that the status of the group becomes activated on the failback destination server using the clpstat command.
Check the alert option (When it is set)
When the alert option is set, verify that a mail message is sent at failover.
Shut down the cluster
Shut down the cluster by using the clpstdn command. Verify that all servers in the cluster are successfully shut down.
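The command-line verification steps above can be collected into a script so that they are repeated the same way on every test run. The following minimal Python sketch builds the clpgrp and clpstat command lines and runs them in order. It assumes the clpgrp options -s (start), -t (stop), and -m with -h (move to a server); confirm them in the "Reference Guide" for your version. It also assumes the group first becomes active on server1. The runner argument makes a dry run possible, as shown at the end, and checking the clpstat output is still left to you.

```python
import subprocess


def verification_commands(group, servers):
    """Build the clpgrp/clpstat sequence for the group checks above.

    Assumes the group becomes active on servers[0] when started, so the
    moves visit the other servers first and end back on servers[0].
    """
    cmds = [
        ["clpgrp", "-s", group], ["clpstat"],  # start the group, check status
        ["clpgrp", "-t", group], ["clpstat"],  # stop the group, check status
        ["clpgrp", "-s", group], ["clpstat"],  # start it again, check status
    ]
    for host in servers[1:] + servers[:1]:
        cmds += [["clpgrp", "-m", group, "-h", host], ["clpstat"]]
    return cmds


def run_command(cmd):
    """Run one command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


def run_all(cmds, runner=run_command):
    """Run the commands in order with the given runner."""
    for cmd in cmds:
        runner(cmd)


# Dry run: print the commands instead of executing them.
run_all(verification_commands("failover1", ["server1", "server2"]),
        runner=lambda cmd: print(" ".join(cmd)))
```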
Perform dummy-failure tests, backup, and restoration of the shared disk to verify that the monitor resource can detect errors normally, and that no unexpected errors occur. Also verify that the recovery operations performed when the monitor resource detects an error are performed as intended.
If monitor resources fail to detect errors correctly, detect errors that did not occur, or if the server or OS stops, adjust the timeout values or other settings.
Transition of recovery operations due to dummy failure
When Dummy Failure is enabled, a test must be conducted to check that recovery of the monitor resources in which an error was detected is performed as set.
(When the shared disk is RAID-configured and dummy-failure tests can be run)
The test must include error, replacement, and recovery of RAID for the shared disk.
Set a dummy-failure to occur on the shared disk.
Recover RAID from the degenerated state to normal state.
With some shared disks, I/O may temporarily stop or be delayed while the disk switches to degenerated operation or while the RAID is reconfigured.
If any time-out and/or delay occurs in disk rw monitor resource or disk TUR monitor resource, adjust the time-out value of each monitor resource.
Dummy-failure of the paths to shared disks
(When the path to the shared disk is redundant paths and dummy-failure tests can be run.)
The test must include an error in the paths and switching of one path to another.
Set a dummy-failure to occur in the primary path.
Some path-switching software (drivers) takes time to switch from the failed path to a working path. In some cases, control may not return to the operating system (software).
If any time-out and/or delay occurs in disk rw monitor resource or disk TUR monitor resource, adjust the time-out value of each monitor resource.
Backup/Restoration
If you plan to perform regular backups, run a test backup.
Some backup software and archive commands place a heavy load on the CPU and/or disk I/O.
If heartbeat delays, monitor resource delays, or time-outs occur, adjust the heartbeat time-out value and/or the time-out value of each monitor resource.
The following table describes, for each device, the dummy failures to cause and what happens as a result. The results vary depending on the system configuration and resource settings; the table shows examples for a typical configuration and settings.
Device
Dummy-failure
What happens:
Disk device SCSI/FC path
Unplug the cable on the active server (if the disk cables are redundant, unplug both cables)
When the shared disk is monitored, an error is detected, and failover to the standby server occurs. When no disk is monitored, the operation stops.
Deactivation of a disk resource may fail when performing failover.
Unplug the cable on the standby server (if the disk cables are redundant, unplug both cables)
When the disk TUR monitor resource monitors the disk path on the standby server, an error is detected. The operation continues on the active server.
Unplug the cable of the primary path when the disk path is redundant. (When FC Switch is used, power it off as well.)
The path switching software switches the disk path. No error is detected on EXPRESSCLUSTER, and the operation continues.
While only one path is available (the state described above), move a group, or restart the server by shutting down the cluster.
The disk path operates normally.
Degenerate and/or recover the RAID of the disk device.
No error is detected on EXPRESSCLUSTER, and the operation continues.
When the disk device controllers are duplicated, stop one of them.
When the path is duplicated, the path switching software switches the disk path. No error is detected on EXPRESSCLUSTER, and the operation continues.
When the path is not duplicated and each server is connected directly to the disk, the disk TUR monitor resource on the server connected to the stopped controller detects an error, and failover to the standby server is performed. (When the controller connected to the standby server stops, the operation continues.)
Interconnect LAN
Unplug the dedicated interconnect LAN cable
The LAN heartbeat resource on the interconnect becomes offline.
A warning is issued to the alert log.
Communication between servers continues over the public LAN.
Operation continues.
Public LAN
Unplug the LAN cable or power off the hub
Communication with the operational clients stops, and the application stalls or an error occurs.
The LAN heartbeat resource on the public LAN becomes inactive. A warning is issued to the alert log.
An error is detected when an IP monitor resource and/or a NIC Link Up/Down monitor resource is used. When the cable on the active server is unplugged, a failover occurs. (When the hub is powered off, failover is repeated up to the maximum count configured.)
When the public LAN is the only communication path between the servers (as in a remote cluster configuration), the server whose LAN cable is unplugged is shut down as an emergency by network partition resolution using the ping method.
Server UPS
Unplug the power cable of the UPS on the active server from the outlet
The active server shuts down
Failover to the standby server occurs
UPS on a shared disk
When the power supply of the shared disk is duplicated, unplug one of the power cables from the outlet.
No error is detected on EXPRESSCLUSTER, and the operation continues. If that UPS also supplies power to a server, the server shuts down. (If it is the active server, failover to the standby server takes place.)
LAN for UPS
Unplug the LAN cable
The UPS can no longer be controlled. However, no error is detected on EXPRESSCLUSTER, and the operation continues.
OS error
Run the shutdown command on the active server
The active server shuts down
Failover to a standby server occurs.
Mirror disk connect
When more than one LAN cable is set up for the mirror disk connect and one or more of them are connected
Unplug only the LAN cable that is being used as the mirror disk connect.
Mirroring continues.
When only one LAN cable is set up for the mirror disk connect, or when more than one LAN cable is set up for the mirror disk connect but none of them are connected
Unplug only the LAN cable that is being used as the mirror disk connect.
A warning is issued to the alert log (mirroring stops)
Operation continues on the active server but switching to a standby server becomes impossible.
An error is detected in mirror disk monitor resource/hybrid disk monitor resource.
Disk resource
Start up the disk resource on the server where the disk path is unplugged.
The disk resource does not get activated.
Failover to a standby server occurs.
Application resource
Start up the application resource on the server where the name of the file or folder configured for the start path of the application resource was temporarily changed.
The application resource does not get activated.
Failover to a standby server occurs.
Application monitor resource
Use Task Manager to stop a monitored process.
An error is detected. The application is restarted or a failover to the standby server occurs.
Service resource
Start up the service resource on the server where the path or name of the service's execution file was temporarily changed.
The service resource does not get activated.
Failover to a standby server occurs.
Service monitor resource
Stop a service to be monitored.
An error is detected. The service is restarted or a failover to a standby server occurs.
Floating IP address
Assign the IP address configured as the floating IP address to another machine on the same network segment, and then start the floating IP address resource.
The floating IP address resource does not get activated.
Failover to a standby server occurs. (Activation also fails on the failover destination, so failover is repeated up to the maximum count configured.)
The following is an example of backup in a uni-directional standby cluster.
Data in a shared disk and in a local disk is backed up to a backup device connected to the active server (Server 1).
Fig. 8.1 Example of data backup in a uni-directional standby cluster (1)
When an error occurs in Server 1, the data in the shared disk and in the local disk is backed up to a backup device connected to the standby server (Server 2).
Fig. 8.2 Example of data backup in a uni-directional standby cluster (2)
Before you start using your cluster system, check that it works properly and that you can operate it correctly. The operations described below can be executed by using the Cluster WebUI or EXPRESSCLUSTER commands. For details on the functions of the Cluster WebUI, see the online manual. For details on EXPRESSCLUSTER commands, see "EXPRESSCLUSTER command reference" in the "Reference Guide". The following describes the procedures to start up and shut down a cluster and to shut down a server.
To activate a cluster, follow the instructions below:
When you are using any shared or add-in disk, start the disk.
Start all the servers in the cluster.
After the servers confirm that cluster startup is synchronized among them, the cluster is started on each server. After the cluster starts, each group is activated on the appropriate server according to the settings.
Note
When you start all the servers in the cluster, make sure that they all start within the time set in Server Sync Wait Time on the Timeout tab of Cluster Properties in the Cluster WebUI. Note that failover occurs if the startup of any server is not confirmed within this time.
Note
After the shared disk is started, it takes a few minutes to initialize. If a server starts during initialization, it cannot recognize the shared disk. Make sure that the servers start only after the shared disk has finished initializing.
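The timing constraint in the first note above can be illustrated with a small calculation. The following Python sketch is hypothetical: the function late_servers is not part of EXPRESSCLUSTER, and it simplifies the real behavior by assuming that the Server Sync Wait Time window opens when the first server starts. It lists the servers whose startup would not be confirmed in time.

```python
from datetime import datetime, timedelta


def late_servers(start_times: dict[str, datetime], sync_wait: timedelta) -> list[str]:
    """Return the servers that start after the sync window has closed.

    Simplifying assumption (not EXPRESSCLUSTER's exact algorithm): the window
    opens when the first server starts and lasts for Server Sync Wait Time.
    """
    if not start_times:
        return []
    window_opened = min(start_times.values())
    deadline = window_opened + sync_wait
    # Servers that start later than the deadline risk triggering failover.
    return sorted(name for name, started in start_times.items() if started > deadline)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0, 0)
    starts = {
        "server1": t0,
        "server2": t0 + timedelta(minutes=3),
        "server3": t0 + timedelta(minutes=7),
    }
    print(late_servers(starts, timedelta(minutes=5)))  # ['server3']
```

In this example, server3 starts seven minutes after server1, so with a wait time of five minutes (an example value, not a recommended setting) it would miss the window.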
To shut down a cluster or server, use EXPRESSCLUSTER commands or shut down through the Cluster WebUI.
Note
When you are using the Replicator/Replicator DR, a mirror break may occur if you shut down a cluster without using EXPRESSCLUSTER commands or the Cluster WebUI.
The entire cluster can be shut down by running the clpstdn command, by executing cluster shutdown from the Cluster WebUI, or by performing cluster shutdown from the Start menu. A cluster shutdown waits for all groups to stop and then shuts down each server. Shutting down the cluster in this way stops all servers in the cluster properly as a cluster system.
Shut down a server by running the clpdown command or by executing server shutdown from the Cluster WebUI. Failover occurs when you shut down a server. When you are using the Replicator/Replicator DR, mirroring by mirror disk resources/hybrid disk resources is interrupted. If you intend to run operations on the standby server while performing hardware maintenance, shut down the active server.
When you want to update the cluster configuration information, you can stop the EXPRESSCLUSTER service without stopping the current operation. Stopping EXPRESSCLUSTER in this way is referred to as "suspending". Returning from the suspended status to normal operation is referred to as "resuming".
When you suspend or resume a cluster, the request is issued to all the servers in the cluster. Suspend a cluster only while the EXPRESSCLUSTER service is running on all the servers in the cluster.
Use EXPRESSCLUSTER commands or Cluster WebUI to suspend or resume a cluster.
When a cluster is suspended, some functions are disabled as described below because the EXPRESSCLUSTER service stops while the active resources are kept active.
All heartbeat resources stop.
All network partition resolution resources stop.
All monitor resources stop.
Groups or group resources are disabled (cannot be started, stopped, or moved).
There are two ways to stop EXPRESSCLUSTER: stop the EXPRESSCLUSTER Server service, or set the Server service to start manually.
9.2.1. Stopping the EXPRESSCLUSTER Server service
To stop only the EXPRESSCLUSTER Server service without shutting down the operating system, use the clpcl command or Stop cluster from the Cluster WebUI.
9.2.2. Setting the EXPRESSCLUSTER Server service to be manually activated
To prevent the EXPRESSCLUSTER Server service from starting when the OS starts, use the OS service manager to set the startup type of the Server service to manual. After this setting, EXPRESSCLUSTER does not start the next time the OS is rebooted.
9.2.3. Changing the setting of the EXPRESSCLUSTER Server service from the manual startup to automatic startup
The OS service manager is also used to set the EXPRESSCLUSTER Server service to start automatically. Even after you change this setting, the EXPRESSCLUSTER Server service remains stopped until it is started directly or the server is restarted.
The following describes procedures and precautions for modifying the configuration data after creating a cluster.
9.3.1. Modifying the cluster configuration data by using the Cluster WebUI
Start the Cluster WebUI.
Select Config Mode from the drop-down menu on the toolbar of the Cluster WebUI.
After the current cluster configuration data is displayed, modify it.
Upload the modified configuration data. Depending on the data modified, it may be necessary to suspend or stop the cluster, or to restart it by shutting down the cluster. In such a case, the upload is canceled and the required operation is displayed. Perform the operation as instructed by the displayed message, and then upload the data again.
9.3.2. Applying the modified cluster configuration data
To upload the modified cluster configuration data by using the Cluster WebUI or the clpcfctrl command, select one of the following operations depending on what was modified. For the operation required to apply each modification, refer to "Parameter details" in the "Reference Guide".
The way you apply the changes may affect the applications on the system and the behavior of the EXPRESSCLUSTER Server. For details, see the table below:
The way to apply changes
Effect
Upload only
The operation of the EXPRESSCLUSTER Server is not affected. Heartbeat resources, group resources, and monitor resources do not stop.
Upload data and then restart the API service
The operation of the EXPRESSCLUSTER Server is not affected. Heartbeat resources, group resources, and monitor resources do not stop.
Upload data and then restart the WebManager service
The operation of the EXPRESSCLUSTER Server is not affected. Heartbeat resources, group resources, and monitor resources do not stop.
Upload data and then restart the Information Base service
The operation of the EXPRESSCLUSTER Server is not affected. Heartbeat resources, group resources, and monitor resources do not stop.
Upload data and then restart the Node Manager service
As long as the EXPRESSCLUSTER Node Manager service is stopped, heartbeat resources are also stopped. However, the applications on the system continue to operate because group resources do not stop.
Upload after stopping the group whose setting has been changed
Group resources are stopped. Because of this, the applications on the system that are controlled by the group are stopped until the group is started after uploading.
Upload after suspending the cluster
EXPRESSCLUSTER is partly stopped.
During the period when the EXPRESSCLUSTER Server service is suspended, heartbeat resources and monitor resources are stopped. Because group resources do not stop, the applications on the system continue to operate.
Upload after stopping the cluster
EXPRESSCLUSTER stops completely, and groups stop as well. Therefore, the applications used on the system are stopped until the data is uploaded and the cluster is started.
Shut down and restart the cluster after uploading the data
The applications used on the system are stopped until the cluster restarts and the group is started.
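When planning a maintenance window, it can help to record which of these methods interrupt applications. The Python sketch below is a hypothetical transcription of the table; the short method keys and the functions interrupts_applications and non_disruptive_methods are illustrative names, not EXPRESSCLUSTER identifiers. It captures only what the table states about application availability.

```python
# Hypothetical transcription of the table above; keys are shortened labels.
# True means applications on the system stop while the change is applied.
APPLICATIONS_STOP = {
    "upload only": False,
    "restart API service": False,
    "restart WebManager service": False,
    "restart Information Base service": False,
    "restart Node Manager service": False,  # heartbeat resources stop, groups keep running
    "stop changed group": True,  # only applications controlled by that group stop
    "suspend cluster": False,  # heartbeat and monitor resources stop, groups keep running
    "stop cluster": True,
    "shut down and restart cluster": True,
}


def interrupts_applications(method: str) -> bool:
    """Return True if applying a change with this method stops applications."""
    try:
        return APPLICATIONS_STOP[method]
    except KeyError:
        raise ValueError(f"unknown apply method: {method!r}") from None


def non_disruptive_methods() -> list[str]:
    """List the methods that, per the table, keep applications running."""
    return [m for m, stops in APPLICATIONS_STOP.items() if not stops]
```

The method you must use is still determined by the parameter you changed (see "Parameter details" in the "Reference Guide"); the lookup only tells you what the chosen method will interrupt.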
Note
If the cluster needs to be suspended or stopped to apply the modified data, make sure that suspending or stopping is complete before applying the data.
Check whether the Cluster WebUI alert logs show the message "Type: Info, Module name: pm, Event ID: 2". For more information on messages, see "Error messages" in the "Reference Guide".
If the Cluster WebUI is not available, check the event log in Event Viewer to see whether "Module type: pm, Event type: information, Event ID: 2" is recorded.
After confirming the message above, apply the cluster configuration data to the EXPRESSCLUSTER environment.
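If you collect alert log or event log text in a script, the confirmation step above can be automated. The Python sketch below assumes, as a simplification, that a message arrives as comma-separated "key: value" pairs like the two examples quoted above; the function names parse_fields and is_pm_event_2 are hypothetical, and the real log export format may differ.

```python
def parse_fields(message: str) -> dict[str, str]:
    """Split 'Key: value, Key: value' text into a lowercase-keyed dict."""
    fields: dict[str, str] = {}
    for part in message.split(","):
        key, sep, value = part.partition(":")
        if sep:  # ignore fragments without a colon
            fields[key.strip().lower()] = value.strip()
    return fields


def is_pm_event_2(message: str) -> bool:
    """True if the message is pm Event ID 2 in either the alert log or the event log wording."""
    fields = parse_fields(message)
    module = fields.get("module name", fields.get("module type"))
    return module == "pm" and fields.get("event id") == "2"
```

Matching on the parsed fields rather than the whole string tolerates small spacing differences between the two message forms.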
To start Integrated Cluster WebUI, follow these steps:
Start a Web browser.
In the address bar of the browser, enter the IP address and port number of a server where EXPRESSCLUSTER Server is installed, and add integ.html to the URL as follows:
http://ip-address:port/integ.html
ip-address
Specify the actual IP address of a server where EXPRESSCLUSTER Server is installed.
port
Specify the same port number as the WebManager port specified during installation (default: 29003).
Integrated Cluster WebUI is started.
When the login screen appears, enter a username and password to log in.
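The URL in the procedure above can also be built programmatically, for example when generating bookmarks for several clusters. This Python sketch uses only the standard library; the function name integrated_webui_url is hypothetical, and the default port is the installation default stated above.

```python
import ipaddress
from urllib.parse import urlunsplit

DEFAULT_WEBMANAGER_PORT = 29003  # installation default stated in the procedure above


def integrated_webui_url(ip_address: str, port: int = DEFAULT_WEBMANAGER_PORT) -> str:
    """Build http://ip-address:port/integ.html for a server running EXPRESSCLUSTER Server."""
    addr = ipaddress.ip_address(ip_address)  # raises ValueError for malformed input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # RFC 3986 requires IPv6 literals to be enclosed in brackets.
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return urlunsplit(("http", f"{host}:{port}", "/integ.html", "", ""))


print(integrated_webui_url("192.168.0.10"))  # http://192.168.0.10:29003/integ.html
```

Validating the address first catches typos such as a host name entered where the procedure expects an IP address.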
9.4.4. Registering a cluster system in Integrated Cluster WebUI
Registering a cluster in Integrated Cluster WebUI also requires selecting OS Authentication Method as the method for logging in to Cluster WebUI.
For how to register a cluster, see the online help of Integrated Cluster WebUI.
If the IP address entered during registration cannot be reached, also register, in IP address for Integrated Cluster WebUI, an IP address through which each server can be reached. (See "Reference Guide" -> "Parameter details" -> "Cluster properties" -> "WebManager tab" -> "IP address for Integrated Cluster WebUI".)
You must log on as Administrator to uninstall the EXPRESSCLUSTER Server. We recommend extracting the cluster configuration information before uninstalling. For details, refer to "EXPRESSCLUSTER command reference" in the "Reference Guide".
Follow the procedures below to uninstall the EXPRESSCLUSTER Server:
Switch the type of service startup to manual startup.
clpsvcctrl.bat --disable -a
Shut down the server.
If the shared disk is used, unplug all disk cables connected to the server, because disk filtering is disabled after uninstallation.
Turn on the server.
In Control Panel, click Programs and Features.
Select EXPRESSCLUSTER Server, and then click Uninstall.
The EXPRESSCLUSTER Server Setup dialog box is displayed.
Click Yes in the uninstallation confirmation dialog box. If you click No, uninstallation will be canceled.
If the SNMP service is running, a message asking you to confirm stopping the SNMP service is displayed. Click Yes. If you click No, uninstallation will be canceled.
A message asking whether to return the media sense function (TCP/IP disconnection detection) to its state before the EXPRESSCLUSTER Server was installed is displayed. Click Yes to return it to that state. If you click No, EXPRESSCLUSTER is uninstalled with the media sense function left disabled.
When uninstallation is complete, a completion message is displayed in the EXPRESSCLUSTER Server Setup dialog box. Click Finish.
A message asking whether to restart the computer is displayed. Select whether to restart the computer, and then click Finish. Uninstallation of the EXPRESSCLUSTER Server is complete.
Important
If the shared disk is used, do not start the OS while the shared disk is connected after uninstalling EXPRESSCLUSTER. Data on the shared disk may be corrupted.
To reinstall the EXPRESSCLUSTER Server, you must have the cluster configuration data created by the Cluster WebUI (or the latest data, if you reconfigured the cluster).
To reinstall EXPRESSCLUSTER Server on the entire cluster
To reinstall the EXPRESSCLUSTER Server, follow the procedures below:
Unplug all disk cables connected to all servers, because access restriction does not function until reinstallation of the EXPRESSCLUSTER Server is completed.
Uninstall the EXPRESSCLUSTER Server on all servers in the cluster system. If you are reinstalling the OS, you do not need to uninstall EXPRESSCLUSTER. However, if EXPRESSCLUSTER will be reinstalled in the folder where it was previously installed, all files in that installation folder must be deleted.
For details on the uninstallation procedures, refer to "Uninstalling the EXPRESSCLUSTER Server" in this chapter.
After uninstallation of the EXPRESSCLUSTER Server is completed, shut down the OS.
Important
When a shared disk is used, do not start a server connected to the shared disk while EXPRESSCLUSTER is not installed. Data on the shared disk may be corrupted.
Install the EXPRESSCLUSTER Server and register the license as necessary. After installation of the EXPRESSCLUSTER Server is completed, shut down the OS. If the shared disk is used, connect the shared disk and then start the OS. If the shared disk is not used, simply start the OS.
When a shared disk is used, do not connect the shared disk to an HBA or SCSI controller for which filtering is not configured. Data on the shared disk may be corrupted.
Create the cluster configuration data and a cluster.
To reinstall EXPRESSCLUSTER Server on some servers in the cluster
To reinstall the EXPRESSCLUSTER Server, follow the procedures below:
When a shared disk is used, unplug all disk cables connected to the servers on which you want to reinstall the EXPRESSCLUSTER Server. This is because the access control does not work until the reinstallation is completed.
Uninstall the EXPRESSCLUSTER Server. If you are reinstalling the OS, you do not need to uninstall EXPRESSCLUSTER. However, when reinstalling in the folder where EXPRESSCLUSTER was previously installed, the files in the installation folder must be deleted.
For details on uninstallation procedures, refer to "Uninstalling the EXPRESSCLUSTER Server" in this chapter.
After uninstallation of the EXPRESSCLUSTER Server is completed, shut down the OS.
Important
When a shared disk is used, do not start a server connected to the shared disk while EXPRESSCLUSTER is not installed. Data on the shared disk may be corrupted.
Install the EXPRESSCLUSTER Server on the server where it was uninstalled, and register the license as necessary. After installation of the EXPRESSCLUSTER Server is completed, shut down the OS. When a shared disk is used, connect the shared disk and then start the OS. If a shared disk is not used, simply start the OS.
When a shared disk is used, do not connect the shared disk to an HBA or SCSI controller for which filtering is not configured. Data on the shared disk may be corrupted.
Connect to the Cluster WebUI on another server in the cluster and switch to Config Mode.
If a shared disk is used and the OS has been reinstalled, or if you have changed the HBA that connects the shared disk, update the filtering information on the HBA tab of Server Properties for the server where the OS was reinstalled.
Important
To configure the filtering settings, click Server Properties of the server where the EXPRESSCLUSTER Server is installed, click HBA tab, and then click Connect. If the filtering setting is configured without clicking Connect, data on the shared disk may be corrupted.
From the server where the web browser of the Cluster WebUI is connected, run clpcl --suspend --force from the command prompt and suspend the cluster.
Apply the changes in Config Mode.
If the fixed-term license is used, run the following command.
clplcnsc --reregister <a folder path for saved license files>
The following message is displayed if the changes have been applied successfully.
The application finished successfully.
Change the Cluster WebUI to Operation mode and resume the cluster from the Service menu.
Note
When you resume the cluster from the Cluster WebUI, the message "Failed to resume the cluster. Click the Reload button, or try again later." is displayed. Ignore this message.
Reboot the OS on the server where the EXPRESSCLUSTER Server is reinstalled.
When Off is selected in Auto Return in Cluster Properties, click the server where the EXPRESSCLUSTER Server is reinstalled by using the Cluster WebUI and select Recover.
When the cluster was shut down and rebooted after the configuration data created by the Cluster WebUI was distributed to all servers, the following message was displayed in the alert log, and the cluster stopped.
"The license is not registered. (Product name: %1)"
%1: Product name
The cluster has been shut down and rebooted without its license being registered.
Register the license according to "Registering the license".
When the cluster was shut down and rebooted after the configuration data created by the Cluster WebUI was distributed to all servers, the following message appeared in the alert log, but the cluster is working properly.
"The number of licenses is insufficient. The number of insufficient licenses is %1. (Product name:%2)"
%1: Number of licenses that are short
%2: Product name
The number of licenses is insufficient.
Obtain a license and register it.
While the cluster was operating with a trial license, the following message was displayed and the cluster stopped.
"The trial license has expired in %1. (Product name: %2)"
%1: Trial end date
%2: Product name
The license has already expired.
Ask your sales agent to extend the trial license, or obtain and register the product version license.
While the cluster was operating with a fixed-term license, cluster operation was disabled and the following messages were output:
"The fixed term license has expired in %1. (Product name:%2)"
%1: Fixed-term end date
%2: Product name
"Cluster operation is forcibly disabled since a valid license has not been registered."
The license has already expired.
Obtain the license for the product version from the vendor, and then register the license.
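The license messages above differ only in their fixed wording, so they can be matched mechanically when scanning collected alert logs. The Python sketch below is hypothetical: the patterns are derived from the message texts quoted in this section (with %1 and %2 as captured fields), the cause and action strings paraphrase the solutions given above, and the function name diagnose is illustrative.

```python
import re
from typing import Dict, Optional, Tuple

# (pattern, cause, action) built from the message texts quoted above.
LICENSE_MESSAGES = [
    (r"The license is not registered\. \(Product name: ?(?P<product>.+)\)",
     "The cluster was shut down and rebooted without its license being registered.",
     "Register the license."),
    (r"The number of licenses is insufficient\. The number of insufficient licenses is "
     r"(?P<shortage>\d+)\. \(Product name: ?(?P<product>.+)\)",
     "The number of licenses is insufficient.",
     "Obtain a license and register it."),
    (r"The trial license has expired in (?P<end_date>.+)\. \(Product name: ?(?P<product>.+)\)",
     "The trial license has expired.",
     "Ask your sales agent to extend the trial license, or obtain and register the product version license."),
    (r"The fixed term license has expired in (?P<end_date>.+)\. \(Product name: ?(?P<product>.+)\)",
     "The fixed-term license has expired.",
     "Obtain the product version license from the vendor, and then register it."),
]


def diagnose(message: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (cause, action, captured fields) for a known license message, else None."""
    text = message.strip().strip('"')
    for pattern, cause, action in LICENSE_MESSAGES:
        match = re.fullmatch(pattern, text)
        if match:
            return cause, action, match.groupdict()
    return None
```

A None result means the message is not one of the license messages covered in this section.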
A server that is part of a cluster in a cluster system. In networking terminology, it refers to devices, including computers and routers, that can transmit, receive, or process signals.