6. Network partition resolution resources details

This chapter provides detailed information on network partition resolution resources.

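This chapter covers:

  • 6.1. Network partitions
  • 6.2. Understanding the network partition resolution resources
  • 6.3. Understanding network partition resolution by PING method
  • 6.4. Understanding network partition resolution by HTTP method
  • 6.5. Not resolving network partition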

6.1. Network partitions

Network partitioning is the state in which every communication channel between the servers has failed, so that the network between the servers is split.
In a cluster system with no countermeasure for network partitioning, a failure of the communication channels cannot be distinguished from a failure of a server. As a result, multiple servers may access the same resource, which can corrupt data. EXPRESSCLUSTER, by contrast, uses network partition resolution resources to distinguish a server failure from network partitioning when the heartbeat from a server is lost. If the loss of heartbeat is determined to be caused by a server failure, the system performs a failover by activating each resource and restarting the applications on a server that is running normally. If the loss of heartbeat is determined to be caused by network partitioning, an emergency shutdown is executed, because protecting data takes priority over continuity of operation.
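The decision described above can be summarized in a short sketch. This is only an illustration of the logic in this section, not EXPRESSCLUSTER code; the names `Action` and `decide` are hypothetical and do not correspond to any product API.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the outcomes described in this section."""
    NONE = "none"                              # heartbeat is healthy
    FAILOVER = "failover"                      # the other server has failed
    EMERGENCY_SHUTDOWN = "emergency_shutdown"  # network partitioning


def decide(heartbeat_lost: bool, partition_detected: bool) -> Action:
    """Choose what the local server does, following the text above."""
    if not heartbeat_lost:
        return Action.NONE
    if partition_detected:
        # Protecting data takes priority over continuity of operation.
        return Action.EMERGENCY_SHUTDOWN
    # The loss of heartbeat is attributed to a server failure.
    return Action.FAILOVER
```

For example, `decide(True, False)` returns `Action.FAILOVER`, while `decide(True, True)` returns `Action.EMERGENCY_SHUTDOWN`.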

6.2. Understanding the network partition resolution resources

Servers in a cluster monitor one another by using heartbeat resources. When all heartbeat resources are disconnected, or when another server is shut down by a server that is not in the cluster, the network partition is resolved by using network partition resolution resources. The following network partition resolution resources are provided.

| Network partition resolution resource | Abbreviation | Function overview |
|---|---|---|
| PING network partition resolution resource (PING method) | pingnp | A network partition is resolved by using the ping command to determine which server can communicate. |
| HTTP network partition resolution resource (HTTP method) | httpnp | A network partition is resolved by sending an HTTP HEAD request to a Web server to determine which server can communicate. |

If the configuration has only one available LAN, set up either the PING network partition resolution resource or the HTTP network partition resolution resource.

6.3. Understanding network partition resolution by PING method

6.3.1. Settings of the PING network partition resolution resources

To use PING network partition resolution resources, a device that is always running and that can receive and respond to the ping command (hereafter referred to as the ping device) is required.

When the heartbeat from another server is lost but the ping device is responding to the ping command, the remote server is determined to be down, and failover starts.

If there is no response to the ping command, the local server is determined to be isolated from the network by network partitioning, and the action for when a network partition occurs is performed.
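As a rough illustration of the PING method, the following sketch checks a ping device with the operating system's ping command and maps the result to one of the two outcomes described above. It is not the product's implementation: the function names, the return labels, and the single-probe behavior are assumptions made for this example. The actual resource is configured on the NP Resolution tab, not in code.

```python
import platform
import subprocess


def ping_once(host: str, timeout_s: float = 3.0) -> bool:
    """Run the OS ping command once; True if it reports success."""
    # The count flag differs by platform: -n on Windows, -c elsewhere.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No result within the timeout, or the ping command is unavailable.
        return False
    return result.returncode == 0


def ping_np_decision(ping_device: str, probe=ping_once) -> str:
    """Decide what to do after the heartbeat from another server is lost.

    Returns "failover" if the ping device answers (the remote server is
    judged to be down), otherwise "np_action" (the local server is judged
    to be isolated). Both labels are hypothetical.
    """
    return "failover" if probe(ping_device) else "np_action"
```

The `probe` parameter exists only so that the decision can be exercised without a real ping device, for example `ping_np_decision("192.0.2.1", probe=lambda host: False)`.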

For details, see "NP Resolution tab" in "Cluster properties" in "Parameter details" in this guide.

6.3.2. Note on PING network partition resolution resource

When using the PING network partition resolution resource, specify an address that can be reached, for both sending and receiving, through one of the interconnect LANs registered in the configuration information.

If, because of a ping device failure or for any other reason, the ping command has already been receiving no response on all servers before the heartbeat is disconnected, the network partition cannot be resolved. If heartbeat disconnection is detected in this situation, the action for when a network partition occurs is performed on all servers.

It is possible to set Use or Do Not Use for each server. If Do Not Use is set incorrectly, NP resolution processing cannot be performed and a double activation may be detected.
The following is an example of an incorrect setting in which NP resolution processing cannot be performed.

6.4. Understanding network partition resolution by HTTP method

6.4.1. Settings of the HTTP network partition resolution resources

To use the HTTP network partition resolution resources, the following is required.

  • A server that is always running and available for HTTP communication (hereafter referred to as the Web server).

When the heartbeat from another server is detected to have stopped, the HTTP network partition resolution resource behaves in one of two ways:

  • If the Web server responds, the resource determines that the other server has failed and executes a failover.
  • If the Web server does not respond, the resource determines that network partitioning has isolated the local server from the network and performs the same operation as when a network partition occurs.
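The check can be pictured with the sketch below, which sends an HTTP HEAD request using Python's standard `http.client` module. It is an illustration only and rests on several assumptions: the function names and return labels are hypothetical, any HTTP response (whatever its status code) is treated as "the Web server responded", and the host, port, and path values are placeholders.

```python
import http.client


def web_server_responds(host: str, port: int = 80, path: str = "/",
                        timeout: float = 5.0) -> bool:
    """Send one HTTP HEAD request; True if any response comes back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        conn.getresponse()
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, name resolution failure, bad reply.
        return False
    finally:
        conn.close()


def http_np_decision(web_server_ok: bool) -> str:
    """Map the probe result to the two outcomes described above."""
    # "failover": the other server is judged to have failed.
    # "np_action": the local server is judged to be isolated.
    return "failover" if web_server_ok else "np_action"
```

A caller would combine the two, for example `http_np_decision(web_server_responds("web.example.com"))`, where the host name is a placeholder.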

For more information, refer to "NP Resolution tab" in "Cluster properties" in "Parameter details" in this guide.

6.4.2. Note on HTTP network partition resolution resource

For communication with the Web server, the NIC and the source address are selected according to the OS settings.

6.5. Not resolving network partition

When this method is selected, network partition resolution is not performed. Therefore, if all the network channels between the servers in a cluster fail, every server performs a failover.
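The consequence can be seen in a small model. The sketch below is a simplified illustration, not product behavior in detail: the function name, the server names, and the `reachable` parameter (the set of servers that can still reach a ping device or Web server) are all hypothetical.

```python
def servers_failing_over(servers, reachable=None):
    """Return the servers that fail over after every channel between them fails.

    reachable=None models "not resolving network partition": no check is
    made, so every server fails over. Otherwise only the servers that can
    still reach the reference device fail over, and the rest perform the
    action for when a network partition occurs.
    """
    if reachable is None:
        return set(servers)
    return {s for s in servers if s in reachable}
```

With two servers and no resolution, `servers_failing_over(["server1", "server2"])` returns both servers, which is the situation that network partition resolution resources are meant to prevent.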