1. Preface
1.1. Who Should Use This Guide
The EXPRESSCLUSTER X Maintenance Guide describes maintenance-related information, intended for administrators. See this guide for information required for operating the cluster.
1.2. How This Guide is Organized
2. The system maintenance information: Provides maintenance information for EXPRESSCLUSTER.
1.3. EXPRESSCLUSTER X Documentation Set
The EXPRESSCLUSTER X manuals consist of the following six guides. The title and purpose of each guide are described below:
Getting Started Guide
This guide is intended for all users. The guide covers topics such as product overview, system requirements, and known problems.
Installation and Configuration Guide
This guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.
Reference Guide
This guide is intended for system administrators. The guide covers topics such as how to operate EXPRESSCLUSTER, the function of each module, and troubleshooting. The guide is a supplement to the "Installation and Configuration Guide".
Maintenance Guide
This guide is intended for administrators and for system administrators who want to build, operate, and maintain EXPRESSCLUSTER-based cluster systems. The guide describes maintenance-related topics for EXPRESSCLUSTER.
Hardware Feature Guide
This guide is intended for administrators and for system engineers who want to build EXPRESSCLUSTER-based cluster systems. The guide describes features to work with specific hardware, serving as a supplement to the "Installation and Configuration Guide".
Legacy Feature Guide
This guide is intended for administrators and for system engineers who want to build EXPRESSCLUSTER-based cluster systems. The guide describes EXPRESSCLUSTER X 4.0 WebManager, Builder, and EXPRESSCLUSTER Ver 8.0 compatible commands.
1.4. Conventions
In this guide, Note, Important, and See also are used as follows:
Note
Used when the information given is important, but not related to data loss or damage to the system and machine.
Important
Used when the information given is necessary to avoid data loss or damage to the system and machine.
See also
Used to describe where the referenced information can be found.
The following conventions are used in this guide.
Convention | Usage | Example |
---|---|---|
Bold | Indicates graphical objects, such as fields, list boxes, menu selections, buttons, labels, icons, etc. | In User Name, type your name. On the File menu, click Open Database. |
Angled bracket within the command line | Indicates that the value specified inside of the angled bracket can be omitted. | |
Monospace (courier) | Indicates path names, commands, system output (message, prompt, etc.), directory, file names, functions and parameters. | |
Monospace bold (courier) | Indicates the value that a user actually enters from a command line. | Enter the following: clpcl -s -a |
Monospace italic (courier) | Indicates that users should replace the italicized part with values that they are actually working with. | |
1.5. Contacting NEC
For the latest product information, visit our website below:
2. The system maintenance information
This chapter provides information you need for maintenance of your EXPRESSCLUSTER system. Resources to be managed are described in detail.
This chapter covers:
2.4. System resource statistics information collection function
2.8. Procedure for suspending or releasing the limit on the band for mirror connect communication
2.9. Configuring the settings to temporarily prevent execution of failover
2.28. Replacing the disk array controller (DAC)/updating the firmware
2.1. Directory structure of EXPRESSCLUSTER
Note
You will find executable files and script files that are not described in "EXPRESSCLUSTER command reference" in the "Reference Guide" under the installation directory. Run these files only with EXPRESSCLUSTER. Any failures or troubles caused by executing them by using applications other than EXPRESSCLUSTER are not supported.
EXPRESSCLUSTER directories are structured as described below:
- Directory for alert synchronization: This directory stores EXPRESSCLUSTER Alert Synchronization's modules and management files.
- Directory for cluster modules: This directory stores the EXPRESSCLUSTER Server's executable files and libraries.
- Directory for cloud environment: This directory stores script files for cloud environments.
- Directory for cluster configuration data: This directory stores the cluster configuration files and the policy file of each module.
- Directory for event logs: This directory stores libraries that are related to the EXPRESSCLUSTER event logs.
- Directory for HA products: This directory stores binary files and setting files of Java Resource Agent and System Resource Agent.
- Directory related to Help: Not used now.
- Directory for licenses: This directory stores licenses for licensed products.
- Directory for module logs: This directory stores logs produced by each module.
- Directory for report messages (alert, event log): This directory stores alert and event log messages reported by each module.
- Directory for performance log: This directory stores performance logs of mirror or hybrid disk resources and of OS system resources.
- Directory for the registry: Not used now.
- Directory for script resource scripts of group resources: This directory stores script resource scripts of group resources.
- Directory for the recovery script executed: This directory stores the script executed when an error is detected in the group resource or monitor resource.
- Directory for the string table: This directory stores string tables used in EXPRESSCLUSTER.
- Directory for the WebManager server and Cluster WebUI: This directory stores the WebManager server modules and management files.
- Directory for module tasks: This is a work directory for modules.
- Directory for cluster drivers: This directory stores drivers for kernel mode LAN heartbeat and disk filter.
2.2. How to delete EXPRESSCLUSTER logs or alerts
To delete EXPRESSCLUSTER logs or alerts, perform the following procedure.
Disable all cluster services on all servers in a cluster.
clpsvcctrl.bat --disable -a
Shut down the cluster with the Cluster WebUI or clpstdn command, and then reboot the cluster.
To delete logs, delete the files in the following folder. Perform this operation on the server for which you want to delete the logs.
<EXPRESSCLUSTER installation path>\log
To delete alerts, delete the files in the following folder. Perform this operation on the server for which you want to delete the alerts.
<EXPRESSCLUSTER installation path>\alert\log
Enable all cluster services on all servers in a cluster.
clpsvcctrl.bat --enable -a
Restart all the servers in the cluster.
2.3. Mirror statistics information collection function
2.3.1. What is the mirror statistics information collection function?
The mirror statistics information collection function collects statistics information related to the mirror function that is obtained from each mirror source in mirror disk and hybrid disk configurations.
Using the Windows OS functions (performance monitor and typeperf command), the mirror statistics information collection function can collect mirror statistics information for EXPRESSCLUSTER X and display the collected information in real time. Moreover, it can continuously output mirror statistics information to a statistic log file from the instant that the mirror is constructed.
As shown below, the collected mirror statistics information can be used during mirror construction and mirror operation.
During mirror construction | To tune the mirror setting items in the current environment, you can adjust the optimum setting by checking how each setting item influences the current environment. |
---|---|
During mirror operation | You can monitor the situation to determine whether a problem is likely to occur. Moreover, analysis performance improves because mirror statistics information can be collected before and after failure occurrence. |
2.3.2. Linkage between the mirror statistics information collection function and OS standard functions
Using the OS standard functions
Using the performance monitor and typeperf command, mirror statistics information can be collected and that information displayed in real time. Any counter can be selected from the subsequent "Counter names" list to continuously display and collect information over a fixed period of time. This allows you to visually check whether the mirror-related setting values are suitable for the constructed environment or whether an error has occurred during the collection of the statistics information.
For the procedure for using the performance monitor and typeperf command, see the subsequent items "Displaying mirror statistics information with the performance monitor," "Collecting mirror statistics information from the performance monitor", and "Collecting mirror statistics information from the typeperf command."
Specifying an object name
The object name used with the mirror statistics information collection function is "Cluster Disk Resource Performance". Specifying the "Cluster Disk Resource Performance" object enables the collection of mirror statistics information.
Specifying a counter name
The counter names used by the mirror statistics information collection function are listed below.
Counter name | Meaning | Unit | Description |
---|---|---|---|
% Compress Ratio | Compression ratio | % | Compression ratio of the mirror data to be sent to a remote server. The ratio of the compressed data size relative to the original data is used. Therefore, if 100 MB of data is compressed to 80 MB, the compression ratio is 80%. |
Async Application Queue Bytes; Async Application Queue Bytes, Max | Application queue size (instantaneous value/maximum value) | Byte | Amount of data which is retained in the user space memory and which has yet to be sent during asynchronous mirror communication. The value that appears when the latest data is collected is an instantaneous value while the value that appears when the amount of data to be retained is the greatest is the maximum value. |
Async Kernel Queue Bytes; Async Kernel Queue Bytes, Max | Kernel queue size (instantaneous value/maximum value) | Byte | Amount of data which is retained in the kernel space memory and which has yet to be sent during asynchronous mirror communication. The value that appears when the latest data is collected is an instantaneous value while the value that appears when the amount of data to be retained is the greatest is the maximum value. |
Async Mirror Queue Transfer Time; Async Mirror Queue Transfer Time, Max | Time for transfer from the kernel queue to the application queue (average value/maximum value) | msec | Average value/maximum value of the time needed to transfer data from the kernel space memory to the user space memory during asynchronous mirror communication. |
Async Mirror Send Wait History Files Total Bytes; Async Mirror Send Wait History Files Total Bytes, Max | History file usage (instantaneous value/maximum value) | Byte | Total size of the data files accumulated in the history file storage folder and which have yet to be sent during asynchronous mirror communication. The value that appears when the latest data is collected is an instantaneous value while the value that appears when the amount of accumulated data is the greatest is the maximum value. |
Async Mirror Send Wait Total Bytes; Async Mirror Send Wait Total Bytes, Max | Amount of data yet to be sent (instantaneous value/maximum value) | Byte | Total amount of mirror data which is to be sent to a remote server and which has yet to be sent during asynchronous mirror communication. The value that appears when the latest data is collected is an instantaneous value while the value that appears when the amount of data that has yet to be sent is the greatest is the maximum value. |
Mirror Bytes Sent; Mirror Bytes Sent/sec | Mirror transmission amount (total value/average value) | Byte (Byte/sec) | Number of bytes of mirror data sent to a remote server. The total number of bytes that appears until the latest data is collected is the total value while the number of bytes to be sent per second is the average value. |
Request Queue Bytes; Request Queue Bytes, Max | Request queue size (instantaneous value/maximum value) | Byte | Amount of queue used when an IO request is received during mirror communication. The value that appears when the latest data is collected is an instantaneous value while the value that appears when the queue size is the greatest is the maximum value. |
Transfer Time, Avg; Transfer Time, Max | Mirror communication time (average value/maximum value) | msec/time | Communication time per mirror communication used during mirror data transmission. The communication time averaged over the number of mirror communications performed until the latest data is collected is the average value, while the longest communication time for a single mirror communication is the maximum value. |
Specifying the instance name
The instance name to be used by the mirror statistics information collection function is "MD,HD ResourceX." X indicates a mirror disk number/hybrid disk number from 1 to 22.
For example, if the mirror disk number of mirror disk resource "MD" is set to "2", the mirror statistics information relating to resource "MD" can be collected by specifying instance "MD,HD Resource2."
Moreover, if two or more resources are set, specifying instance "_Total" collects the total of the mirror statistics information for all resources that have been set.
Note
Specify the instance name corresponding to the mirror disk number/hybrid disk number for which a resource is set. An instance for which no resource is set can be specified; however, mirror statistics information cannot be displayed/collected.
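The object, instance, and counter names described above combine into a Windows performance counter path of the form \Object(Instance)\Counter. The following Python sketch assembles such a path. The helper names (instance_name, counter_path) are hypothetical and are not part of EXPRESSCLUSTER; the sketch only formats strings and does not check whether a resource is actually set for the given number.

```python
# Hypothetical helpers (not part of EXPRESSCLUSTER) that assemble a
# Windows performance counter path: \Object(Instance)\Counter.

OBJECT_NAME = "Cluster Disk Resource Performance"
MAX_DISK_NUMBER = 22  # mirror disk / hybrid disk numbers run from 1 to 22


def instance_name(disk_number=None):
    """Return "MD,HD Resource<X>" for a disk number, or "_Total" if None."""
    if disk_number is None:
        return "_Total"
    if not 1 <= disk_number <= MAX_DISK_NUMBER:
        raise ValueError("disk number must be between 1 and 22")
    return f"MD,HD Resource{disk_number}"


def counter_path(counter, disk_number=None):
    """Build the counter path for one counter of one instance."""
    return f"\\{OBJECT_NAME}({instance_name(disk_number)})\\{counter}"


if __name__ == "__main__":
    print(counter_path("Mirror Bytes Sent", 2))
    print(counter_path("% Compress Ratio"))
```

Passing no disk number selects the "_Total" instance, which mirrors the behavior described for configurations with two or more resources.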
Using mirror statistics information
Mirror statistics information that has actually been collected can be used to adjust the mirror-related setting values. If, for example, the communication speed and communication load can be confirmed from the collected mirror statistics information, it may be possible to improve the communication speed by tuning the mirror-related setting values.
Displaying mirror statistics information with the performance monitor
Procedure for displaying the mirror statistics information to be collected in real time
From the Start menu, start Administrative Tools - Performance Monitor.
Select the performance monitor.
Click the + button or right-click to execute Add Counters from the menu.
Save the counter setting added with File - Save as.
Starting from the saved setting, you can repeatedly use the same counter setting.
The procedure is detailed below. Here, "Mirror Bytes Sent," one item of mirror statistics information, is collected as an example. The target instance is assumed to be "MD,HD Resource1."
From the Start menu, start Administrative Tools - Performance.
The performance monitor window appears on the right-hand side of the window.
If the operation conditions are satisfied, the additional counter/instance is displayed.
Select Cluster Disk Resource Performance, select counter Mirror Bytes Sent and instance MD,HD Resource1, and then click Add.
Note
If Cluster Disk Resource Performance is not displayed, the linkage function is disabled. In this case, execute the following command at the command prompt to enable the linkage function, and then retry the procedure from step 1.
>lodctr.exe <EXPRESSCLUSTER installation path>\perf\clpdiskperf.ini
Save the counter setting added with File - Save as.
Starting from the saved setting, you can repeatedly use the same counter setting.
Collecting mirror statistics information from the performance monitor
The following explains the procedure for collecting the log file of mirror statistics information from the performance monitor.
Procedure for collecting the log file
From the Start menu, start Administrative Tools - Performance Monitor.
Create a new data collector set with Data Collector Sets - User Defined.
From Create Data Log, select Performance Counter and then click Add.
Select Cluster Disk Resource Performance and then add the counter and instance to be collected.
Start log collection.
The procedure is detailed below. Here, "Mirror Bytes Sent," one item of mirror statistics information, is collected as an example. The target instance is assumed to be "MD,HD Resource1."
From the Start menu, start Administrative Tools - Performance Monitor.
From Data Collector Sets - User Defined, select Operation - New, or from New of the right-click option, specify Data Collector Set.
Enter any name as the data collector set name.
As the data collector set creation method, select Create manually (Details) (C).
From Create Data Log, select Performance Counter and then click Add.
Add a counter. Here, after selecting Mirror Bytes Sent from Cluster Disk Resource Performance, select MD,HD Resource1 from Instances of Selected object, and then click Add.
MD,HD Resource1 of Mirror Bytes Sent is added to Added Counter.
After adding all the counters to be collected, click OK and then select Finish.
Note
If Cluster Disk Resource Performance is not displayed, the linkage function is disabled. In this case, execute the following command at the command prompt to enable the linkage function, and then retry the procedure from step 1.
>lodctr.exe <EXPRESSCLUSTER installation path>\perf\clpdiskperf.ini
Start log collection. Execute Start from the menu with Data Collector Sets - User Defined - (Data Collector Set Name).
Collecting mirror statistics information from the typeperf command
The following explains the procedure for collecting the mirror statistics information from the typeperf command.
From the Start menu, start Programs - Accessories - Command Prompt.
Execute typeperf.exe.
The following explains the use example in detail.
[Use example 1] Collecting the mirror communication time (specifying all instances EXPRESSCLUSTER Resource)
Case in which MD resources md01 to md04 and HD resources hd05 to hd08 are already registered. Each resource is set as follows:
The md01 mirror disk number is 1.
The md02 mirror disk number is 2.
:
The hd07 hybrid disk number is 7.
The hd08 hybrid disk number is 8.
Note
The title line is wrapped for readability. In actual output, the title is displayed on a single row.
[Use example 2] Collecting the amount of mirror data transmission (specifying the hd05 resource for the instance)
[Use example 3] Outputting the compression ratio to the log (specifying the hd01 resource for the instance)
Case in which MD resources md01 to md04 and HD resources hd05 to hd08 are already registered. Each resource is set as follows:
The md01 mirror disk number is 1.
The md02 mirror disk number is 2.
:
The hd07 hybrid disk number is 7.
The hd08 hybrid disk number is 8.
CSV is specified as the log file format and C:\PerfData\hd01.csv as the file output destination path.
Use [Ctrl]+[C] to stop the log output after command execution.
[Use example 4] Displaying the counter list (specifying no instance)
Case in which MD resources md01 to md04 and HD resources hd05 to hd08 are already registered. Each resource is set as follows:
The md01 mirror disk number is 1.
The md02 mirror disk number is 2.
:
The hd07 hybrid disk number is 7.
The hd08 hybrid disk number is 8.
In addition, options such as changing the sampling interval and issuing the command to a remote server can be specified.
Use "Typeperf -?" to confirm the details of the options.
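As a rough illustration of how a typeperf invocation for these counters might be assembled, the following Python sketch builds an argument list. It does not reproduce the exact command lines of the use examples. The helper name typeperf_args is hypothetical, the counter path is assumed from the object, instance, and counter names given in this guide, and the options used (-f, -o, -si, -sc) are standard typeperf options that should be confirmed with "Typeperf -?" on the target server.

```python
# Hypothetical helper that assembles a typeperf argument list.
# The options used (-f, -o, -si, -sc) are standard typeperf options;
# confirm them with "typeperf -?" on the target server.


def typeperf_args(counter, output_csv=None, interval_sec=None, samples=None):
    """Return the argument list for one typeperf invocation."""
    args = ["typeperf", counter]
    if interval_sec is not None:
        args += ["-si", str(interval_sec)]  # sampling interval in seconds
    if samples is not None:
        args += ["-sc", str(samples)]  # number of samples to collect
    if output_csv is not None:
        args += ["-f", "CSV", "-o", output_csv]  # log format and output file
    return args


if __name__ == "__main__":
    # Counter path assumed from the names described in this guide.
    path = r"\Cluster Disk Resource Performance(MD,HD Resource5)\Mirror Bytes Sent"
    print(typeperf_args(path, interval_sec=5, samples=12))
    # On a Windows server the list could be passed to subprocess.run(...).
```

Building a list rather than a single string avoids quoting problems, because the instance name contains both a comma and a space.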
2.3.3. Operation of the mirror statistics information collection function
Mirror statistics information log output (automatic) during operation
The mirror statistics information collection function continuously collects statistics information in the environment in which the operation condition is satisfied and then outputs it to the statistic log file. Mirror statistics information collection and log output are performed automatically. Statistic log output is detailed below.
Item | Operation | Description |
---|---|---|
Output file name | nmp_<n>.cur, nmp_<n>.pre<x>, nmp_total.cur, nmp_total.pre<x> | <n> indicates the mirror disk No. or hybrid disk No. cur is the newest, followed by pre, pre1, pre2, ..., in the newest to oldest order. The larger the number, the older. When the prescribed number of log files is exceeded, existing logs are deleted, starting with the oldest. total indicates the total data of all mirror disk resources/hybrid disk resources. |
Output file format | Text file | Data is output to the file in the comma-separated (CSV) text format. One-line data is output for each information collection. |
Output destination folder | EXPRESSCLUSTER installation folder\perf\disk | Data is output within the work folder immediately under the EXPRESSCLUSTER installation folder. |
Resource to be output | For each resource + total | Log is output to one file for each mirror disk resource or hybrid disk resource that was set. If no resource is set, no log file is created. If one or more log files are created, the Total log file indicating the total value of all the resources is also created. |
Output timing | Per minute | Information is output every minute. No log output occurs if the mirror statistics information output function is disabled. If the mirror statistics information log output operation is disabled, no log output occurs even though the mirror statistics information collection function is operating. |
Output file size | About 16 MB | The maximum size of one file is about 16 MB. If the upper size limit is exceeded, the log file is automatically rotated and the previous log file is saved. Even if the upper size limit is not exceeded, the log file may be rotated automatically when the output data is changed. |
Number of log rotations | 12 generations | Up to 12 generations of log files are saved through log file rotations. If the upper rotation limit is exceeded, the oldest generation log file is automatically deleted. |
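When reviewing rotated statistic logs, it can help to order the files from newest to oldest. The following Python sketch does this by file name only, assuming the naming rule described above (cur, then pre, pre1, pre2, ...). The function name generation_rank and the mapping of suffixes to ranks are illustrative assumptions, not an EXPRESSCLUSTER interface.

```python
import re

# Assumed mapping from the naming rule described above:
# cur -> 0 (newest), pre -> 1, pre1 -> 2, pre2 -> 3, ...
_SUFFIX = re.compile(r"\.(cur|pre(\d*))$")


def generation_rank(file_name):
    """Return a sort key where a smaller value means a newer log file."""
    match = _SUFFIX.search(file_name)
    if match is None:
        raise ValueError(f"not a statistic log file name: {file_name}")
    if match.group(1) == "cur":
        return 0
    digits = match.group(2)
    return 1 if digits == "" else int(digits) + 1


if __name__ == "__main__":
    names = ["nmp_1.pre2", "nmp_1.cur", "nmp_1.pre", "nmp_1.pre1"]
    for name in sorted(names, key=generation_rank):
        print(name)
```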
2.3.4. Operation conditions of the mirror statistics information collection function
The mirror statistics information collection function runs when the following conditions are satisfied:
The EXPRESSCLUSTER Disk Agent service is active normally.
One or more mirror disk resources or hybrid disk resources are set.
The mirror statistic information collection function is enabled in cluster properties.
Confirm the EXPRESSCLUSTER Disk Agent service status.
From the Start menu, start Server Management - Service.
Confirm that the EXPRESSCLUSTER Disk Agent service status is Start.
Confirm that Startup Type is Auto.
Restart the server if the service status is not Start.
Confirm the mirror setting.
Start Cluster WebUI.
Confirm that the mirror disk resource or hybrid disk resource is set.
Confirm the setting of the mirror statistics information collection function.
Start Cluster WebUI.
Change the mode to Settings. (Specification of the prescribed setting tab/cluster properties?)
For details of Cluster WebUI, see the online manual of Cluster WebUI.
2.3.5. Notes on the mirror statistics information collection function
- To operate the mirror statistics information collection function, free disk space (up to about 8.9 GB) is required to record the statistic log files of the mirror statistics information.
- Up to 32 processes can be started for a single server with both the performance monitor and typeperf commands combined. No mirror statistics information can be collected if more than 32 performance monitors or typeperf commands are executed for a single server.
- Statistics information cannot be acquired more than once within a single process. Examples are a computer that is targeted by more than one performance monitor running on other computers, and a single performance monitor that collects more than one data collector set.
- The extracted mirror statistics information is included in the logs collected by the clplogcc command or Cluster WebUI. Specify type5 to collect the log by the clplogcc command; specify Pattern 5 to collect the log by the Cluster WebUI. For details about log collection, see "Collecting logs (clplogcc command)" in "EXPRESSCLUSTER command reference" in the "Reference Guide" or the online manual.
2.4. System resource statistics information collection function
If the Collect the System Resource Information check box is checked on the Monitor tab of Cluster Properties in the Cluster WebUI config mode, or if system monitor resources or process resource monitor resources are added to the cluster, information on the system resource is collected and saved under install_path/perf/system according to the following file naming rules. The file format is CSV (text). In the following explanations, this file is referred to as the system resource statistics information file.
system.cur
system.pre
 | |
---|---|
cur | Indicates the latest information output destination. |
pre | Indicates the previous, rotated, information output destination. |
The collected information is saved to the system resource statistics information file. The output interval (sampling interval) of statistics information is 60 seconds. When the size of the current log file reaches 16 MB, log rotation occurs and the information is saved to a new log file (two generations of log files can be used). Information saved to the system resource statistics information file can be used as a reference for analyzing the system performance. The collected statistics information contains the following items.
Statistic value name | Unit | Description |
---|---|---|
CPUCount | Quantity | The number of CPUs |
CPUUtilization | % | Utilization of CPU |
MemoryTotalSize | KByte | Total memory size |
MemoryCurrentSize | KByte | Utilization of memory |
SwapTotalSize | KByte | Total swap size |
SwapCurrentSize | KByte | Utilization of swap |
ThreadCurrentSize | Quantity | The number of threads |
FileCurrentSize | Quantity | The number of opened files |
ProcessCurrentCount | Quantity | The number of processes |
AvgDiskReadQueueLength__Total | Quantity | The number of read requests queued in disk |
AvgDiskWriteQueueLength__Total | Quantity | The number of write requests queued in disk |
DiskReadBytesPersec__Total | Byte | The number of bytes transferred from disk by read operation |
DiskWriteBytesPersec__Total | Byte | The number of bytes transferred to disk by write operation |
PercentDiskReadTime__Total | tick | Busy time occurred while disk handles read requests |
PercentDiskWriteTime__Total | tick | Busy time occurred while disk handles write requests |
PercentIdleTime__Total | tick | Disk idle time |
CurrentDiskQueueLength__Total | Quantity | The number of requests remaining in disk when performance data are collected |
The following output is an example of a system resource statistics information file.
system.cur
"Date","CPUCount","CPUUtilization","MemoryTotalSize","MemoryCurrentSize","SwapTotalSize","SwapCurrentSize","ThreadCurrentSize","FileCurrentSize","ProcessCurrentCount","AvgDiskReadQueueLength__Total","AvgDiskWriteQueueLength__Total","DiskReadBytesPersec__Total","DiskWriteBytesPersec__Total","PercentDiskReadTime__Total","PercentDiskWriteTime__Total","PercentIdleTime__Total","CurrentDiskQueueLength__Total"
"2019/11/14 17:18:57.751","2","11","2096744","1241876","393216","0","1042","32672","79","623078737","241067820","95590912","5116928","623078737","241067820","305886514","0"
"2019/11/14 17:19:57.689","2","3","2096744","1234892","393216","0","926","31767","77","14688814","138463292","3898368","7112192","14688814","138463292","530778498","0"
"2019/11/14 17:20:57.782","2","2","2096744","1194400","393216","26012","890","30947","74","8535798","189735393","3802624","34398208","8535798","189735393","523400261","0"
:
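Because the system resource statistics information file is plain CSV with a header row, it can be read with any CSV parser. The following Python sketch computes the average CPUUtilization and the peak MemoryCurrentSize. The function name summarize is hypothetical, and the sample rows are taken from the example above with only the first five columns kept for brevity.

```python
import csv
import io


def summarize(lines):
    """Return (average CPUUtilization, peak MemoryCurrentSize) over all rows."""
    cpu_values = []
    mem_values = []
    for row in csv.DictReader(lines):
        cpu_values.append(float(row["CPUUtilization"]))
        mem_values.append(int(row["MemoryCurrentSize"]))
    if not cpu_values:
        raise ValueError("no data rows")
    return sum(cpu_values) / len(cpu_values), max(mem_values)


# Rows from the example above, truncated to the first five columns.
SAMPLE = '''"Date","CPUCount","CPUUtilization","MemoryTotalSize","MemoryCurrentSize"
"2019/11/14 17:18:57.751","2","11","2096744","1241876"
"2019/11/14 17:19:57.689","2","3","2096744","1234892"
"2019/11/14 17:20:57.782","2","2","2096744","1194400"
'''

if __name__ == "__main__":
    avg_cpu, peak_mem = summarize(io.StringIO(SAMPLE))
    print(f"average CPU utilization: {avg_cpu:.1f} %")
    print(f"peak memory in use: {peak_mem} KByte")
```

For a real file, an open file object for system.cur or system.pre could be passed in place of the io.StringIO sample.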
2.5. Cluster statistics information collection function
If the Cluster Statistical check box is checked on the Extension tab of Cluster Properties in the Cluster WebUI config mode, EXPRESSCLUSTER collects the results of, and the time spent on, processing such as group failover, group resource activation, and monitor resource monitoring. The information is saved to a file in CSV format. In the following explanations, this file is referred to as the cluster statistics information file.
For groups
group.cur
group.pre
cur
Indicates the latest information output destination.
pre
Indicates the previous, rotated, information output destination.
File location
install_path/perf/cluster/group/
For group resources
The information for each type of group resource is output to the same file.
[Group resource type].cur
[Group resource type].pre
cur
Indicates the latest information output destination.
pre
Indicates the previous, rotated, information output destination.
File location
install_path/perf/cluster/group/
For monitor resources
The information for each type of monitor resources is output to the same file.
[Monitor resource type].cur
[Monitor resource type].pre
cur
Indicates the latest information output destination.
pre
Indicates the previous, rotated, information output destination.
File location
install_path/perf/cluster/monitor/
Note
The statistics information is output to the cluster statistics information file at the following times:
- For groups
  - When the group startup processing is completed
  - When the group stop processing is completed
  - When the group move processing is completed
  - When the failover processing is completed
- For group resources
  - When the group resource startup processing is completed
  - When the group resource stop processing is completed
- For monitor resources
  - When the monitor processing is completed
  - When the monitor status change processing is completed
The statistics information to be collected includes the following items:
Statistic value name | Description |
---|---|
Date | Time when the statistics information is output. This is output in the form below (000 indicates millisecond): YYYY/MM/DD HH:MM:SS.000 |
Name | Name of group, group resource or monitor resource. |
Action | Name of the executed processing. The following strings are output. For groups: Start (at start), Stop (at stop), Move (at move/failover). For group resources: Start (at activation), Stop (at deactivation). For monitor resources: Monitor (at monitor execution). |
Result | Name of the results of the executed processing. The following strings are output. When the processing was successful: Success (no errors detected in monitoring or activation/deactivation). When the processing failed: Failure (errors detected in monitoring or activation/deactivation). When a warning occurred: Warning (only for monitoring, in case of warning). When a timeout occurred: Timeout (monitoring timeout). When the processing was cancelled: Cancel (cancelling processing such as cluster shutdown during group startup). |
ReturnCode | Return value of the executed processing. |
StartTime | Start time of the executed processing. This is output in the form below (000 indicates millisecond): YYYY/MM/DD HH:MM:SS.000 |
EndTime | End time of the executed processing. This is output in the form below (000 indicates millisecond): YYYY/MM/DD HH:MM:SS.000 |
ElapsedTime(ms) | Time taken for executing the processing, output in milliseconds. |
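The items above map directly onto CSV columns, so a cluster statistics information file can be analyzed with a standard CSV parser. The following Python sketch totals ElapsedTime(ms) per resource for successful runs of a given action. The function name elapsed_by_name is hypothetical, and the sample rows follow the script.cur format used in this section, including the empty ReturnCode field.

```python
import csv
import io


def elapsed_by_name(lines, action="Start"):
    """Map each Name to its total ElapsedTime(ms) for successful runs of action."""
    totals = {}
    for row in csv.DictReader(lines):
        if row["Action"] == action and row["Result"] == "Success":
            elapsed = int(row["ElapsedTime(ms)"])
            totals[row["Name"]] = totals.get(row["Name"], 0) + elapsed
    return totals


# Sample rows in the script.cur format; ReturnCode is empty in these rows.
SAMPLE = '''"Date","Name","Action","Result","ReturnCode","StartTime","EndTime","ElapsedTime(ms)"
"2018/12/19 09:44:14.845","script01","Start","Success",,"2018/12/19 09:44:09.807","2018/12/19 09:44:14.845","5040"
"2018/12/19 09:44:15.877","script02","Start","Success",,"2018/12/19 09:44:14.847","2018/12/19 09:44:15.877","1030"
"2018/12/19 09:44:16.920","script03","Start","Success",,"2018/12/19 09:44:15.880","2018/12/19 09:44:16.920","1040"
'''

if __name__ == "__main__":
    totals = elapsed_by_name(io.StringIO(SAMPLE))
    slowest = max(totals, key=totals.get)
    print(f"slowest resource: {slowest} ({totals[slowest]} ms)")
    print(f"total activation time: {sum(totals.values())} ms")
```

Filtering on Result keeps failed or cancelled runs from skewing the totals; pass a different action, such as "Stop", to examine deactivation times.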
Here is an example of the statistics information file to be output when a group with the following configuration is started up:
Group
Group name: failoverA
Group resource which belongs to the group (failoverA)
- script resource
  Resource name: script01, script02, script03
group.cur
"Date","Name","Action","Result","ReturnCode","StartTime","EndTime","ElapsedTime(ms)"
"2018/12/19 09:44:16.925","failoverA","Start","Success",,"2018/12/19 09:44:09.785","2018/12/19 09:44:16.925","7140"
:
script.cur
"Date","Name","Action","Result","ReturnCode","StartTime","EndTime","ElapsedTime(ms)"
"2018/12/19 09:44:14.845","script01","Start","Success",,"2018/12/19 09:44:09.807","2018/12/19 09:44:14.845","5040"
"2018/12/19 09:44:15.877","script02","Start","Success",,"2018/12/19 09:44:14.847","2018/12/19 09:44:15.877","1030"
"2018/12/19 09:44:16.920","script03","Start","Success",,"2018/12/19 09:44:15.880","2018/12/19 09:44:16.920","1040"
:
2.5.1. Notes on the size of the cluster statistics information file¶
The size of the cluster statistics information file can be set between 1 and 99 MB. The number of cluster statistics information files to be generated differs depending on the configuration. Some configurations may cause a large number of files to be generated. Therefore, consider setting the size of the cluster statistics information file according to the configuration. The maximum total size of the cluster statistics information files is calculated with the following formula:
The size of the cluster statistics information file =
([Group file size]) x (number of generations (2)) +
([Group resource file size] x [number of types of group resources which are set]) x (number of generations (2)) +
([Monitor resource file size] x [number of types of monitor resources which are set]) x (number of generations (2))
Example: For the following configuration, the total maximum size of the cluster statistics information files to be saved is 232 MB with this calculation.
((1 MB x 2) + ((3 MB x 5) x 2) + ((10 MB x 10) x 2) = 232 MB)
Group (file size: 1 MB)
Number of group resource types: 5 (file size: 3 MB)
Number of monitor resource types: 10 (file size: 10 MB)
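The calculation above can be reproduced with a few lines of code. This is a minimal sketch of the formula only; the function and parameter names are hypothetical, and the number of generations is fixed at 2 as stated in the formula.

```python
GENERATIONS = 2  # number of generations kept per file, as given in the formula


def max_statistics_size_mb(group_mb, group_res_mb, group_res_types,
                           monitor_mb, monitor_types):
    """Return the maximum total size (MB) of the statistics files."""
    group_total = group_mb * GENERATIONS
    group_res_total = group_res_mb * group_res_types * GENERATIONS
    monitor_total = monitor_mb * monitor_types * GENERATIONS
    return group_total + group_res_total + monitor_total


# Configuration from the example: group 1 MB, 5 group resource types
# at 3 MB each, 10 monitor resource types at 10 MB each.
total = max_statistics_size_mb(1, 3, 5, 10, 10)
print(total)  # 232
```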
2.6. Communication ports¶
EXPRESSCLUSTER uses the following port numbers by default. You can change these port numbers by using the Cluster WebUI. Make sure that programs other than EXPRESSCLUSTER do not use these port numbers.
To set up a firewall for the server, make sure that the following port numbers can be accessed.
For an AWS environment, configure the security group settings, in addition to the firewall settings, so that the following port numbers can be accessed.
For port numbers EXPRESSCLUSTER uses, refer to "Getting Started Guide" > "Notes and Restrictions" > "Before installing EXPRESSCLUSTER" > "Communication port number".
2.7. Limit on the band for mirror connect communication¶
You can set a limit on the communication band used for mirror connect communication by using the standard Windows Local Group Policy Editor (Policy-based QoS). A limit is set for each mirror disk connect. This method is useful for setting a limit on the communication band for all mirror disk resources or hybrid disk resources using the specified mirror disk connect.
2.7.1. Procedure for setting a limit on the band for mirror connect communication¶
To set a limit on the band for mirror connect communication, follow the procedure described below.
Setting the properties of a network adapter
Click Start, Control Panel, then Network and Sharing Center. Then, open Properties for a mirror disk connect.
If QoS Packet Scheduler is listed in Properties, select its check box.
If QoS Packet Scheduler is not listed in Properties, click Install, Services, and then Add to select it.
Starting the Local Group Policy Editor
To set a limit on the band, use the Local Group Policy Editor. From the Start menu, click Run, and then execute the following command:
gpedit.msc
Creating a policy
Create a policy for a limit on the band. In the left pane, click Local Computer Policy, Computer Configuration, then Windows Settings, and then right-click Policy-based QoS and select Create New Policy.
Policy-based QoS - Create a QoS policy window
Set items as follows.
Policy name
Enter a policy name for identification.
Specify DSCP value
Set the IP priority. This setting is optional. For details, see Learn more about QoS Policies.
Specify Outbound Throttle Rate
Check the Specify Outbound Throttle Rate check box. Specify an upper limit on the communication band used for the mirror disk connect in units of KBps (kilobytes per second) or MBps (megabytes per second).
After setting the required items, click the Next button.
Policy-based QoS - This QoS policy applies to: window
Set this item as follows.
This QoS policy applies to: (application specification)
Select All applications.
After setting the required items, click the Next button.
Policy-based QoS - Specify the source and destination IP addresses. window
Set these items as follows.
This QoS policy applies to: (source IP address specification)
Select Only for the following source IP address or prefix and then enter the source IP address used for the mirror disk connect.
This QoS policy applies to: (destination IP address specification)
Select Only for the following destination IP address or prefix and then enter the destination IP address used for the mirror disk connect.
After setting the required items, click the Next button.
Policy-based QoS - Specify the protocol and port numbers. window
Set these items as follows.
Select the protocol this QoS policy applies to (S)
Select TCP.
Specify the source port number:
Select From any source port.
Specify the destination port number:
Select To this destination port number or range and then specify the mirror driver port number (default: 29005).
Reflecting the policy
Click the Finish button to apply the settings. The set policy is not immediately reflected, but according to the automatic policy update interval (default: within 90 minutes). To reflect the set policy immediately, update the policy manually. From the Start menu, click Run, and then execute the following command:
gpupdate /force
This completes the setting of a policy.
2.8. Procedure for suspending or releasing the limit on the band for mirror connect communication¶
To suspend or release the limit on the band for mirror connect communication, follow the procedure described below.
Starting the Local Group Policy Editor
To suspend or release the limit on a band, use the Local Group Policy Editor. From the Start menu, click Run, and then execute the following command:
gpedit.msc
Suspending a policy by changing its setting or deleting the policy
- To suspend a limit on the band
To suspend a limit on the band, change the setting for the policy for the limit on the band. Right-click the target QoS policy and then choose Edit Existing Policy. Then, uncheck the Specify Outbound Throttle Rate check box. After making this setting, click the OK button.
- To release a limit on the band
To release a limit on the band, delete the policy for the limit on the band. Right-click the target QoS policy and then choose Delete Policy. The pop-up message "Are you sure you want to delete the policy?" appears. Click Yes.
Reflecting the policy
The modification or deletion of a policy is not immediately reflected, but according to the automatic policy update interval (default: within 90 minutes). To reflect the deletion or modification immediately, update the policy manually. From the Start menu, click Run, and then execute the following command:
gpupdate /force
This completes the setting of a policy.
2.8.1. What causes EXPRESSCLUSTER to shut down servers¶
When any one of the following errors occurs, EXPRESSCLUSTER shuts down or resets servers to protect resources.
2.8.2. Final action for an error in group resource activation or deactivation¶
When one of the following is specified as the final action to be taken for errors in resource activation/deactivation:
Final action | Result |
---|---|
The cluster service stops and the OS shuts down. | Causes normal shutdown after the group resources stop. |
The cluster service stops and the OS reboots. | Causes normal reboot after the group resources stop. |
An intentional stop error is generated | Causes a stop error (Panic) intentionally upon group resource activation/deactivation error. |
2.8.3. Action for a stall of resource activation or deactivation¶
When one of the following is specified as the action to be taken for a stall of resource activation or deactivation, and resource activation or deactivation takes longer than expected:
Action for a stall | Result |
---|---|
Emergency shutdown | Causes the OS to shut down upon the stall of group resource activation or deactivation. |
Intended generation of a stop error | Causes a stop error (Panic) upon the stall of group resource activation or deactivation. |
The OS shuts down if the resource activation or deactivation takes an unexpectedly long time. The OS shuts down, regardless of the setting of recovery in the event of a resource activation or deactivation error.
If a resource activation stall occurs, the following message is output to the event log and as an alert message.
Module type: rc
Event ID: 1032
Message: Failed to start the resource %1. (99 : command is timeout)
Description: Resource start failure
If a resource deactivation stall occurs, the following message is output to the event log and as an alert message.
Module type: rc
Event ID: 1042
Message: Failed to stop the resource %1. (99 : command is timeout)
Description: Resource stop failure
2.8.4. Final action at detection of an error in monitor resource¶
When the final action for errors in monitor resource monitoring is specified as one of the following:
Final action | Result |
---|---|
Stop cluster service and shut down the OS | Causes normal shutdown after the group resources stop. |
Stop cluster service and reboot the OS | Causes normal reboot after the group resources stop. |
An intentional stop error is generated | Causes a stop error (Panic) intentionally upon monitor resource error detection. |
2.8.5. Forced stop action¶
When the setting is configured as Use Forced Stop:
Physical machine
Forced stop action | Result |
---|---|
BMC reset | Causes reset in the failing server where the failover group existed. |
BMC power off | Causes power off in the failing server where the failover group existed. |
BMC power cycle | Causes power cycle in the failing server where the failover group existed. |
BMC NMI | Causes NMI in the failing server where the failover group existed. |
vSphere virtual machine (guest OS)
Forced stop action | Result |
---|---|
VMware vSphere CLI Power off | Causes power off in the failing server where the failover group existed. |
2.8.6. Emergency server shutdown¶
If any of the following processes terminates abnormally, clustering cannot work properly, so EXPRESSCLUSTER shuts down the server on which the process terminated. This action is called emergency server shutdown.
clpnm.exe
clprc.exe
The server shutdown method can be configured in Action When the Cluster Service Process is Abnormal of Cluster Properties from the config mode of Cluster WebUI. The following methods can be set.
2.8.7. Resource deactivation error in stopping the EXPRESSCLUSTER Server service¶
If resource deactivation fails when clpcl -t is run to stop the EXPRESSCLUSTER Server service, EXPRESSCLUSTER shuts down the server.
2.8.8. Recovery from network partitioning¶
If all heartbeats are disrupted, network partitioning resolution takes place, which results in one or all of the servers shutting down. Unless the automatic recovery mode is set in Cluster Properties, the server is in the Suspension (Isolated) status and is not clustered after reboot.
When you resolve the problem that caused the disruption of heartbeats, recover the cluster.
For details on network partitioning, see "Details on network partition resolution resources" in the "Reference Guide".
For information on the suspended status (restart following a shutdown) and cluster recovery, see the online manual "Functions of the WebManager" in this guide.
2.8.9. Emergency server restart¶
When an abnormal termination is detected in the following processes, EXPRESSCLUSTER reboots the OS. This action is called Emergency server restart.
EXPRESSCLUSTER Disk Agent (clpdiskagent.exe)
EXPRESSCLUSTER Server (clppmsvc.exe)
EXPRESSCLUSTER Transaction (clptrnsv.exe)
2.8.10. Failure in suspending or resuming the cluster¶
If suspending or resuming the cluster fails, the server is shut down.
2.9. Configuring the settings to temporarily prevent execution of failover¶
Follow the steps below to temporarily prevent failover caused by a failed server from occurring.
Temporarily adjusting the time-out
By temporarily adjusting the time-out, you can prevent a failover caused by a failed server from occurring.
The clptoratio command is used to temporarily adjust the time-out. Run the clptoratio command on one of the servers in the cluster.
(Example) To (temporarily) extend the heartbeat time-out to 3600 seconds (one hour) from the current time when the heartbeat time-out is set to 90 seconds:
clptoratio -r 40 -t 1h
Releasing temporary time-out adjustment
Releases the temporary adjustment of time-out. Execute the clptoratio command for any server in the cluster.
clptoratio -i
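The ratio of 40 in the example above follows from dividing the desired time-out by the configured one (3600 / 90). The sketch below shows that arithmetic; the function name is hypothetical, and rounding up to a whole number is an assumption based on the integer ratio used in the example.

```python
import math


def timeout_ratio(current_timeout_sec, desired_timeout_sec):
    """Return the smallest whole-number ratio that reaches the desired time-out."""
    return math.ceil(desired_timeout_sec / current_timeout_sec)


# Heartbeat time-out of 90 seconds, to be extended to 3600 seconds.
ratio = timeout_ratio(90, 3600)
command = f"clptoratio -r {ratio} -t 1h"
print(command)  # clptoratio -r 40 -t 1h
```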
Follow the steps below to temporarily prevent failover caused by a monitor error by temporarily stopping monitor resource monitoring.
Suspending monitoring operation of monitor resources
By suspending monitoring operations, a failover caused by monitoring can be prevented.
The clpmonctrl command is used to suspend monitoring. Run the clpmonctrl command on all servers in the cluster. Another way is to use the -h option on a server in the cluster and run the clpmonctrl command for all the servers.
(Example) To suspend all monitoring operations on the server in which the command is run:
clpmonctrl -s
(Example) To suspend all monitoring operations on the server with -h option specified
clpmonctrl -s -h <server name>
Restarting monitoring operation of monitor resources
Resumes monitoring. Execute the clpmonctrl command for all servers in the cluster.
Another way is to use the -h option on a server in the cluster and run the clpmonctrl command for all the servers.
(Example) To resume all monitoring operations on the server in which the command is run:
clpmonctrl -r
(Example) To resume all monitoring operations on the server with -h option specified
clpmonctrl -r -h <server name>
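When the -h option is used from a single server, one command line is needed per server in the cluster. The following sketch only builds those command lines and does not run them; the server names and the function name are hypothetical.

```python
def monitor_control_commands(servers, resume=False):
    """Build one clpmonctrl command line per server (-s suspends, -r resumes)."""
    flag = "-r" if resume else "-s"
    return [f"clpmonctrl {flag} -h {server}" for server in servers]


servers = ["server1", "server2"]  # hypothetical server names

for line in monitor_control_commands(servers):
    print(line)
for line in monitor_control_commands(servers, resume=True):
    print(line)
```

Generating the suspend and resume lists from the same server list helps ensure that every server on which monitoring was suspended is also resumed.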
Follow the steps below to temporarily prevent failover caused by a monitor error by disabling recovery action for a monitor resource error.
Disabling recovery action for a monitor resource error
When you disable recovery action for a monitor resource error, recovery action is not performed even if a monitor resource detects an error. To set this feature, check the Recovery action when a monitor resource error is detected checkbox in Disable cluster operation under the Extension tab of Cluster properties in the config mode of Cluster WebUI and update the setting.
Not disabling recovery action for a monitor resource error
Enable recovery action for a monitor resource error. Uncheck the Recovery action when a monitor resource error is detected checkbox in Disable cluster operation under the Extension tab of Cluster properties in the config mode of Cluster WebUI and update the setting.
Follow the steps below to temporarily prevent failover caused by an activation error by disabling recovery action for a group resource activation error.
Disabling recovery action for a group resource activation error
When you disable recovery action for a group resource activation error, recovery action is not performed even if a group resource detects an activation error. To set this feature, check the Recovery operation when a group resource activation error is detected checkbox in Disable cluster operation under the Extension tab of Cluster properties in config mode of Cluster WebUI and update the setting.
Not disabling recovery action for a group resource activation error
Enable recovery action for a group resource activation error. Uncheck the Recovery operation when a group resource activation error is detected checkbox in Disable cluster operation under the Extension tab of Cluster properties in config mode of Cluster WebUI and update the setting.
When an application or service has been started using the armload command with /M or /R specified, that process is monitored. To temporarily prevent failover caused by a monitoring error, follow the steps below.
Suspending monitoring for an application/service
By using the armloadc command, it is possible to prevent restart or failover caused by a monitoring error for an application/service started by the armload command.
Execute the armloadc command on the server on which the application/service is running.
armloadc watchID /W pause
Restarting monitoring for the application/service
Resume monitoring. Execute the armloadc command on the server on which monitoring for the application/service has been suspended.
armloadc watchID /W continue
For details on the armload and armloadc commands, see "Compatible command reference" in "Legacy Feature Guide".
2.10. How to execute chkdsk/defrag¶
2.10.2. How to execute chkdsk/defrag on a mirror/hybrid disk¶
When executing chkdsk or defrag on a partition configured as a mirror disk resource, the procedure differs depending on whether the server is an active server or a standby server.
How to execute chkdsk/defrag on an active server (mirror/hybrid disk)
Refer to "How to execute chkdsk/defrag on a shared disk"
How to execute chkdsk/defrag on a standby server (mirror disk)
If you perform chkdsk or defragmentation in restoration mode on the standby server, the subsequent mirror copy overwrites the partitions configured as mirror disks with the disk image of the active server, so the file system is not restored or optimized. This section therefore describes the procedure for running chkdsk to check for media errors.
Suspend the mdw monitor resources temporarily by using the Cluster WebUI or the clpmonctrl command.
(Example)
clpmonctrl -s -m <mdw monitor name>
Isolate the target mirror disk resource.
(Example)
clpmdctrl --break <md resource name>
Enable access to the mirror disk.
(Example)
mdopen <md resource name>
Execute chkdsk or defrag on the target partition from the command prompt.
Important
If the message "chkdsk cannot run because the volume is being used by another process. Would you like to schedule this volume to be checked the next time the system restarts? (Y/N)" appears, select "N".
Disable access to the mirror disk.
(Example)
mdclose <md resource name>
Resume the mdw monitor resources by using the Cluster WebUI or the clpmonctrl command.
(Example)
clpmonctrl -r -m <mdw monitor name>
If automatic mirror recovery is disabled, perform mirror recovery manually from Mirror Disks.
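The order of the commands in the mirror disk procedure above matters: access is opened only after monitoring is suspended and the mirror is isolated, and it is closed again before monitoring resumes. As a rough sketch, the sequence can be written out as an ordered list. The function name, the resource names, and the drive letter are hypothetical, and the commands are only printed, not executed.

```python
def standby_chkdsk_plan(mdw_monitor, md_resource, drive_letter):
    """Return the command sequence for chkdsk on a standby mirror disk, in order."""
    return [
        f"clpmonctrl -s -m {mdw_monitor}",   # suspend the mdw monitor resource
        f"clpmdctrl --break {md_resource}",  # isolate the mirror disk resource
        f"mdopen {md_resource}",             # enable access to the mirror disk
        f"chkdsk {drive_letter}:",           # check the target partition
        f"mdclose {md_resource}",            # disable access to the mirror disk
        f"clpmonctrl -r -m {mdw_monitor}",   # resume the mdw monitor resource
    ]


# Hypothetical names: monitor "mdw1", resource "md1", data partition E:.
plan = standby_chkdsk_plan("mdw1", "md1", "E")
for step in plan:
    print(step)
```

If automatic mirror recovery is disabled, mirror recovery from Mirror Disks still has to be performed manually after the last command.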
How to execute chkdsk/defrag on a standby server (hybrid disk)
If you perform chkdsk or defragmentation in restoration mode on the standby server, the subsequent mirror copy overwrites the partitions configured as hybrid disks with the disk image of the active server, so the file system is not restored or optimized. This section therefore describes the procedure for running chkdsk to check for media errors.
Suspend the hdw monitor resources temporarily by using the Cluster WebUI or the clpmonctrl command.
(Example)
clpmonctrl -s -m <hdw monitor name>
Isolate and enable access to the target hybrid disk resource.
(Example)
clphdsnapshot --open <hd resource name>
Execute chkdsk or defrag on the target partition from the command prompt.
Important
If the message "chkdsk cannot run because the volume is being used by another process. Would you like to schedule this volume to be checked the next time the system restarts? (Y/N)" appears, select "N".
Disable access to the hybrid disk.
(Example)
clphdsnapshot --close
Resume the hdw monitor resources by using the Cluster WebUI or the clpmonctrl command.
(Example)
clpmonctrl -r -m <hdw monitor name>
If automatic mirror recovery is disabled, perform mirror recovery manually from Mirror Disks.
2.11. How to replace a server with a new one¶
When you replace a server in a cluster environment, follow the instructions below:
Set up a new server in the same way as the failed server.
When using a shared disk, do not connect the new server to the shared disk yet.
Set the same computer name and IP address as the failed server.
Register the EXPRESSCLUSTER license and apply updates as they have been registered and applied before.
If there were cluster partition and/or data partition of a mirror disk or hybrid disk on the local disk of the failing server, allocate these partitions and assign drive letters for them as they were configured in the failing server. When you use the disk of the failing server, configure drive letters for the partitions, though allocating partitions is not necessary.
When using a shared disk, set the SCSI controller or the HBA that is connected to the shared disk to be filtered in Filter Settings of Shared Disk upon installing the EXPRESSCLUSTER Server.
After the setup, shut it down and power it off.
Important
In Filter Settings of Shared Disk, set the SCSI controller or the HBA that is connected to the shared disk to be filtered. If the new server is connected to the shared disk when it has not been set to be filtered, data on the shared disk may be corrupted.
If the failed server is still running, shut it down and remove it from the shared disk and the LAN, and make sure other servers in the cluster are working properly. (Ignore errors caused by the failed server being stopped.)
Start the new server while it is connected to the LAN. When using a shared disk, start the server while it is also connected to the shared disk.
When using the shared disk, on the new server, use Disk Management (On the Start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Computer Management, and click Disk Management.) to confirm that the shared disk is visible, and set the same drive letter as the failed server.
At this point, access to the shared disk is restricted, so the data on the disk cannot be read.
Connect to a server in normal operation in the cluster by using the Web browser to start the config mode of Cluster WebUI. When using a shared disk, click Properties, HBA tab and Connect on the new server to check or modify the information on HBA and partitions.
Important
On the HBA tab of Properties of the new server, set the SCSI controller or the HBA that is connected to the shared disk to be filtered. If the shared disk is connected when it has not been set to be filtered, data on the shared disk may be corrupted.
When there is any mirror disk resource or hybrid disk resource in the resources used in the new server, stop the failover group containing these resources from the operation mode of Cluster WebUI.
Run "clpcl --suspend --force" from the command prompt on the server in normal operation in the cluster and suspend the cluster.
Because one server is recognized as having stopped, the cluster cannot be suspended from the Cluster WebUI.
Select Apply the settings from the File menu in the Builder to apply the cluster configuration data to the cluster.
When the message "There is difference between the disk information in the configuration information and the disk information in the server. Are you sure you want automatic modification?" appears, select Yes.
If you use a fixed term license, run the following command:
clplcnsc --reregister <a folder path for saved license files>
Resume the cluster from the operation mode of Cluster WebUI. If you stopped any group in step 6, start it.
Note
If you resume the cluster from the Cluster WebUI, the error message "Failed to resume the cluster. Click the Refresh data button, or try again later." is displayed, but ignore it. This is displayed because the new server has not been suspended.
Click Start Server Service for the new server in operation mode of Cluster WebUI.
Restart a manager from operation mode of Cluster WebUI.
When Off is selected for Auto Return in Extension tab of Cluster Properties, click Recover Server of the server where EXPRESSCLUSTER has been reinstalled in the operation mode of Cluster WebUI.
When a mirror disk resource or hybrid disk resource exists in the resources used in the new server and the Auto mirror recovery check box is not selected in Mirror Disk tab of Properties of the cluster, copy the mirror disk or hybrid disk fully from Mirror Disks.
Important
If the server that operates in another mirror disk type cluster is replaced with a new server, differential copy is executed automatically. After differential copy is completed, perform full copy manually. If you do not perform full copy, a mirror disk data inconsistency will occur.
Move groups as necessary. When a mirror disk or hybrid disk is being fully copied, wait for the copy to complete before moving.
2.12. Wait time for synchronized cluster startup¶
Even if all servers in a cluster are powered on simultaneously, EXPRESSCLUSTER does not necessarily start up simultaneously on all servers. The same applies when the cluster is rebooted following a shutdown. Because of this, each server waits for the other servers in the cluster to start.
By default, the startup synchronization time is set to 5 minutes. To change the default value, click Cluster Properties in the Cluster WebUI, click the Timeout tab, and select Synchronize Wait Time.
For more information, see "Timeout tab" in "Parameter details" in the "Reference Guide".
2.13. Changing the server configuration (add/delete)¶
2.13.1. Adding a server (mirror disk or hybrid disk is not used)¶
To add a server, follow the steps below:
Important
When adding a server in changing the cluster configuration, do not make any other changes such as adding a group resource.
- The additional server license must be registered.
To register licenses, refer to "Installation and Configuration Guide" > "Registering the license".
Make sure that the cluster is working properly.
Start the server to add. When using the shared disk, make sure that the server to add is not connected to the shared disk before starting it.
Important
To use the shared disk, do not connect the server to the shared disk before setting it up and powering it off. Data on the shared disk may be corrupted.
Configure the settings that should be done before setting up the EXPRESSCLUSTER Server on the server to add. However, to use the shared disk, do not configure the settings for the disk in this step.
See also
As for the settings to be configured before the setup, see "Settings after configuring hardware" of "Determining a system configuration" in "Installation and Configuration Guide".
Set up the EXPRESSCLUSTER Server to the server to add. Enter the port numbers of the Cluster WebUI and the disk agent. Configure the same settings for the port number as the server that has been already set up. To use the shared disk, set the HBA that is connected to the shared disk to be filtered. Register the license as necessary. After the setup, shut down the server to add and power it off.
Important
If the shared disk is not set to be filtered in Filter Settings of Shared Disk when setting up the EXPRESSCLUSTER Server, do not connect to the shared disk even after the setup completes. Data on the shared disk may be corrupted. Reinstall EXPRESSCLUSTER and set the shared disk to be filtered.
Start the server to add. To use the shared disk, first connect the disk to the server to add, and then start the server.
To use the shared disk, configure the settings for the disk on the server to add.
Use Disk Management (On the start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Computer Management, and click Disk Management.) to confirm that the shared disk is visible.
Set the switchable partitions for disk resources and the partitions used as the cluster partition or data partition for hybrid disk resources so that they can be accessed from all the servers by using the same drive letters.
On all the servers, set the same drive letter to the disk heartbeat partitions to be used for the disk network partition resolution resources.
At this point, access to the shared disk is restricted, so the data on the disk cannot be read.
Note
Changing or deleting the drive letter assigned to a partition of a shared disk may fail. To avoid this, specify the drive letter according to the procedure below:
Run the following command by using the command prompt to delete the drive letter.
mountvol <drive_letter(_to_be_changed)>: /P
Confirm that the drive letter has been deleted from the target drive by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
Assign a new drive letter to the drive by using Disk Management.
Access another server in the cluster with a Web browser and click Add server in the config mode of Cluster WebUI.
By using the config mode of Cluster WebUI, configure the following settings of the server to add.
Information on the HBA and the partition on the HBA tab of Properties of the server to add (when using the shared disk).
Information on the disk heartbeat partition on the NP Resolution tab of Cluster Properties (when using the shared disk).
Information on the Source IP Address of the server to add on the Details tab of Properties of the virtual IP resource (when using the virtual IP resource).
IP Address of the server to add on the Monitor(special) tab of Properties of the NIC Link Up/Down monitor resource (when using the NIC Link Up/Down monitor resource).
Information on the ENI ID of the server to add on the Details tab of Properties of the AWS elastic IP resources (when using an AWS Elastic IP resource).
Information on the ENI ID of the server to add on the Details tab of Properties of the AWS virtual IP resources (when using an AWS virtual IP resource).
Information on the IP Address of the server to add on the Details tab of Properties of the Azure DNS resources (when using an Azure DNS resource).
Important
On the HBA tab of Properties of the server to add, set the SCSI controller and the HBA connected to the shared disk to be filtered. If the shared disk is connected when it has not been set to be filtered, data on the shared disk may be corrupted.
Click Properties of the failover group in the Cluster WebUI config mode. Add the server in the Startup Server tab as a startable server. (Note: For each failover group, add only the servers that are required for failover.)
Click Apply the Configuration File in the Cluster WebUI config mode to update the cluster configuration.
Note: Apply the configuration when the confirmation message is displayed.
Perform Start server service of the added server in the Cluster WebUI operation mode.
Click Refresh data in the Cluster WebUI operation mode and confirm the displayed cluster information is in normal status.
If the server recovery is required, recover the server manually in the Cluster WebUI operation mode.
2.13.2. Adding a server (Mirror disk or hybrid disk is used)¶
To add a server, follow the steps below:
Important
When adding a server in changing the cluster configuration, do not make any other changes such as adding a group resource.
- The additional server license must be registered.
To register licenses, refer to "Installation and Configuration Guide" > "Registering the license".
Make sure that the cluster is working properly.
Start the server to add. When using the shared disk, make sure that the server to add is not connected to the shared disk before starting it.
Important
To use the shared disk, do not connect the server to the shared disk before setting it up and powering it off. Data on the shared disk may be corrupted.
Configure the settings that should be done before setting up the EXPRESSCLUSTER Server on the server to add. However, to use the shared disk, do not configure the settings for the disk in this step.
See also
As for the settings to be configured before the setup, see "Settings after configuring hardware" of "Determining a system configuration" in "Installation and Configuration Guide".
Set up the EXPRESSCLUSTER Server to the server to add. Enter the port numbers of the Cluster WebUI and the disk agent. Configure the same settings for the port number as the server that has been already set up. To use the shared disk, set the HBA that is connected to the shared disk to be filtered. Register the license as necessary. After the setup, shut down the server to add and power it off.
Important
If the shared disk is not set to be filtered in Filter Settings of Shared Disk when setting up the EXPRESSCLUSTER Server, do not connect to the shared disk even after the setup completes. Data on the shared disk may be corrupted. Reinstall EXPRESSCLUSTER and set the shared disk to be filtered.
Start the server to add. To use the shared disk, connect the disk to the server to add at first, and then start the server.
To use the shared disk, configure the settings for the disk on the server to add.
Use Disk Management (On the start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Computer Management, and click Disk Management.) to confirm that the shared disk is visible.
Set the switchable partitions for disk resources and the partitions used as the cluster partition or data partition for hybrid disk resources so that they can be accessed from all the servers by using the same drive letters.
On all the servers, set the same drive letter to the disk heartbeat partitions to be used for the disk network partition resolution resources.
At this point, access to the shared disk is restricted, so the data on the disk cannot be accessed.
Note
Changing or deleting the drive letter assigned to a partition of a shared disk may fail. To avoid this, specify the drive letter according to the procedure below:
Run the following command by using the command prompt to delete the drive letter.
mountvol <drive_letter(_to_be_changed)>: /P
Confirm that the drive letter has been deleted from the target drive by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
Assign a new drive letter to the drive by using Disk Management.
Access another server in the cluster via a web browser and click Add server in the config mode of Cluster WebUI.
By using the config mode of Cluster WebUI, configure the following settings of the server to add.
Information on the HBA and the partition on the HBA tab of Properties of the server to add (when using the shared disk).
Information on the disk heartbeat partition on the NP Resolution tab of Cluster Properties (when using the shared disk).
Information on the Source IP Address of the server to add on the Details tab of Properties of the virtual IP resource (when using the virtual IP resource).
IP Address of the server to add on the Monitor(special) tab of Properties of the NIC Link Up/Down monitor resource (when using the NIC Link Up/Down monitor resource).
Information on the ENI ID of the server to add on the Details tab of Properties of the AWS elastic IP resources (when using an AWS Elastic IP resource).
Information on the ENI ID of the server to add on the Details tab of Properties of the AWS virtual IP resources (when using an AWS virtual IP resource).
Information on the IP Address of the server to add on the Details tab of Properties of the Azure DNS resources (when using an Azure DNS resource).
Important
On the HBA tab of Properties of the server to add, set the SCSI controller and the HBA connected to the shared disk to be filtered. If the shared disk is connected when it has not been set to be filtered, data on the shared disk may be corrupted.
When using a hybrid disk resource on the added server, click Properties of Servers in the config mode of Cluster WebUI. From the Server Group tab, add the server to the servers that can run the Group. Do this for required servers only.
Click Properties of the failover group in the config mode of Cluster WebUI. Add the server to the servers that can be started on the Startup Server tab. Add the server that can be started only to the required failover group.
Click Apply the Configuration File in the Cluster WebUI config mode to update the cluster configuration. OS reboot might be required (proceed accordingly).
If the server recovery is required, recover the server manually in the Cluster WebUI operation mode.
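The drive-letter note in the procedure above relies on a single command, `mountvol <drive_letter>: /P`. As a minimal sketch, the Python helper below only validates the letter and builds that command line so it can be reviewed before it is typed into the command prompt. The function name `mountvol_remove_command` is hypothetical, and nothing is executed.

```python
import string
from typing import List


def mountvol_remove_command(drive_letter: str) -> List[str]:
    """Build the `mountvol <letter>: /P` command used to delete a drive letter."""
    letter = drive_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return ["mountvol", f"{letter}:", "/P"]


if __name__ == "__main__":
    # Print the command for review; run it manually from the command prompt.
    print(" ".join(mountvol_remove_command("x")))
```

After running the command, confirm the deletion and assign the new drive letter in Disk Management, as the note describes.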
2.13.3. Deleting a server (Mirror disk or hybrid disk is not used)¶
To delete a server, follow the steps below:
Important
When deleting a server as part of a cluster configuration change, do not make any other changes, such as adding a group resource, at the same time.
Refer to the following information for licenses registered in the server you want to delete.
No action required for CPU licenses.
- VM node licenses and other node licenses are discarded when EXPRESSCLUSTER is uninstalled. Back up the serial numbers and keys of licenses if required.
No action required for fixed term licenses. Unused licenses are automatically collected and provided to other servers.
Make sure that the cluster is working normally. If any group is active on the server you are going to delete, move the group to another server.
When the server to be deleted is registered in a server group, click Properties of Server of the config mode of Cluster WebUI. Delete the server from Servers that can run the Group in the Server Group tab.
Click Remove Server of the server to delete in the config mode of Cluster WebUI.
Click Apply the Configuration File in the Cluster WebUI config mode to update the cluster configuration.
Proceed accordingly when the confirmation message is displayed.
Click Refresh data in the operation mode of Cluster WebUI to verify the cluster is properly working.
Deleted servers will not belong to clusters. To uninstall EXPRESSCLUSTER servers on the servers you want to delete, refer to "Installation and Configuration Guide" > "Uninstalling and reinstalling EXPRESSCLUSTER" > "Uninstallation".
2.13.4. Deleting a server (Mirror disk or hybrid disk is used)¶
To delete a server, follow the steps below:
Important
When deleting a server as part of a cluster configuration change, do not make any other changes, such as adding a group resource, at the same time.
Refer to the following information for licenses registered in the server you want to delete.
No action required for CPU licenses.
VM node licenses and other node licenses are discarded when EXPRESSCLUSTER is uninstalled. Back up the serial numbers and keys of licenses if required.
No action required for fixed term licenses. Unused licenses are automatically collected and provided to other servers.
Stop the groups that use mirror disk resources or hybrid disk resources in the Cluster WebUI operation mode.
Make sure that the cluster is working properly. (However, ignore errors in the server to be deleted.)
Access another server in the cluster via a web browser and start the Cluster WebUI.
When the server to be deleted is registered in a server group, click Properties of Server of the config mode of Cluster WebUI. Delete the server from Servers that can run the Group in the Server Group tab.
Click Remove Server of the server to delete in the config mode of Cluster WebUI.
Click Remove resource of mirror disk resource or hybrid disk resource in the Cluster WebUI config mode.
Click Apply the Configuration File in the Cluster WebUI config mode to update the cluster configuration. OS reboot might be required (proceed accordingly).
Click Refresh data in the operation mode of Cluster WebUI to verify the cluster is properly working.
Deleted servers will not belong to clusters. To uninstall EXPRESSCLUSTER servers on the servers you want to delete, refer to "Installation and Configuration Guide" > "Uninstalling and reinstalling EXPRESSCLUSTER" > "Uninstallation".
2.14. Changing the server IP address¶
To change the server IP address after you have started the cluster system operation, follow the instructions below.
2.14.1. When changing the mirror connect IP address is not required¶
Make sure that the cluster is working properly.
Suspend the cluster by using the operation mode of Cluster WebUI.
Change the OS network configuration in the Properties of My Network Places.
Change the IP address on the Interconnect tab of the Cluster Properties by using the config mode of Cluster WebUI.
If the changed IP address is used for the NIC Link Up/Down monitor resource, change the IP address on the Monitor(special) tab of the monitor resource properties.
Click Apply the Configuration File from the config mode of Cluster WebUI and apply the cluster configuration data to the cluster.
Resume the cluster by using the operation mode of Cluster WebUI.
2.14.2. When changing the mirror connect IP address is required¶
Make sure that the cluster is working properly.
Stop the cluster by using the operation mode of Cluster WebUI.
Change the OS network configuration in the Properties of My Network Places.
Change the IP address on the Interconnect tab and the MDC tab of the Cluster Properties by using the config mode of Cluster WebUI.
If the changed IP address is used for the NIC Link Up/Down monitor resource, change the IP address on the Monitor(special) tab of the monitor resource properties.
Click Apply the Configuration File from the config mode of Cluster WebUI and apply the cluster configuration data to the cluster.
Reboot the OS on all the servers.
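The two procedures in 2.14 differ only in how the cluster is paused, which tabs are edited, and how operation is restored. The sketch below encodes that difference as data. The names `IpChangePlan` and `ip_change_plan` are hypothetical and exist only for illustration; the step wording is taken from the procedures above.

```python
from typing import List, NamedTuple


class IpChangePlan(NamedTuple):
    before: str      # cluster operation performed before editing the settings
    tabs: List[str]  # Cluster Properties tabs on which the IP address is changed
    after: str       # operation performed after applying the configuration file


def ip_change_plan(mirror_connect_ip_changes: bool) -> IpChangePlan:
    """Return the variant of the 2.14 procedure that applies."""
    if mirror_connect_ip_changes:
        # 2.14.2: the mirror connect address changes, so the cluster is stopped.
        return IpChangePlan(
            "Stop the cluster",
            ["Interconnect", "MDC"],
            "Reboot the OS on all the servers",
        )
    # 2.14.1: a suspend/resume cycle is sufficient.
    return IpChangePlan("Suspend the cluster", ["Interconnect"], "Resume the cluster")


if __name__ == "__main__":
    print(ip_change_plan(mirror_connect_ip_changes=True))
```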
2.15. Changing the host name¶
Follow the steps below if you want to change the host name of a server after you have started the cluster system operation.
2.15.1. Environment where the mirror disk / hybrid disk does not exist¶
Make sure that the cluster is working properly.
If the group is started on the server whose host name is to be changed, move the group.
Suspend the cluster by using the operation mode of Cluster WebUI.
Change the host name in the properties of My Computer.
Note
Do not restart the OS at this stage. The cluster configuration data will not be able to be applied until the OS is completely restarted.
Click Rename Server of the server in the config mode of Cluster WebUI.
Use the config mode of Cluster WebUI to save the cluster configuration information in which the server name has been changed in a disk area accessible from a cluster server.
When the Cluster WebUI is used on a cluster server, save the information in the local disk. When the Cluster WebUI is used in another PC, save the information in the shared disk that can be accessed from the cluster server or save it in an external media or the like and then copy it to the local disk of a cluster server.
Run the following command on one of the cluster servers to upload the saved cluster configuration information.
clpcfctrl --push -x <path_of_the_cluster_configuration_information> --nocheck
Note
Check cluster configuration information before the distribution if required.
Shut down the OS on the server whose host name you have changed.
Resume the cluster from the operation mode of Cluster WebUI.
Note
If the cluster is resumed from the WebManager, the error message "Failed to resume the cluster. Click the Reload button, or try again later." is displayed, but ignore it. This message is displayed because the server whose host name was changed is not suspended.
Restart the manager from the operation mode of Cluster WebUI.
Start the server whose host name has been changed. When Off is selected for Auto Return on the Extension tab of Cluster Properties, recover the cluster manually by using the operation mode of Cluster WebUI.
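The upload step in this procedure runs `clpcfctrl --push -x <path> --nocheck`. A minimal sketch of wrapping that call in Python follows; it defaults to a dry run that only prints the command. The function names and the example path `C:\work\clpconf` are assumptions for illustration, and the path is passed through unchanged.

```python
import subprocess
from typing import List


def clpcfctrl_push_command(config_path: str) -> List[str]:
    """Build the upload command exactly as given in the procedure."""
    return ["clpcfctrl", "--push", "-x", config_path, "--nocheck"]


def push_configuration(config_path: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it on a cluster server."""
    command = clpcfctrl_push_command(config_path)
    if dry_run:
        print("would run:", " ".join(command))
    else:
        # Raises CalledProcessError if clpcfctrl returns a non-zero exit code.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    push_configuration(r"C:\work\clpconf")  # hypothetical path
```

Keeping the dry run as the default gives the operator a chance to check the saved configuration information before distribution, as the note in the procedure suggests.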
2.15.2. Environment where the mirror disk / hybrid disk exists¶
Make sure that the cluster is working properly.
Stop the cluster by using the operation mode of Cluster WebUI.
Change the host name in the properties of My Computer.
Note
Do not restart the OS at this stage. The cluster configuration data will not be able to be applied until the OS is completely restarted.
Click Rename Server of the server in the config mode of Cluster WebUI.
Use Cluster WebUI to save the cluster configuration information in which the server name has been changed in a disk area accessible from a cluster server.
When the Cluster WebUI is used on a cluster server, save the information in the local disk. When the Cluster WebUI is used in another PC, save the information in the shared disk that can be accessed from the cluster server or save it in an external media or the like and then copy it to the local disk of a cluster server.
Open Administrative Tools - Services for all servers to stop the EXPRESSCLUSTER X Disk Agent service.
Run the following command on one of the cluster servers to upload the saved cluster configuration information.
clpcfctrl --push -x <path_of_the_cluster_configuration_information> --nocheck
Note
Check cluster configuration information before the distribution if required.
Reboot the OS on all the servers.
2.16. Replacing the network card¶
To replace the network card, follow the steps below. To replace the network card used for the mirror connect, follow the same steps as well.
Make sure that the cluster is working properly. (However, ignore errors in the network card to be replaced.)
If a group is running on the server whose network card is to be replaced, move the group. If the network card has been used for the mirror connect, no groups can be moved until the mirror disk recovers after the replacement. Because of this, stop the group by using Cluster WebUI.
On the server whose network card is to be replaced, change the startup type of the services to manual by running the following command.
clpsvcctrl.bat --disable -a
Click Server Shut Down of the server whose network card is to be replaced from Cluster WebUI.
After the shutdown completes, replace the network card.
Start the server whose network card has been replaced.
Configure the settings for the OS network in the Properties of My Network Places. Configure the same settings for the network as before replacing the network card.
Open Services on the server whose network card has been replaced (On the Start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Services.), and restore Startup Type in Properties of each service which has been changed to Manual in step 3 to Automatic, and restart the server.
When Off is selected for Auto Return in Extension tab of Cluster Properties, recover the cluster by using the Cluster WebUI manually.
Move group as necessary.
2.18. Changing the disk configuration - For a mirror disk -¶
2.18.1. Replacing the disk¶
To replace the mirror disk, see "2.23. Replacing the mirror disk".
2.18.2. Adding a disk¶
To add a disk used for the mirror disk, follow the steps below:
Make sure that the cluster is working properly.
If the group is running on the server to which a disk is added, move the group.
Shut down only one server by using the operation mode of Cluster WebUI and power it off.
Expand the disk, and start the server.
Return the server to the cluster, and rebuild the mirror again.
Configure the settings for the disk by the server on which the disk is added.
Reserve a data partition and a cluster partition for mirror disk using Disk Management (On the Start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Computer Management, and click Disk Management.). Set their drive letters so that they will be the same on both of the servers.
Perform steps 2 to 6 on the other server.
Suspend the cluster by using the operation mode of Cluster WebUI.
Add the mirror disk resource by clicking Add Resource of the group to which the mirror disk resource is added in the config mode of Cluster WebUI.
Click Apply the Configuration File of the config mode of Cluster WebUI and apply the cluster configuration data to the cluster.
Resume the cluster by using the operation mode of Cluster WebUI.
Start the added mirror disk resource or the group that added the mirror disk resource. If Auto Mirror Initial Construction is set to be performed in Cluster Properties, the initial mirror construction is started. If Auto Mirror Initial Construction is set not to be performed, perform the initial mirror construction manually.
Move group as necessary.
2.18.3. Deleting a disk¶
Follow the steps below to delete the disk used for the mirror disk.
Make sure that the cluster is working properly.
Stop the group with the mirror disk resource to be deleted by using the operation mode of the Cluster WebUI.
Suspend the cluster by using the operation mode of the Cluster WebUI.
Click the group from which the mirror disk resource is deleted in the config mode of Cluster WebUI. Click Remove Resources of the mirror disk resource.
Click Apply the Configuration File of the config mode of Cluster WebUI and apply the cluster configuration data to the cluster.
Resume the cluster by using the operation mode of Cluster WebUI.
Start the group with the operation mode of Cluster WebUI.
Shut down the server on which the group has not been started with the operation mode of Cluster WebUI and power it off.
Remove the disk, and start the server.
Move the group, and perform steps 8 and 9 on the other server.
Move group as necessary.
2.19. Backing up/restoring data¶
Data is backed up and restored, as shown in the following image. For details on how to back up data, see the manuals of the backup software.
2.20. Performing a snapshot backup¶
When a mirror disk or a hybrid disk is used, it is possible to suspend mirroring to back up the stand-by data partition as a snapshot image. This is referred to as snapshot backup.
While a snapshot backup is in progress, failover to the standby server or server group of the copy destination is not possible because mirroring is temporarily suspended. While in this state, release the access restriction on the data partition of the standby server to collect the backup.
To return from the snapshot status, restrict disk access again and rebuild the mirror.
For details on how to collect the backup, see the manuals of the backup software.
Note
When mirroring is interrupted, note that the data at the mirroring copy destination does not necessarily have integrity as NTFS or application data, depending on the timing of the mirroring.
2.20.1. Performing a snapshot backup¶
To execute the snapshot backup for a mirror disk, follow the steps below:
Stop the mirror disk monitor resource that monitors the mirror disk that will be backed up on the server to be backed up.
clpmonctrl -s -m <mdw(mirror_disk_monitor_resource_name)>
Disconnect the mirror disk.
clpmdctrl --break <md(mirror_disk_resource_name)>
Allow accesses to the mirror disk.
mdopen <md(mirror_disk_resource_name)>
Back up necessary files.
Forbid accesses to the mirror disk.
mdclose <md(mirror_disk_resource_name)>
Start the mirror disk monitor resource that monitors the mirror disk.
clpmonctrl -r -m <mdw(mirror_disk_monitor_resource_name)>
If automatic mirror recovery is disabled, perform mirror recovery manually from Mirror Disks.
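The mirror disk steps above form a fixed sequence: stop the monitor, break the mirror, open access, back up, close access, resume the monitor. The sketch below shows one way to script that order. The resource names `md1` and `mdw1`, the `backup` callable and the function names are hypothetical. Using `try`/`finally` so that `mdclose` and the monitor resume still run when the backup fails is a design choice of this sketch, not something the guide prescribes. The default runner only prints each command.

```python
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def print_runner(command: Sequence[str]) -> None:
    """Default runner: show the command instead of executing it."""
    print("would run:", " ".join(command))


def snapshot_backup_mirror_disk(
    md_name: str,
    mdw_name: str,
    backup: Callable[[], None],
    run: Runner = print_runner,
) -> None:
    """Issue the snapshot backup commands in the documented order."""
    run(["clpmonctrl", "-s", "-m", mdw_name])  # stop the mirror disk monitor
    run(["clpmdctrl", "--break", md_name])     # disconnect the mirror disk
    run(["mdopen", md_name])                   # allow access to the mirror disk
    try:
        backup()                               # back up the necessary files
    finally:
        # Runs even if backup() raises, so access is forbidden again.
        run(["mdclose", md_name])
        run(["clpmonctrl", "-r", "-m", mdw_name])  # restart the monitor


if __name__ == "__main__":
    snapshot_backup_mirror_disk("md1", "mdw1", backup=lambda: print("backing up"))
```

Mirror recovery after the sequence is left to the operator, because it depends on whether automatic mirror recovery is enabled.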
To execute the snapshot backup for a hybrid disk, collect the backup by following the steps below in a server in the standby server group of the copying destination.
Run the following command in the server where the backup is collected.
clphdsnapshot --open <hybrid_disk_resource_name>
When the access restriction in the data partition is canceled, back up the required files.
Run the following command in the server where the backup has been collected to restart mirroring.
clphdsnapshot --close <hybrid_disk_resource_name>
When the automatic mirror recovery is disabled, perform mirror recovery manually from Mirror Disks.
For the commands, see "EXPRESSCLUSTER command reference" in the "Reference Guide".
2.21. Restoring the system disk¶
2.21.1. Restoring the system disk¶
If an error occurs in the system disk of the server, change the disk following the steps below, and restore the backup data. If EXPRESSCLUSTER has been updated or changes have been made on the configuration after the backup was created, make sure to uninstall EXPRESSCLUSTER after restoration and set this server as a new server by following the steps for server replacement.
If any group is running on the server where a system disk is restored (hereafter referred to as target server), move the group. When a mirror disk resource or hybrid disk resource is used, make sure that these resources are running properly after the group is moved.
Important
If the mirror disk resource or hybrid disk resource is not in the latest status, and if the system disk is restored on the server that is not to be restored, the data on the data partition may be corrupted.
If the mirror disk resource or hybrid disk resource is used, execute the following procedure.
Uncheck Auto Mirror Recovery in Mirror Disk tab of Cluster Properties in the config mode of Cluster WebUI.
Click Apply the Configuration File of the config mode of Cluster WebUI, and apply the cluster configuration data to the cluster.
If the target server is running, shut down the server by selecting Shut Down from Start menu.
When the shared disk is connected to the target server, remove the cable connecting the target server and the shared disk. Remove the cable carefully by following the instructions shown below:
When a SCSI disk array is used, remove the cable from the base of the two-way cable.
When a Fibre Channel disk array device is used, remove the cable between the failing server and the Fibre Channel-HUB or the Fibre Channel-Switch.
Change the system disk of the server to be restored. For details on how to change the system disk, see the user's guide provided with the device.
Follow the normal installation procedure and install the OS.
To install the OS, see the user's guide provided with the server.
Make sure to configure the network settings when installing the OS. Apply the same OS service pack as the removed disk.
Make sure that the OS is running normally, and install the backup software. (For details, see the manual of the backup software.)
Use the backup software to restore the system disk from the backup.
There are no cluster-dependent notes. Restore the system disk with the settings that allow the registry to be recovered and files with the same file names to be overwritten. For details, see the manual of the backup software.
When the EXPRESSCLUSTER Server service of the target server is configured as Auto Startup, change the settings to Manual Startup.
Reset its drive letter if it has been changed. Make sure that the date and time are the same as those of other servers in the same cluster.
When the driver of SCSI controller or FC-HBA (Host Bus Adapter) cannot be restored, re-install the above driver. For details, refer to the instruction manual of backup software.
Restart the target server. When the shared disk is not connected to the target server, the following steps up to 16 are not required.
Connect to the server that has not been restored via the Web browser to start the Cluster WebUI. Open the Properties of the target server to configure the filter settings of the HBA connected to the shared disk.
Click Connect on the HBA tab to acquire the disk configuration information for the target server, and then select the check box for the HBA connected to the shared disk.
Do not change any settings other than above.
Use Cluster WebUI to save the cluster configuration information in which HBA filter settings have been configured in a disk area accessible from a cluster server.
When the Cluster WebUI is used on a cluster server, save the information in the local disk. When the Cluster WebUI is used in another PC, save the information in the shared disk that can be accessed from the cluster server or save it in an external media disk or the like and then copy it to the local disk of a cluster server.
Run the following command on one of the cluster servers to upload the saved cluster configuration information.
clpcfctrl --push -x <path_of_the_cluster_configuration_information> --nocheck
Shut down the target server and connect the disk cable, and then reboot the server.
If the server configuration (before restoration) meets any of the following conditions, create a partition again with Disk Management.
A cluster partition of a mirror disk resource/hybrid disk resource was present in the system disk.
A data partition of a mirror disk resource/hybrid disk resource was present in the system disk.
Note
To re-create a data partition, resize the data partition according to data partition size of another server where you did not perform restoration.
Start the target server and check the drive letter of the shared disk and the mirror disk (data partition and cluster partition) in Disk Management of the target server. If the drive letter has been changed, re-configure it as it was, restart the server and check that the drive letter is configured correctly.
Connect to the server which has not been restored via the Web browser to start the Cluster WebUI. When the shared disk is connected to the target server and the shared disk has a volume that is not for filtering, update the information on the partition that is not for filtering in the HBA tab of Properties in the target server.
Perform the procedures in steps 14 and 15 above to save the cluster configuration information and then upload the information by using the clpcfctrl command from the server.
If the message "There is difference between the disk information in the configuration information and the disk information in the server. Are you sure you want automatic modification?" appears upon saving the configuration information, select Yes.
Restore the setting of the EXPRESSCLUSTER Server service to Auto Startup and reboot the target server.
When the Auto Recovery is configured as Off in Extension tab of Cluster Properties of the cluster, click Recover Server of the target server in the config mode of Cluster WebUI. If mirror disk resource or hybrid disk resource is not used on the target server, the following procedure is not required.
When a mirror disk resource or hybrid disk resource is created on the system disk, the resource must be recreated before mirror recovery. Perform the following procedure.
23-1. From the operation mode of Cluster WebUI, stop the group containing the target mirror disk resource or hybrid disk resource.
23-2. Suspend the cluster.
23-3. From the config mode of Cluster WebUI, execute Remove Resource of the target mirror disk resource or hybrid disk resource. Before deleting the resource, make a note of the parameter values required for recreating the resource.
23-4. Click Apply the Configuration File from the File menu and then apply the cluster configuration data to the cluster.
23-5. Execute Add Resource of the failover group. For each parameter, specify the same value as that specified for the resource that was deleted.
23-6. Click Apply the Configuration File from the File menu again and then apply the cluster configuration data to the cluster.
23-7. Resume the cluster.
Execute the mirror recovery (full copy) from the Mirror Disks of Cluster WebUI to all the mirror disk resources and hybrid disk resources.
Note
Data on the server on which the restoration was performed (the disk was replaced) may not be up to date. A server on which restoration was not performed must be the source of the copy. In addition, recover the mirror by full copy rather than partial copy, because the difference data may be invalid after the restoration.
If you unchecked Auto Mirror Recovery in step 2, select the Mirror Disk tab of Cluster Properties in the config mode of Cluster WebUI and check Auto Mirror Recovery.
Click Apply the Configuration File of the config mode of Cluster WebUI, to upload the cluster configuration data to the cluster.
Move the group as required.
2.23. Replacing the mirror disk¶
If an error occurs in a disk that forms a mirror set, follow the steps below to replace the disk. When using a disk array, the procedure below also needs to be performed if the array configuration is changed or a disk is recognized as a new one due to DAC replacement or some other reason.
You can replace a local disk mirrored by a hybrid disk resource by following the steps below. In that case, consider "mirror disk resource" in the description below as "hybrid disk resource". To replace a shared disk that is mirrored by a hybrid disk resource consisting of three or more servers, see the procedure described in "Replacing the hybrid disk".
Make sure that the cluster is working properly. (However, ignore errors in the disk to be replaced.)
If the group is running on the server, which contains the disk to be replaced, move the group.
When the Auto Mirror Recovery check box is selected in Properties of the cluster, in the config mode of Cluster WebUI, select Properties of the cluster and the Mirror disk tab, clear the Auto Mirror Recovery check box, click Apply the Configuration File in the File menu to apply the cluster configuration data to the cluster.
Shut down the server whose disk is to be replaced from the operation mode of Cluster WebUI and power it off.
Replace the disk and start the server.
Configure the settings for the disk by the server with the replaced disk.
Reserve the data partition and the cluster partition for the mirror disk by using Disk Management (On the Start menu, point to Settings, and click Control Panel. Double-click Administrative Tools and then Computer Management, and click Disk Management.) Set the drive letters of the data partition and the cluster partition so that the drive letters of data partition and cluster partition and the size of data partitions are the same in both servers.
When the Auto Mirror Recovery is configured as Off in the Extension tab in Cluster properties, return the replaced server to the cluster from the operation mode of Cluster WebUI.
Suspend the cluster.
Start Cluster WebUI. If you have unchecked the Auto Mirror Recovery check box in procedure 3, check the Auto Mirror Recovery check box again.
Click Apply the Configuration File of the config mode of Cluster WebUI to upload the cluster configuration data to the cluster.
When the message "There is difference between the disk information in the configuration information and the disk information in the server. Are you sure you want automatic modification?" appears, select Yes.
Resume the cluster from the operation mode of Cluster WebUI.
If the Auto Mirror Recovery check box is selected in cluster properties, full reconstruction of mirror will be performed. If not, it is required to perform the reconstruction manually.
Move the group as necessary.
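The procedure above requires that the drive letters of the data partition and the cluster partition, and the size of the data partition, be the same on both servers. A minimal sketch of that comparison follows. The `MirrorPartitions` record, its field names and the example values are hypothetical, and the values would have to be collected by hand from Disk Management on each server.

```python
from typing import List, NamedTuple


class MirrorPartitions(NamedTuple):
    data_letter: str      # drive letter of the data partition
    data_size_bytes: int  # size of the data partition
    cluster_letter: str   # drive letter of the cluster partition


def layout_mismatches(a: MirrorPartitions, b: MirrorPartitions) -> List[str]:
    """List every requirement from the procedure that the two servers violate."""
    problems = []
    if a.data_letter.upper() != b.data_letter.upper():
        problems.append("data partition drive letters differ")
    if a.cluster_letter.upper() != b.cluster_letter.upper():
        problems.append("cluster partition drive letters differ")
    if a.data_size_bytes != b.data_size_bytes:
        problems.append("data partition sizes differ")
    return problems


if __name__ == "__main__":
    server1 = MirrorPartitions("E", 10 * 1024**3, "F")
    server2 = MirrorPartitions("e", 10 * 1024**3, "G")
    print(layout_mismatches(server1, server2))
```

An empty list means the layout satisfies the three conditions named in the procedure; it does not check anything else about the disks.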
2.24. Replacing the hybrid disk¶
In a hybrid disk resource environment consisting of three or more servers, if an error occurs in a shared disk that forms a mirror set, replace that disk by applying the procedure described below. When a disk array is used, the procedure below also needs to be performed if the configuration of the array is changed or a disk is recognized as being a new one due to DAC replacement or some other reason.
To replace a local disk that is mirrored by a hybrid disk resource, see the procedure in "Replacing the mirror disk".
Check that the cluster is working properly. (Ignore errors with the disk that is to be replaced.)
If the group is running on the server, which contains the disk to be replaced, move the group.
When the Auto Mirror Recovery check box is selected in Properties of the cluster, in the config mode of Cluster WebUI, select Properties of the cluster and the Mirror disk tab, clear the Auto Mirror Recovery check box, click Apply the Configuration File in the File menu to apply the cluster configuration data to the cluster.
Select Stop Server Service from the operation mode of Cluster WebUI to execute cluster stop for all the servers connected to the shared disk to be replaced.
On all the servers connected to the shared disk to be replaced, set Startup Type to Manual for EXPRESSCLUSTER Server service.
Shut down all the servers connected to the shared disk to be replaced, and power them off.
Power off and replace the shared disk.
Power on the shared disk, and configure its settings.
If the RAID is to be built again or if the LUN configuration is to be changed, use the setup tool provided with the shared disk. For details, refer to the manual provided with the shared disk.
Start only one server, create a partition by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management) and set the drive letter as before replacing the disk. Even if the drive letter you want to assign is the same as the drive letter automatically assigned by the OS, manually assign the desired drive letter explicitly; for example, by deleting the OS assigned drive letter and then assigning the desired drive letter.
Note
Controlling the access to the created partition is started upon its creation, so it cannot be formatted. Set only the drive letter here.
Note
The size of the switchable partition used for a disk resource can be changed at this point. The data partitions of a hybrid disk resource must be the same size in both server groups. For this reason, to change the size, delete the resource, change the partition size in both server groups, and then create the resource again.
Note
Changing or deleting the drive letter assigned to a partition of a shared disk may fail. To avoid this, specify the drive letter according to the procedure below:
Run the following command by using the command prompt to delete the drive letter.
mountvol <drive_letter(_to_be_changed)>: /P
Confirm that the drive letter has been deleted from the target drive by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
Assign a new drive letter to the drive by using Disk Management.
To format the partition to be used as a disk resource, execute the following command to temporarily release the access restriction:
clpvolctrl --open <drive_letter_of_partition_to_be_used_as_disk_resource>
From Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management), format the partition to be used as a disk resource.
To restore the access restriction temporarily released in step 10 above, execute the following command:
clpvolctrl --close <drive_letter_of_partition_to_be_used_as_disk_resource>
Start the other servers connected to the replaced shared disk, and check that the partition created on the first server is visible from Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
On each of these servers, set the drive letter for each partition on the shared disk in the same way as on the first server, using the same drive letters as before the disk was replaced.
On all the servers connected to the replaced shared disk, restore Startup Type to Automatic for EXPRESSCLUSTER Server service.
Start the Cluster WebUI and select Service, then click Start server service to execute cluster start for all the servers connected to the replaced shared disk.
Note
An hdw or hdtw warning message may be displayed at this time. Ignore the message and proceed to the next step.
Suspend the cluster.
When there are partitions with no access restrictions on the replaced shared disk, add these partitions to Partition excluded from cluster management by selecting the HBA tab of Properties and then clicking Connect for each server connected to the shared disk.
Note
When Partition excluded from cluster management is set for the shared disk before replacement, delete the setting, and then make the setting again. Perform the following procedure.
Open the HBA tab of Properties of the server that is connected to the replaced disk from the config mode of Cluster WebUI and then click Connect.
Select an HBA for which filtering is checked and then execute Remove for all the partitions that are displayed in Partition excluded from cluster management.
Click Add again to add all the partitions deleted in step 18-2.
Make sure that Volume, Disk No., Partition No., Size, and GUID are displayed for each partition that is excluded from cluster management.
Start the Cluster WebUI. If you have unchecked Auto Mirror Recovery in step 3, check it again.
Click Apply the Configuration File of the config mode of Cluster WebUI to apply the cluster configuration data to the cluster.
When the pop-up message "There is difference between the disk information in the configuration information and the disk information in the server. Are you sure you want automatic modification?" appears, select Yes.
Resume the cluster from the operation mode of Cluster WebUI.
If Auto Mirror Recovery is checked, full reconstruction (full copying) of the mirror set is performed automatically. Otherwise, manually reconstruct the mirror set.
Move the group as required.
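The format window in the procedure above (release the access restriction, format the partition, restore the restriction) can be sketched as a small Python wrapper that restores the restriction even if the formatting step fails. This is only a sketch: the helper name `access_released` and the injectable `runner` argument are hypothetical, and only the `clpvolctrl --open` and `clpvolctrl --close` invocations come from this guide.

```python
import subprocess
from contextlib import contextmanager


def _run(argv):
    # Default runner: execute the command and raise if it exits with an error.
    subprocess.run(argv, check=True)


@contextmanager
def access_released(drive_letter, runner=_run):
    """Release the access restriction on a partition, then always restore it.

    drive_letter is passed to clpvolctrl exactly as given by the caller.
    """
    runner(["clpvolctrl", "--open", drive_letter])
    try:
        yield
    finally:
        # Runs even if the body raises, so the partition is not left open.
        runner(["clpvolctrl", "--close", drive_letter])
```

A caller would wrap the manual formatting step, for example `with access_released("Z"): input("Format the partition, then press Enter")`. Passing a different `runner` makes it possible to log the commands instead of executing them.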
2.26. Increasing the mirror disk size
Note
Make sure that the cluster is working properly.
Make sure that the mirror disk resource to be extended is in normal status.
Configure the mirror disk so that it is not recovered automatically. Use either of the following methods:
Change mirror agent settings to disable automatic mirror disk recovery.
Stop mirror disk monitor resources temporarily.
For mirror disk settings, refer to "Reference Guide" > "Parameter details" > "Cluster properties" > "Mirror Disk tab".
Run the clpmdctrl command on the server on which the mirror disk resource is inactive. The following example extends the data partition of md01 to 500 gibibytes.
clpmdctrl --resize md01 500G
Run the clpmdctrl command on the other server. The following example extends the data partition of md01 to 500 gibibytes.
clpmdctrl --resize md01 500G
Run the following command to confirm that the volume sizes on both servers are the same.
clpvolsz <partition_drive_letter_for_mirror_disk_resource>:
Run the diskpart command on the server on which the mirror disk resource is active.
diskpart
Run the list volume command at DISKPART prompt to confirm the volume number (### column) of the target data partition. The example is as follows:
DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     E                       DVD-ROM         0 B  No Media
  Volume 1     C                NTFS   Partition     99 GB  Healthy    Boot
  Volume 2     D                NTFS   Partition    500 GB  Healthy
  Volume 3                      FAT32  Partition    100 MB  Healthy    System
Run the select volume command at DISKPART prompt to choose the target volume.
DISKPART> select volume 2
Run the extend filesystem command at DISKPART prompt to extend the target file system of the volume.
DISKPART> extend filesystem
Run the exit command at the DISKPART prompt to exit diskpart.
DISKPART> exit
Important
clpmdctrl --resize md01 500G -force
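When the diskpart steps above are scripted, the volume number for the data partition has to be read from the `list volume` output before `select volume` can be issued. The following Python sketch shows one way to do that lookup. It assumes the fixed-width column layout shown in the example above, with a separator row of dashes under the header; the function name `volume_number_for_letter` is hypothetical.

```python
import re


def volume_number_for_letter(list_volume_output, letter):
    """Return the volume number whose Ltr column equals letter, or None."""
    lines = list_volume_output.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        # The separator row consists only of dashes and spaces; each run of
        # dashes marks the horizontal extent of one column.
        if stripped and set(stripped) <= {"-", " "}:
            spans = [m.span() for m in re.finditer(r"-+", line)]
            break
    else:
        return None  # no separator row found
    if len(spans) < 2:
        return None
    (vol_start, vol_end), (ltr_start, ltr_end) = spans[0], spans[1]
    for row in lines[index + 1:]:
        volume = re.fullmatch(r"Volume\s+(\d+)", row[vol_start:vol_end].strip())
        drive = row[ltr_start:ltr_end].strip()
        if volume and drive.upper() == letter.upper():
            return int(volume.group(1))
    return None
```

Slicing by column position rather than splitting on whitespace keeps volumes without a drive letter (such as the system partition in the example) from being misread. Confirm the result against the Size and Fs columns before extending the file system.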
2.27. Increasing the hybrid disk size
First, expand the logical disk volume in the shared disk settings so that you can allocate enough free space for increasing the disk size immediately after the partition that is used by the hybrid disk resource.
The disk size cannot be increased by using free space on a disk other than the one that contains the partition to be expanded.
Make sure that the cluster is working properly.
Stop the group that includes the hybrid disk resource to be expanded.
From the config mode of Cluster WebUI, delete the mirror disk resource to be expanded from the group. (If you want to expand multiple mirror disk resources, remove all of them from the group.)
If Startup Attribute is Auto Startup on the Attribute tab of Group Properties, change it to Manual Startup.
Click Apply the Configuration File to apply the cluster configuration data to the cluster. If a message asking you to resume the cluster is displayed, select Cancel.
Select one server from each server group to be used as a work server.
Shut down all servers, excluding the work servers.
For work servers that use a shared disk as a hybrid disk resource, run the following command to allow access to the partition to be expanded. This step is not necessary for a work server that does not use a shared disk.
clpvolctrl --open <drive_letter_of_the_partition>
Expand the volume on both work servers by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
After expanding the volume size, run the following command to confirm that the volume sizes of all work servers are the same.
clpvolsz <drive_letter_of_the_partition>:
On both work servers, run the following command to prohibit access to the expanded partition. This step is not necessary for a work server for which step 8 has not been performed.
clpvolctrl --close <drive_letter_of_the_partition>
Start all servers, excluding the work servers.
Check whether the drive letter of the expanded volume has been changed on all servers. If the drive letter has been changed, assign the previous drive letter.
Note
Changing or deleting the drive letter assigned to a partition of a shared disk may fail. To avoid this, specify the drive letter according to the procedure below:
Run the following command by using the command prompt to delete the drive letter.
mountvol <drive_letter(_to_be_changed)>: /P
Confirm that the drive letter has been deleted from the target drive by using Disk Management (Control Panel > Administrative Tools > Computer Management > Disk Management).
Assign a new drive letter to the drive by using Disk Management.
From the config mode of Cluster WebUI, recreate the mirror disk resources that were deleted in step 3.
Click Apply the Configuration File to apply the cluster configuration data to the cluster.
Resume the cluster by using the operation mode of Cluster WebUI. For servers other than the work servers, start the service individually.
Click the expanded mirror disk resource from the mirror disks and perform a full copy assuming that the server with the latest data is the copy source.
Start the group and confirm that each hybrid disk resource starts normally.
If Startup Attribute was changed from Auto Startup to Manual Startup in step 4, change this to Auto Startup by using the config mode of Cluster WebUI. Then, click Apply the Configuration File from the File menu to apply the cluster configuration data to the cluster.
Move the group as required.
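The drive letter reassignment described in the note above begins by removing the current letter with `mountvol <drive_letter>: /P`. A minimal Python sketch of building that command with basic input validation is shown below. The function name `drive_letter_removal_command` is hypothetical, and the code only constructs the argument list; it does not run the command.

```python
import string


def drive_letter_removal_command(drive_letter):
    """Build the argument list for: mountvol <drive_letter>: /P"""
    # Accept "z", "Z", or "Z:" and normalize to a single upper-case letter.
    letter = drive_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return ["mountvol", f"{letter}:", "/P"]
```

Rejecting anything other than a single letter guards against passing a path or an empty string to the command by mistake. After the command has run, confirm in Disk Management that the letter is gone before assigning the new one, as the note describes.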
2.28. Replacing the disk array controller (DAC)/updating the firmware
After the disk array controller (DAC) is replaced or the firmware is updated, the OS may recognize an existing disk as a new disk even though the disk has not actually been replaced. The required procedure varies depending on how the OS recognizes the disk. Therefore, be sure to perform the following procedure when replacing the DAC or updating the firmware.
Make sure that the cluster is working properly.
If a group is active on the server on which the DAC is to be replaced or the firmware is to be updated (hereafter referred to as the target server), move the group.
Before replacing the DAC or updating the firmware, execute the following command to check the combinations of "drive letter" and "GUID" for the partitions of all the mirror disk resources and hybrid disk resources.
mountvol
Output example:
When replacing the DAC, shut down the target server from the operation mode of Cluster WebUI and power it off.
Replace the DAC or update the firmware. When replacing the DAC, power on the target server afterward to start the OS.
After the DAC replacement or firmware update is complete, perform the following procedure to check that the OS recognizes the disk used by the mirror disk resources and hybrid disk resources as an existing disk.
Execute the following command on the target server to check whether the combinations of "drive letter" and "GUID" for the mirror disk resources and hybrid disk resources have changed from those checked in step 3.
mountvol
When the combinations of "drive letter" and "GUID" for all mirror disk resources and hybrid disk resources have not changed from those checked in step 3, the OS recognizes the disk as an existing disk. In this case, execute step 10 and subsequent steps. (Steps 8 and 9 are not required.)
When a combination has changed from that checked in step 3, the OS recognizes the disk as a new disk. In this case, execute step 8 and subsequent steps.
Check the disk setting on the target server.
Check the drive letters of the data and cluster partitions using Disk Management (Control Panel -> Administrative Tools -> Computer Management -> Disk Management). If the drive letter has been changed, re-configure it as it was, restart the server and check that the drive letter is configured correctly.
Restart the target server from the operation mode of Cluster WebUI.
If the drive letters are corrected in step 8 above, reconfigure the cluster information according to the procedures in steps 7 to 11 of "Replacing the mirror disk". In this case, read "server on which disks were replaced" as "target server".
Recover the target server to the cluster.
In automatic recovery mode, the server is automatically recovered to the cluster.
If Cluster Properties -> Mirror Disk tab -> Auto Mirror Recovery is set, the mirror is automatically reconstructed (partial copy or full copy). If the settings are configured not to perform Auto Mirror Recovery, reconstruct the mirror manually.
If mirror reconstruction ends abnormally, reconfigure the cluster information according to the procedures in steps 8 to 12 of "2.23. Replacing the mirror disk". In this case, read "server on which disks were replaced" as "target server".
Move the group as required.
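The before-and-after comparison of "drive letter" and "GUID" combinations in the procedure above can be automated by saving the `mountvol` output at both points and comparing the two. The Python sketch below assumes the usual layout of `mountvol` output, in which each `\\?\Volume{GUID}\` line is followed by the mount points of that volume on separate lines. The function names are hypothetical, and only plain drive-letter mount points such as `D:\` are considered.

```python
import re

VOLUME_RE = re.compile(r"\\\\\?\\Volume\{([0-9A-Fa-f-]+)\}\\")
MOUNT_RE = re.compile(r"^\s*([A-Za-z]):\\\s*$")


def parse_mountvol(output):
    """Map each drive letter to the GUID of the volume it is mounted on."""
    mapping = {}
    current_guid = None
    for line in output.splitlines():
        volume = VOLUME_RE.search(line)
        if volume:
            # A volume name line starts a new block of mount points.
            current_guid = volume.group(1).lower()
            continue
        mount = MOUNT_RE.match(line)
        if mount and current_guid:
            mapping[mount.group(1).upper()] = current_guid
    return mapping


def changed_drive_letters(before, after):
    """Return {letter: (guid_before, guid_after)} for every changed pairing."""
    letters = sorted(set(before) | set(after))
    return {
        letter: (before.get(letter), after.get(letter))
        for letter in letters
        if before.get(letter) != after.get(letter)
    }
```

An empty result from `changed_drive_letters` corresponds to the case in which the OS recognizes the disk as an existing disk. Any entry in the result corresponds to the case in which the disk is recognized as a new disk.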
2.29. Replacing FibreChannel HBA / SCSI / SAS controller
Follow the procedure below to replace the HBA that connects the shared disk.
If a group is running on the server on which the HBA is to be replaced (hereafter referred to as the target server), move the group to another server.
Change the settings for the EXPRESSCLUSTER Server service of the target server to manual start.
Shut down the target server and replace the HBA.
Start the target server with the disk cable disconnected.
From the config mode of Cluster WebUI, open the properties of the target server and configure filter settings on the replaced HBA.
Click Connect on the HBA tab to acquire the disk configuration data for the target server, and then select the replaced HBA.
Do not change any other settings.
In Cluster WebUI, temporarily save the cluster configuration data in which the HBA filter setting has been configured to a disk area that is accessible from the cluster server.
If Cluster WebUI is used on the cluster server, save the cluster configuration data to the local disk. If Cluster WebUI is used on another PC, save the data to a shared folder that is accessible from the cluster server, or save it temporarily to external media and then copy it to the local disk of the cluster server.
Execute the following command on one of the cluster servers to upload the saved cluster configuration data.
clpcfctrl --push -x <path_of_the_cluster_configuration_information> --nocheck
Shut down the target server and connect the disk cable.
Start the target server and check the drive letter in Disk Management. If the drive letter has been changed, set it back to what it was before. Then restart the server and check that the drive letter is correctly configured.
From the config mode of Cluster WebUI, open the properties of the target server to check the settings for the HBA tab. If there is a partition which does not restrict access on the shared disk, check that the partition data is registered in Partition excluded from cluster management.
As in steps 6 and 7 above, save the cluster configuration data temporarily and upload it from the cluster server with the following command:
clpcfctrl --push -x <path_of_the_cluster_configuration_information> --nocheck
If the message "There is difference between the disk information in the configuration information and the disk information in the server. Are you sure you want automatic modification?" appears upon saving the configuration information, select Yes.
Set the configuration for the EXPRESSCLUSTER Server service of the target server back to automatic start, and reboot the target server.
When Auto Recovery is set to Off on the Extension tab of Cluster Properties, select Recover server for the target server in the operation mode of Cluster WebUI.
Migrate the group if necessary.
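The upload command used twice in the procedure above takes the location of the saved cluster configuration data. A minimal Python sketch that checks the location before building the command follows. The function name `config_push_command` is hypothetical; the `--push`, `-x`, and `--nocheck` options are the ones shown in this guide, and the sketch makes no assumption about whether the path is a file or a folder.

```python
from pathlib import Path


def config_push_command(config_path):
    """Build the argument list for: clpcfctrl --push -x <path> --nocheck"""
    path = Path(config_path)
    if not path.exists():
        # Fail early rather than letting the upload run against a bad path.
        raise FileNotFoundError(f"configuration data not found: {path}")
    return ["clpcfctrl", "--push", "-x", str(path), "--nocheck"]
```

Building the command as an argument list rather than a single string avoids quoting problems when the path contains spaces. The list can be passed to `subprocess.run` on one of the cluster servers.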
2.30. Information required for inquiry
The following information is required for inquiring about the failure.
- Failure
  Describe the failure.
  Example) Failover group (failover1) failed to fail over from server1 to server2.
- Time that the failure occurred
  Example) 2018/01/01 00:00
- Name of the server with the failure
  Example) server2
- Version of EXPRESSCLUSTER
  Example) EXPRESSCLUSTER X 4.2
- EXPRESSCLUSTER log and event log from the time the failure occurred
  Logs can be collected by using the Cluster WebUI or by running the log collection command. To use Cluster WebUI, see the online manual. To use the log collection command, see "Collecting logs (clplogcc command)" in "EXPRESSCLUSTER command reference" in the "Reference Guide".