This Guide is intended for administrators who want to build a cluster system, system engineers who want to provide user support, and maintenance personnel.
This guide introduces software whose operation in an EXPRESSCLUSTER environment has been verified.
The software and setup examples introduced here are for reference only. They are not meant to guarantee the operation of each software product.
The bundled scripts are for achieving failover.
These scripts are not designed to monitor every SAP process. Check their contents and, if your usage environment or monitoring targets require it, customize them.
This guide is intended for system engineers and administrators who want to build, operate, and maintain a cluster system. Instructions for designing, installing, and configuring a cluster system with EXPRESSCLUSTER are covered in this guide.
This guide is intended for system administrators. It covers topics such as how to operate EXPRESSCLUSTER, the function of each module, and troubleshooting. It is a supplement to the Installation and Configuration Guide.
This guide is intended for administrators and for system administrators who want to build, operate, and maintain EXPRESSCLUSTER-based cluster systems. The guide describes maintenance-related topics for EXPRESSCLUSTER.
A cluster with the following configuration can be built by combining SAP NW and EXPRESSCLUSTER.
2.1.1.1. SAP NW cluster configuration using EXPRESSCLUSTER
To improve the availability of the SAP NW environment, set up the following components in EXPRESSCLUSTER as independent active-standby failover groups, so that each one fails over from the active node to the standby node if a failure occurs.
With ENSA2 used, Enqueue Replication Server Instance (hereinafter ERS) is also set as an Active-Standby failover group.
ABAP SAP Central Services Instance (hereafter, ASCS)
(With ENSA2 used) ERS
Set up the following components as failover groups for a single server configuration in which failover groups operate on each node.
(With ENSA used) ERS
Primary Application Server Instance (hereafter, PAS)
Additional Application Server Instance (hereafter, AAS)
saphostexec
The diagram below shows the configuration with ENSA used.
Fig. 2.1 SAP ABAP Platform Clustered System (for ENSA configuration)
The diagram below shows the configuration with ENSA2 used.
Fig. 2.2 SAP ABAP Platform Clustered System (for ENSA2 configuration) (1)
Fig. 2.3 SAP ABAP Platform Clustered System (for ENSA2 configuration) (2)
In addition to the monitoring function provided by EXPRESSCLUSTER, the SAP NW cluster system uses a monitoring package that supports the SAP system and an SAP NW-specific monitoring command to monitor the SAP NW components for response errors and hang-ups.
2.1.1.4. Illustration of linkage between SAP NW and EXPRESSCLUSTER
User requests to SAP NW are sent to EXPRESSCLUSTER via the Connector for SAP (clp_shi_connector). The EXPRESSCLUSTER cluster is operated by SAP NW.
2.1.1.5. Illustration of exclusive control of ASCS/ERS instance by EXPRESSCLUSTER
EXPRESSCLUSTER handles the exclusive control of the ASCS/ERS instance that is required for SAP NW as follows.
Exclusive in the figure below indicates a failover group for exclusive control.
Start the ASCS and ERS instances on different nodes. Start the ERS instance on only one node. If ENSA is used, start the failover group for exclusive control on all nodes except the node on which the ERS instance starts.
EXPRESSCLUSTER handles failover process of ASCS instance as follows.
If ENSA is used, fail over the ASCS instance to the node where the ERS instance was running. If ENSA2 is used, fail over the ASCS instance to the node determined by the startup priority set in the failover group for ASCS.
If ENSA is used, the ERS instance is stopped automatically by the ASCS instance after the ASCS failover has been executed. If ENSA2 is used and the ERS instance is already running on the failover target node of the ASCS instance, the custom monitor resource of EXPRESSCLUSTER fails over the ERS instance to another node.
The above mechanism of exclusive control of ASCS/ERS instances by EXPRESSCLUSTER works similarly in the case of more than 3 nodes.
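The failover rules above can be sketched as a small decision function. This is only an illustration of the described behavior, not EXPRESSCLUSTER code: the function name, the node names, and the rule for choosing "another node" for ERS (the next node in the ASCS priority list) are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def plan_ascs_failover(
    ensa_version: int,
    failed_node: str,
    ers_node: Optional[str],
    ascs_priority: List[str],
) -> Tuple[str, Optional[str]]:
    """Return (new ASCS node, ERS node after failover or None if ERS is stopped).

    Assumes ASCS and ERS were running on different nodes before the failure.
    """
    candidates = [n for n in ascs_priority if n != failed_node]
    if not candidates:
        raise ValueError("no surviving node for the ASCS failover group")

    if ensa_version == 1:
        # ENSA: ASCS moves to the node where ERS was running,
        # and ERS is then stopped by the ASCS instance.
        if ers_node is None or ers_node == failed_node:
            raise ValueError("case not covered by this sketch: no running ERS node")
        return ers_node, None

    if ensa_version == 2:
        # ENSA2: the target follows the startup priority of the ASCS group.
        target = candidates[0]
        if ers_node == target:
            # ERS is already on the target, so it is moved to another node.
            # Assumption: pick the next surviving node in priority order.
            others = [n for n in candidates if n != target]
            return target, (others[0] if others else None)
        return target, ers_node

    raise ValueError("ensa_version must be 1 or 2")
```

For example, with ENSA2 on three hypothetical nodes where ASCS fails on node1 and ERS already runs on node2, the sketch moves ASCS to node2 and ERS to node3.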
2.1.1.6. Note on manual operation of the ERS instance
The ERS instance replicates the lock table from the ASCS instance. To ensure redundancy, the ERS instance must run on a node where the ASCS instance is not running. Do not launch the ERS instance on the node where the ASCS instance is running, not even manually. Additionally, do not launch the ERS instance on two or more nodes at the same time.
The failover group of the ERS instance is not restarted automatically when the node where the ERS instance was running recovers from a failure. After validating the health of the node, restart the ERS instance failover group manually.
SAP NW can run on several database technologies, such as SAP HANA, SAP MaxDB, IBM DB2, Oracle, and Microsoft SQL Server. This guide therefore assumes that a highly available database setup is already in place. If you need help creating an HA setup for your database scenario, follow the related EXPRESSCLUSTER documents at https://www.nec.com/en/global/prod/expresscluster/.
Throughout this document the HA database setup will be referred to as "database".
Modification has been performed on the following minor versions.
EXPRESSCLUSTER Internal Version 11.3x

Version in which the problem has been solved / Version in which the problem occurred: 11.35 / 11.32, 11.33, 11.34
Phenomenon: A failover group for the ERS instance of SAP NetWeaver may not link with a failover group for exclusive control in the same node.
    * To deal with this problem, it is required to replace the script manually. The sample script can be obtained from the support portal (content ID: 9510100152).
Level: M
Occurrence condition / Occurrence frequency: Rarely occurs when starting up a failover group for ASCS instances.
EXPRESSCLUSTER Internal Version 12.0x or later

Version in which the problem has been solved / Version in which the problem occurred: 12.10 / 11.30, 12.00
Phenomenon: When a failure is detected by the Custom monitor resource with the bundled scripts for SAP NetWeaver used, the SAP service is started while it is being stopped.
Level: S
Occurrence condition / Occurrence frequency: This problem occurs when stopping the SAP service takes time.

Version in which the problem has been solved / Version in which the problem occurred: 12.10 / 11.30, 12.00
Phenomenon: In the SAP NetWeaver configuration, starting up the ASCS service fails on the failover destination node when the first failover is performed for the ASCS failover group.
Level: S
Occurrence condition / Occurrence frequency: This problem occurs when the first failover is performed for the ASCS failover group in the AWS environment.

Version in which the problem has been solved / Version in which the problem occurred: 12.30 / 12.10 to 12.22
Phenomenon: If the user starts/stops a SAP instance immediately after enabling the maintenance mode, the following occurs:
    The maintenance mode fails to be disabled.
    Inconsistency in the startup status occurs between the resource of EXPRESSCLUSTER for the SAP instance and the actual SAP instance.
Level: S
Occurrence condition / Occurrence frequency: This problem occurs when the user starts/stops the SAP instance immediately after enabling the maintenance mode.

Version in which the problem has been solved / Version in which the problem occurred: 13.20 / 12.20 to 13.12
Phenomenon: In the maintenance mode-enabled state, executing Verify High Availability Config from SAP MMC results in a failed execution.
Level: S
Occurrence condition / Occurrence frequency: This problem occurs when Verify High Availability Config is executed while the maintenance mode is enabled.
This manual describes a configuration, where Node#1 is the active node, Node#2 is the standby node, and a shared disk is used to provide a shared file system.
When creating a cluster in a cloud environment such as AWS or Microsoft Azure, use AWS virtual IP resources or Azure DNS resources instead of floating IP resources.
Note that name resolution must be possible for host names associated with the floating IP (or virtual IP) for ASCS instance and ERS instance.
If an older version of this product is already installed, back up the bundled scripts and the configuration file of the Connector for SAP. Enclose any path that contains spaces in double quotation marks.
> xcopy "C:\Program Files\CLUSTERPRO\etc\clp_shi_connector.conf" D:\backup
> xcopy C:\Windows\System32\drivers\etc\services D:\backup
> xcopy "<Folder where sample script is expanded>" D:\backup
After installing EXPRESSCLUSTER please install the Connector for SAP. Unzip the Connector for SAP media (clp_shi_connector.zip), and then copy the following files.
When ENSA is used, it is required to create a failover group for exclusive control of ASCS and ERS instances described below.
The name of each failover group for exclusive control must consist of a failover group name common to all nodes, followed by a number, as shown below.
Assign the numbers in the order of the nodes on which the ERS1 and ERS2 instances have been installed.
<Common failover group name><Number>
Setting examples in this guide:
Exclusive-Group1 (Node#1)
Exclusive-Group2 (Node#2)
Note
The failover group name must not contain any spaces.
Note
If the name of a failover group that exclusively controls the ASCS and ERS instances does not conform to the naming conventions, exclusive control cannot be performed correctly.
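The naming convention for the exclusive-control failover groups can be checked with a short script before the names are registered. The following is a minimal sketch; the function names are hypothetical, and the assumption that numbering starts at 1 is taken from the setting examples in this guide.

```python
import re
from typing import Dict, List


def expected_exclusive_group_names(common_name: str, nodes: List[str]) -> Dict[str, str]:
    """Build <Common failover group name><Number> for each node, in node order."""
    if not common_name or re.search(r"\s", common_name):
        raise ValueError("the common failover group name must not be empty or contain spaces")
    # Assumption: numbers start at 1, as in Exclusive-Group1 / Exclusive-Group2.
    return {node: f"{common_name}{i}" for i, node in enumerate(nodes, start=1)}


def is_valid_exclusive_group_name(name: str, common_name: str) -> bool:
    """True if name is the common name immediately followed by a number."""
    return re.fullmatch(re.escape(common_name) + r"[0-9]+", name) is not None


if __name__ == "__main__":
    print(expected_exclusive_group_names("Exclusive-Group", ["Node#1", "Node#2"]))
```

A name such as "Exclusive Group1" or "Exclusive-GroupA" is rejected by `is_valid_exclusive_group_name`, which mirrors the notes above.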
3.1.6. Specifying dependencies between failover groups
Set up the dependency among the failover groups.
The following shows the dependency (startup order) among SAP NW instances.
Database -> ASCS -> ERS / PAS / AAS
Be sure to stop the instances in reverse order.
Note
As outlined in "2.1.4. HA Database for SAP NW", it is assumed that a database is available. This database is a prerequisite for the above dependencies and must be available before you start. If it is not, you cannot continue from here.
Note
It is not necessary to set up the dependency for saphostexec.
For details about how to set up the dependency among failover groups in EXPRESSCLUSTER please refer to the following document:
An SAP NW instance number must be unique across the cluster nodes. If SAP NW instances have duplicate numbers, starting and stopping of the SAP NW instances cannot be controlled correctly.
If an instance number is duplicated within a node or between nodes, reinstall the SAP NW component on one of the nodes and assign it a different instance number.
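Before installation, the planned instance numbers can be checked for duplicates. The sketch below is illustrative only: the function name is hypothetical, and apart from the ASCS instance number 10 used in this guide, the instance labels and numbers in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_instance_numbers(
    instances: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each instance number used by more than one instance to those instances.

    `instances` holds (instance label, instance number) pairs for the whole cluster.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for label, number in instances:
        if label not in owners[number]:
            owners[number].append(label)
    return {number: labels for number, labels in owners.items() if len(labels) > 1}


if __name__ == "__main__":
    planned = [("ASCS", "10"), ("ERS1", "21"), ("ERS2", "22"), ("PAS", "10")]
    # PAS reuses the ASCS number, so "10" is reported as a duplicate.
    print(find_duplicate_instance_numbers(planned))
```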
Before installing SAP NW be sure to start EXPRESSCLUSTER and activate the floating IP resources and CIFS resources on Node#1.
The location to save the SAP software logistics tool including the sapinst command described later depends on your environment and the installation media used (DVD-ROM or downloaded files). The sapinst command is a command used to install SAP NW.
3.2.2. Creating Operating System Users and Groups
Perform this work on Node#1 and Node#2.
Run sapinst.
>sapinst
In the Software Provisioning tool select Generic Options > <your database> > Preparations > Operating System Users and Groups to create the OS users and groups.
In this guide NEC is used as SID and DBSID, and only ABAP is selected for Software.
SID: NEC
DBSID: NEC
Based On As: ABAP
Add the account SAPService<SID> to the Administrators group of each cluster node.
3.2.3. Changing the ASCS Instance Host Name Registries
Run sapinst, specifying the host name associated with the floating IP of the ASCS instance for the environment variable SAPINST_USE_HOSTNAME.
> sapinst SAPINST_USE_HOSTNAME=<ASCS_Hostname>
Note
Specify a host name associated with the floating IP of ASCS instance for <ASCS_Hostname>.
In the Software Provisioning tool select <SAP NW to be installed> > <your database> > Installation > Application Server ABAP > Distributed System > ASCS Instance to install the ASCS.
The SID (SAP System ID) and INO (instance number) for the ASCS specified during installation are used in 6.1.1 (ASCS).
Copy the C:\Windows\System32\drivers\etc\services file of Node#1 to Node#2.
This file includes the following port number definitions according to the parameters set at installation.
Both nodes of the cluster must share this file.
:
saphostctrl     1128/tcp    # SAPHostControl over SOAP/HTTP
saphostctrl     1128/udp    # SAPHostControl over SOAP/HTTP
saphostctrls    1129/tcp    # SAPHostControl over SOAP/HTTPS
saphostctrls    1129/udp    # SAPHostControl over SOAP/HTTPS
sapmsNEC        3610/tcp    # SAP System Message Server Port
sapdp00         3200/tcp    # SAP System Dispatcher Port
sapdp01         3201/tcp    # SAP System Dispatcher Port
sapdp02         3202/tcp    # SAP System Dispatcher Port
:
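Because both nodes must hold the same port definitions, it can be useful to compare the two services files after copying. The following sketch parses the name, port, and protocol of each entry and reports entries from the first node that are missing or different on the second node. The function names are hypothetical, and reading the files from each node is left to the caller.

```python
from typing import Dict, Tuple


def parse_services(text: str) -> Dict[Tuple[str, str], int]:
    """Parse services-file text into {(service name, protocol): port}."""
    entries: Dict[Tuple[str, str], int] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        parts = line.split()
        if len(parts) < 2 or "/" not in parts[1]:
            continue  # blank, comment-only, or malformed line
        port, _, proto = parts[1].partition("/")
        if port.isdigit():
            entries[(parts[0], proto.lower())] = int(port)
    return entries


def missing_on_second_node(node1_text: str, node2_text: str) -> Dict[Tuple[str, str], int]:
    """Return entries of node 1 that are absent or have another port on node 2."""
    node1 = parse_services(node1_text)
    node2 = parse_services(node2_text)
    return {key: port for key, port in node1.items() if node2.get(key) != port}


if __name__ == "__main__":
    node1 = "sapmsNEC 3610/tcp # SAP System Message Server Port\nsapdp00 3200/tcp\n"
    node2 = "sapdp00 3200/tcp\n"
    print(missing_on_second_node(node1, node2))
```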
Add the ASCS10 sapstartsrv definition to Node#2.
Add the sapstartsrv service to Node#2 by referring to the service definition of Node#1.
Use the following command to check the service definition.
3.2.6. Installation of ERS Instances (Node#1 and Node#2)
Perform this work on Node#1 and Node#2.
If ENSA is used, execute sapinst as follows:
>sapinst
In the Software Provisioning tool select <SAP NW to be installed> > <your database> > Installation > Additional SAP System Instances > ERS Instance to install the ERS.
If ENSA2 is used, set the environment variable SAPINST_USE_HOSTNAME to the host name associated with the floating IP of the ERS instance, and then execute sapinst:
> SET SAPINST_USE_HOSTNAME=ERS_Hostname
> sapinst
Note
Enter the host name associated with the floating IP of ERS instance for ERS_Hostname.
3.2.7. Installation of Database Instance (Node#1)
Perform this work on Node#1.
>sapinst
In the Software Provisioning tool select <SAP NW to be installed> > <your database> > Installation > Application Server ABAP > Distributed System > Database Instance to install the database instance.
In the Software Provisioning tool select <SAP NW to be installed> > <your database> > Installation > Application Server ABAP > Distributed System > Primary Application Server Instance to install the PAS.
In the Software Provisioning tool select <SAP NW to be installed> > <your database> > Installation > Application Server ABAP > High-Availability System > Additional Application Server Instance to install the AAS.
To enable proper detection of failures in the ENSA (part of ASCS) and trigger switchover correctly, the following parameter in the ASCS instance profile needs to be changed:
The ASCS instance profile is placed in the following location.
Set up the script resource to control starting and stopping of each instance.
A script to control starting and stopping of various SAP instances is available.
To control starting and stopping of each SAP instance using this sample script, set up the script resource.
Because the sample scripts that control starting and stopping of an instance use resource names as keys for control, it is necessary to specify resource names appropriate to the control target.
Include the following string in the resource name:
instance_<SID>_<INO>
The words in <> indicate the following items:
SID: SAP System ID
INO: Instance number
Note
The resource name must not contain any spaces.
Note
If the resource name does not conform to the naming conventions, starting and stopping of SAP NW instances cannot be normally controlled.
Specify a resource name that conforms to the naming conventions for the script resource that controls starting and stopping of SAP NW instances.
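A resource name can be checked against this convention before it is registered. The sketch below is a minimal example; the function names and the sample resource name are hypothetical, while the SID NEC and the instance number 10 follow the examples in this guide.

```python
import re


def instance_key(sid: str, ino: str) -> str:
    """Build the string that the resource name must include."""
    return f"instance_{sid}_{ino}"


def resource_name_is_valid(resource_name: str, sid: str, ino: str) -> bool:
    """True if the name has no whitespace and contains instance_<SID>_<INO>."""
    if re.search(r"\s", resource_name):
        return False
    return instance_key(sid, ino) in resource_name


if __name__ == "__main__":
    # "script-ASCS-" is an arbitrary prefix chosen for this example.
    print(resource_name_is_valid("script-ASCS-instance_NEC_10", "NEC", "10"))
```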
This resource is automatically registered when a disk resource is added. The disk TUR monitor resource corresponding to each disk resource is automatically registered.
The disk TUR monitor resource has default values. If necessary change them to appropriate values.
Note
This resource cannot be used for a disk or disk interface (HBA) that does not support SCSI Test Unit Ready. Even if your hardware supports it, check the driver specifications because the driver may not support it.
This monitor resource is automatically registered when a CIFS resource is added. The CIFS monitor resource corresponding to each CIFS resource is automatically registered.
Note
When access check is performed, the specified access method must be permitted for the local system account in the CIFS resource to be monitored.
Add the following specification to the default profile for SAP instances (DEFAULT.PFL) and the instance profile for each SAP instance to activate the SAP HA Connector and combine it with EXPRESSCLUSTER.
A setting example in this manual is shown below.
The path and a setting example of the default profile
The SAP instance services must be restarted after the setting is changed, for example by restarting the cluster.
3.4.2.2. Granting administrator permissions to SAP NW users
To make the SAP HA Connector executable, grant the SAP NW users group (SAP_<SID>_GlobalAdmin) full control permissions on the following registry key.
HKEY_LOCAL_MACHINE\SOFTWARE\NEC\EXPRESSCLUSTER
Note
To combine SAP NW and EXPRESSCLUSTER, give full control permissions to the specified registry to the group that was automatically created when SAP NW was installed. If full control permissions to the registry are not given to SAP NW users, starting and stopping of SAP NW instances cannot be normally controlled.
For the Connector for SAP the following parameters in the configuration file can be changed to configure the log level, log size, and group resources for which to refuse the start/stop request from the SAP interface.
Note
Only one-byte characters can be used in the configuration file.
Set each setting item in the key=value format.
If a key which can be set only once is set more than once, the last setting value is effective.
Any lines beginning with a string other than the valid keys, and any blank lines are skipped.
Any spaces and tabs before and after the key/value are skipped.
The maximum length of one line is 1023 bytes.
If there is no configuration file or setting values are invalid, the default values are used.
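The parsing rules above can be illustrated with a small parser. This is a sketch of the stated rules, not the Connector's actual implementation: the function name is hypothetical, and two behaviors are assumptions because the guide does not specify them (lines longer than 1023 bytes are skipped, and an invalid value resets the key to its default). The keys, ranges, and defaults are those listed for the configuration file in this section; the 240-character limit of SMM_PATH is not checked.

```python
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "LOGLEVEL": "INFO", "LOGSIZE": 1000000,
    "GVI_CHECKCOUNT": 30, "GVI_CHECKINTERVAL": 10,
    "FRA_CHECKCOUNT": 30, "FRA_CHECKINTERVAL": 10,
    "SMM_PATH": None, "ENSA_VERSION": 2, "YELLOW_AS_ERROR": 0,
}
INT_RANGES = {
    "LOGSIZE": (1, 2147483647),
    "GVI_CHECKCOUNT": (1, 60), "GVI_CHECKINTERVAL": (1, 60),
    "FRA_CHECKCOUNT": (1, 60), "FRA_CHECKINTERVAL": (1, 60),
    "ENSA_VERSION": (1, 2), "YELLOW_AS_ERROR": (0, 1),
}
LOGLEVELS = ("ERROR", "WARNING", "INFO", "TRACE")
# Keys that may be set more than once; every value is kept.
MULTI_KEYS = ("REFUSE_START_GROUP_RESOURCE", "REFUSE_STOP_GROUP_RESOURCE")
MAX_LINE_BYTES = 1023


def parse_connector_conf(text: str) -> Dict[str, Any]:
    """Parse key=value text; an empty string stands for a missing file."""
    conf = dict(DEFAULTS)
    for key in MULTI_KEYS:
        conf[key] = []
    for raw in text.splitlines():
        if len(raw.encode("utf-8")) > MAX_LINE_BYTES:
            continue  # Assumption: over-long lines are ignored.
        key, sep, value = raw.partition("=")
        key, value = key.strip(" \t"), value.strip(" \t")
        if not sep:
            continue  # blank line or no key=value pair
        if key in MULTI_KEYS:
            if value:
                conf[key].append(value)
        elif key in INT_RANGES:
            low, high = INT_RANGES[key]
            valid = value.isascii() and value.isdigit() and low <= int(value) <= high
            # The last setting wins; an invalid value falls back to the default.
            conf[key] = int(value) if valid else DEFAULTS[key]
        elif key == "LOGLEVEL":
            conf[key] = value if value in LOGLEVELS else DEFAULTS[key]
        elif key == "SMM_PATH":
            conf[key] = value or None
        # Any other key is not a valid key, so the line is skipped.
    return conf
```

Calling `parse_connector_conf("")` returns the defaults, which corresponds to the case where no configuration file exists.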
Parameter name: LOGLEVEL
Setting value: ERROR, WARNING, INFO, TRACE (default: INFO)
Description: Specify the output log level.
    ERROR: Output logs of the error level and information level.
    WARNING: Output logs of the warning level and information level.
    INFO: Output logs of the information level.
    TRACE: Output logs of the internal trace and information level.

Parameter name: LOGSIZE
Setting value: 1 to 2147483647 (default: 1000000)
Description: Specify the log size in bytes.
    If the size of the current log file exceeds the specified log size, log rotation is performed.
    The number of log file generations is two (the current log and the log of the previous generation).

Parameter name: REFUSE_START_GROUP_RESOURCE
Setting value: Group resource name in EXPRESSCLUSTER (default: None)
Description: As the setting value (group resource name in EXPRESSCLUSTER), specify the name of the script resource that controls a SAP instance for which to refuse the start request from the SAP interface.
    To set more than one group resource name, set more than one REFUSE_START_GROUP_RESOURCE parameter.

Parameter name: REFUSE_STOP_GROUP_RESOURCE
Setting value: Group resource name in EXPRESSCLUSTER (default: None)
Description: As the setting value (group resource name in EXPRESSCLUSTER), specify the name of the script resource that controls a SAP instance for which to refuse the stop request from the SAP interface.
    To set more than one group resource name, set more than one REFUSE_STOP_GROUP_RESOURCE parameter.

Parameter name: GVI_CHECKCOUNT
Setting value: 1 to 60 (default: 30)
Description: The number of attempts EXPRESSCLUSTER makes to obtain product information when the cluster is started. The interval between these attempts is set by GVI_CHECKINTERVAL.
    Obtaining product information finishes as soon as one attempt succeeds, even if the configured count has not been reached.

Parameter name: GVI_CHECKINTERVAL
Setting value: 1 to 60 (default: 10)
Description: The interval in seconds between EXPRESSCLUSTER attempts to obtain product information. If product information is obtained only once (GVI_CHECKCOUNT=1), this value is ignored.

Parameter name: FRA_CHECKCOUNT
Setting value: 1 to 60 (default: 30)
Description: The number of attempts to check the status of the group resource when the Rolling Kernel Switch is performed. The interval between the checks is set by FRA_CHECKINTERVAL.
    The status check finishes as soon as one attempt succeeds, even if the configured count has not been reached.

Parameter name: FRA_CHECKINTERVAL
Setting value: 1 to 60 (default: 10)
Description: The interval in seconds between checks of the status of the group resource. If the status check is performed only once (FRA_CHECKCOUNT=1), this value is ignored.

Parameter name: SMM_PATH
Setting value: Folder to store the files which the Connector for SAP uses for the maintenance mode (default: none)
    (Internal Version 12.1x or later can be specified)
Description: Specify this when using maintenance mode.
    Set the folder to store the files which the Connector for SAP uses for the maintenance mode. Specify a folder to which each cluster node is allowed to write. Do not manually create files or folders under the specified folder. Use no more than 240 single-byte characters.
    Example:
    SMM_PATH=\\<hostnameforASCSinstance>\sapmnt\sapmm

Parameter name: ENSA_VERSION
Setting value: 1, 2 (default: 2)
    (Internal Version 12.1x or later can be specified)
Description: Specify the version of ENSA to use.
    Make sure that the version of ENSA matches the setting on the SAP NW side.
    Set to 1 with ENSA used.
    Set to 2 with ENSA2 used.

Parameter name: YELLOW_AS_ERROR
Setting value: 0, 1 (default: 0)
    (Internal Version 12.1x or later can be specified)
Description: The bundled scripts check the statuses of the processes composing each instance with sapcontrol -function GetProcessList. To determine whether the result is abnormal, you can choose from the following two patterns:
    - Judge the result as abnormal unless all the process statuses are GREEN.
    - Judge the result as abnormal unless all the process statuses are GREEN or YELLOW.
    Specify either of the patterns with this parameter.
    If a different pattern is to be set for a specific process, customize the bundled script (e.g. to judge the result as normal when Process A is YELLOW, but as abnormal when Process B is YELLOW).
    Set the value to 1 to judge the result as abnormal unless all the process statuses are GREEN. In this case, a YELLOW status makes the result abnormal.
    Set the value to 0 to judge the result as normal even if any of the process statuses is YELLOW.
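The two judgment patterns selected by YELLOW_AS_ERROR can be expressed as follows. This is a simplified sketch of the rule as described, not the bundled script itself; the function name is hypothetical, and the input is assumed to be the list of status strings already extracted from the GetProcessList output.

```python
from typing import Iterable


def is_abnormal(process_statuses: Iterable[str], yellow_as_error: int) -> bool:
    """Apply the YELLOW_AS_ERROR rule to the status of each process.

    yellow_as_error=1: abnormal unless every status is GREEN.
    yellow_as_error=0: abnormal unless every status is GREEN or YELLOW.
    """
    allowed = {"GREEN"} if yellow_as_error == 1 else {"GREEN", "YELLOW"}
    return any(status not in allowed for status in process_statuses)


if __name__ == "__main__":
    statuses = ["GREEN", "YELLOW"]
    print(is_abnormal(statuses, yellow_as_error=0))
    print(is_abnormal(statuses, yellow_as_error=1))
```

A per-process rule, such as tolerating YELLOW only for one specific process, would need the process names as well and corresponds to the script customization mentioned above.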
Note
Obtaining product information of EXPRESSCLUSTER when the cluster is started may fail due to a timing issue, as well as checking the group resource status when performing a Rolling Kernel Switch. In such a case adjust timing values and repetition count for GVI and FRA parameters.
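The interplay of a check count and a check interval (GVI_CHECKCOUNT with GVI_CHECKINTERVAL, or FRA_CHECKCOUNT with FRA_CHECKINTERVAL) can be pictured as a simple retry loop. The sketch below only illustrates the documented semantics; the function name and the `check` callable are hypothetical and do not represent the Connector's internal code.

```python
import time
from typing import Callable


def retry_check(
    check: Callable[[], bool],
    count: int,
    interval_sec: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check() up to `count` times, waiting `interval_sec` between attempts.

    Stops at the first success. With count=1 the interval is never used.
    """
    for attempt in range(1, count + 1):
        if check():
            return True
        if attempt < count:
            sleep(interval_sec)
    return False


if __name__ == "__main__":
    results = iter([False, False, True])
    # Succeeds on the third attempt after waiting twice (no real delay here).
    print(retry_check(lambda: next(results), count=30, interval_sec=10, sleep=lambda s: None))
```

With the defaults (count 30, interval 10 seconds), such a loop would wait at most 290 seconds between the first and the last attempt, which gives a rough upper bound to consider when adjusting the values.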
A configuration example is shown below. This example sets the log level to INFO and the log size to 1000000 bytes.
To update SAP NW, use Software Update Manager (hereafter referred to as SUM). The update procedure with SUM involves restarting SAP instances and therefore it may interfere with EXPRESSCLUSTER which tries to keep the SAP components available. To avoid such interference with EXPRESSCLUSTER, suspend EXPRESSCLUSTER's monitoring for all SAP components that SUM has to restart.
Please select from the following two options to suspend EXPRESSCLUSTER's monitoring.
Suspending the whole cluster
Suspending monitor resources related to SAP instances and instance services
Update SAP NW with SUM while the cluster or the monitor resources are suspended. After the update is completed, resume the suspended cluster or the suspended monitor resources.
For how to suspend and resume a cluster or a monitor resource, please refer to the following document.
The maintenance mode can be switched on/off by the sapcontrol command or from the SAP management console. For details on the maintenance mode and the sapcontrol command, see the SAP documents.
When the maintenance mode is enabled, the cluster is suspended from the Connector for SAP.
When the maintenance mode is disabled, the cluster is resumed from the Connector for SAP.
When the maintenance mode is used, avoid suspending or resuming the cluster from Cluster WebUI or with the clpcl command in order to avoid conflicts.
The following is an example to enable the maintenance mode with the sapcontrol command:
A node name, a failover group name, and a resource name must not contain any spaces. If they contain spaces, starting and stopping of SAP NW instances cannot be controlled correctly.
Naming conventions for failover groups
Specify a failover group name according to the naming conventions for the failover group for exclusive control of ASCS/ERS instance. If the failover group name does not follow the naming conventions, exclusive control of ASCS/ERS instance cannot function correctly.
Naming conventions for script resources
Specify a resource name according to the naming conventions for the script resource that controls starting and stopping of SAP NW instances. If the resource name does not follow the naming conventions, starting and stopping of SAP NW instances cannot be controlled correctly.
SAP NW instance number
An SAP NW instance number must be unique across the cluster nodes. If SAP NW instances have duplicate numbers, starting and stopping of the SAP NW instances cannot be controlled correctly.
If an instance number is duplicated within a node or between nodes, reinstall the SAP NW component on one of the nodes and assign it a different instance number.
Note on manual operation of ERS instance
The ERS instance replicates the lock table from the ASCS instance. To ensure redundancy, the ERS instance must run on a node where the ASCS instance is not running.
Do not launch the ERS instance on the node where the ASCS instance is running, not even manually.
Additionally, do not launch the ERS instance on two or more nodes at the same time.
Attention when one node recovers
When the node where the ERS instance was running recovers and rejoins the cluster, the failover group of the ERS instance is not restarted automatically.
Verify that the node is healthy, and then restart the failover group of the ERS instance manually.
When the node on which the ASCS instance was previously running crashes and then recovers, it may still appear to share the <shareddisk>:\sapmnt folder. However, this share is not accessible locally, because the shared folder is now provided by the node that is currently running the ASCS failover group. In this case, manually delete the inaccessible share on the recovered node; otherwise a failback of the ASCS group to this node will fail.
Restriction of using virtual computer name resource
A virtual computer name resource must not be used with this product. If a virtual computer name resource is used, starting and stopping of SAP NW instances cannot be controlled correctly.
When the disk resource for ASCS is failed over or stopped, the following warning message may be output to Cluster WebUI. Ignore this message.
The mirror disk or shared disk belonging to the failover group for ASCS also stops, if the failover group for ASCS is stopped while PAS instance or AAS instance is running (e.g. manual failover). As a result, startup of ASCS instance service or ASCS instance on a failover target server may fail.
When creating a cluster in a cloud environment such as AWS or Microsoft Azure, use AWS virtual IP resources or Azure DNS resources instead of floating IP resources. Note that name resolution must be possible for the host names associated with the virtual IPs that the AWS virtual IP resource provides for ASCS instances.
When a shared folder is published by CIFS resource in AWS environment, startup of ASCS instance service on the failover target node may fail.
This happens because a process accesses <\\Virtual Hostname\Shared Name> during the failover, and a new connection attempt to <\\Virtual Hostname\Shared Name> after the failover then fails to access the file share.
This phenomenon occurs due to the specification of Windows OS.
In this case, it may be avoided by either of the configuration changes in CIFS resources as listed below:
Turn off the "Cache Enable" in CIFS resource
Change the Cache Setting into "Automatic cache" in CIFS resource
Maintenance mode
When the maintenance mode is used, avoid suspending or resuming the cluster from Cluster WebUI or with the clpcl command in order to avoid conflicts.
For the "SMM_PATH" parameter in clp_shi_connector.conf, specify the directory on which each cluster node is allowed to write. Under the specified directory, do not manually create files or directories.
The minimum SAP kernel patch level
You must use at least the following SAP kernel patch levels, which include fixes for known issues.
For details of SAP kernel patches please refer to SAP Note 1693245.