Wednesday, 10 January 2018

Your SAP on Azure – Part 4 – High Availability for SAP HANA using System Replication

In today’s post, I would like to present a solution for protecting the HANA database server.

SAP HANA database offers two solutions that are designed for High Availability:

a) Host Auto-failover – in this solution you need to add another host to the current HANA database and configure it to work in standby mode. If the active node fails, operations are automatically switched to the standby host. This solution requires shared storage, which, as we already know, is a bit of a problem on Azure.

b) System replication – in this solution you need to install a separate HANA system and configure replication of data changes. By default, system replication doesn’t provide High Availability, as the HANA database doesn’t support automatic failover. But you can use the features of SUSE Linux to enhance the base solution!

There are different types of ARM templates, designed for different purposes. In this blog, we will use sap-3-tier-marketplace-image-multi-sid-db, which creates only the components required by the database. As an alternative, you can use sap-3-tier-marketplace-image-converged, which deploys the entire environment (DB + ASCS + APP Server) in one step.

I prefer to work with Visual Studio, but you can deploy the template right from your browser or with PowerShell.

VM PROVISIONING

In Visual Studio, open a new Azure Resource Group project.

Now it is enough to click Deploy and fill in the required parameters.

A few moments later I received a message saying the deployment went fine and no errors were reported:

Successfully deployed template 'azuredeploy.json' to resource group 'HANA_HA'.

Let’s see how it looks in the Azure portal.

Two VMs have been successfully deployed. The chosen template supports highly available scenarios, so both VMs were placed into an Availability Set. The Load Balancer is already set up with a backend pool and load balancing rules, so after a quick look we can start building the solution.

HIGH AVAILABILITY CLUSTER IN SUSE LINUX ENTERPRISE SERVER

Once the VMs are ready, we log in and start configuring the cluster. First, we need to install additional packages on both servers:

◈ sle-ha-release
◈ fence-agents
◈ SAPHanaSR

You can do it from the command line or with YaST.
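
From the command line, the installation could look like the sketch below. It assumes the SLES High Availability repositories are already available on both VMs. The `run` helper and the `DRY_RUN` switch are illustrative names, not part of any SUSE tooling; by default the script only prints the command so you can review it first.

```shell
#!/bin/sh
# Install the cluster packages listed above (run on both nodes).
# DRY_RUN and run() are illustrative helpers: with DRY_RUN=1 (default)
# the command is only printed; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

PACKAGES="sle-ha-release fence-agents SAPHanaSR"

# $PACKAGES is intentionally unquoted so each name becomes its own argument.
run sudo zypper --non-interactive install $PACKAGES
```

Running it with `DRY_RUN=0` performs the actual installation.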

We initialize the cluster on the hha-db-0 host with the command:

ha-cluster-init

On the second host we execute the following command to add it to the cluster:

ha-cluster-join

The next step is to modify the corosync configuration to define the two nodes of the cluster. This step has to be executed on both servers.

To apply the new settings, we need to restart the corosync service.
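
As a rough sketch, a two-node `nodelist` section in /etc/corosync/corosync.conf typically looks like the output of the helper below. The addresses 10.0.0.5 and 10.0.0.6 are placeholders (assumptions, not values from this setup) and `print_nodelist` is a hypothetical helper name; use the private IP addresses of your two VMs.

```shell
#!/bin/sh
# Print a corosync nodelist section for a two-node cluster.
# $1 and $2 are the private IP addresses of the two nodes.
print_nodelist() {
    cat <<EOF
nodelist {
  node {
    ring0_addr: $1
  }
  node {
    ring0_addr: $2
  }
}
EOF
}

# Placeholder addresses; replace them with the IPs of your VMs.
print_nodelist 10.0.0.5 10.0.0.6
```

After merging the section into the file on both nodes, the service can be restarted with `sudo systemctl restart corosync`. Azure virtual networks do not support multicast, so the `totem` section usually also needs `transport: udpu`.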

SAP HANA INSTALLATION

We can now proceed with the SAP HANA installation on both hosts.

The ARM template we used creates a data disk that is attached to each VM. Its size depends on the SAP System Size selected during the deployment.

After the new partition is created, we need to download the HANA packages and start the installation.

If you’re using the ARM templates, please use 03 as the instance number. Otherwise, you need to manually modify the load balancer rules in Azure.
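
The instance number matters because several port numbers are derived from it. The sketch below shows the pattern for the load balancer probe port (625 followed by the instance number, which gives the 62503 used later in the cluster configuration) and for the usual HANA SQL port (3<nn>15). Which ports the template's load balancing rules actually cover is an assumption here, so compare the result with the rules shown in the portal. The function names are illustrative.

```shell
#!/bin/sh
# Derive instance-dependent port numbers from a two-digit HANA instance number.

# Fail unless the argument is exactly two digits (e.g. 03, not 3).
valid_instance() {
    case "$1" in
        [0-9][0-9]) return 0 ;;
        *) echo "instance number must be two digits: $1" >&2; return 1 ;;
    esac
}

# Port on which the cluster answers the load balancer health probe (625<nn>).
probe_port() {
    valid_instance "$1" && echo "625$1"
}

# SQL port used by HANA clients (3<nn>15).
sql_port() {
    valid_instance "$1" && echo "3${1}15"
}

probe_port 03   # prints 62503
sql_port 03     # prints 30315
```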

While HANA is being installed on the first node, we have some time to start the database installation on the second node.

I want to use my HANA database together with SAP NetWeaver, so I quickly provision a new virtual machine and install SAP on it.

Once SAP NetWeaver is installed, we need to perform a full backup of the HANA databases. This is required before system replication can be enabled.
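
If you prefer the command line to HANA Studio, the backup could be triggered with hdbsql as sketched below. The assumptions are that the tenant database is named HHA (the same as the SID), that the SYSTEM user is used, and that the backup file prefixes are free to choose; the two helper functions are illustrative. Run it as the <sid>adm user on the primary node.

```shell
#!/bin/sh
# Build the SQL statements for a full file-based data backup.
# Assumptions: tenant database HHA, backup prefixes chosen freely.

systemdb_backup_sql() {
    # $1 = backup file prefix
    echo "BACKUP DATA USING FILE ('$1')"
}

tenant_backup_sql() {
    # $1 = tenant name, $2 = backup file prefix
    echo "BACKUP DATA FOR $1 USING FILE ('$2')"
}

INSTANCE="03"

if command -v hdbsql >/dev/null 2>&1; then
    # Both statements are sent to SYSTEMDB; hdbsql prompts for the password.
    hdbsql -i "$INSTANCE" -d SYSTEMDB -u SYSTEM "$(systemdb_backup_sql initial_systemdb)"
    hdbsql -i "$INSTANCE" -d SYSTEMDB -u SYSTEM "$(tenant_backup_sql HHA initial_hha)"
else
    echo "hdbsql not found; run this as the <sid>adm user on the HANA host" >&2
fi
```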

The setup of system replication is really easy and can be done with a few mouse clicks in SAP HANA Studio. Select the first node and choose Configure System Replication from the context menu.

I recommend reading the full list of important points to consider when working with HANA System Replication and Multitenant Database Containers. For me, the most important fact is that replication can be enabled only for the entire database; it is not possible to enable it for particular tenants only. The state of each tenant is also synchronized: tenants that are online on the primary node are also online on the secondary node, and stopped tenants stay stopped on both hosts.

Once replication is enabled on the primary node, we can configure the secondary node. Before we proceed, we need to ensure that the SSFS PKI files are the same on the primary and secondary node. Log in through SSH and copy the SSFS_<SID>.DAT and SSFS_<SID>.KEY files from the primary to the secondary node.
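
The two files live under the global security directory of the HANA system. The sketch below prints their locations for a given SID and the matching copy commands; hha-db-1 is an assumed host name for the second node and `ssfs_files` is an illustrative helper. The commands are only printed so that you can check them before running them on the primary node.

```shell
#!/bin/sh
# Print the locations of the two SSFS files for a given SID.
ssfs_files() {
    base="/usr/sap/$1/SYS/global/security/rsecssfs"
    echo "$base/data/SSFS_$1.DAT"
    echo "$base/key/SSFS_$1.KEY"
}

SID="HHA"
SECONDARY="hha-db-1"   # assumed host name of the second node

# Show the copy commands to run on the primary node as <sid>adm.
for f in $(ssfs_files "$SID"); do
    echo "scp $f $SECONDARY:$f"
done
```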

Registering a system as a secondary node can be done only when the database is offline.

Choose Register Secondary System from the Configuration and Monitoring menu.

After a few minutes, the operation is complete. You can monitor the replication status in SAP HANA Studio. The systems are in sync only when the replication status is ACTIVE for all volumes.
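
The enable and register steps can also be done without HANA Studio, using hdbnsutil as the <sid>adm user. The sketch below only prints the commands. The site names SITE1 and SITE2 and the `sync` replication mode are assumptions chosen for illustration, and the helper function names are hypothetical.

```shell
#!/bin/sh
# Command-line equivalents of the two HANA Studio actions, run as <sid>adm.
# Site names and the replication mode are assumptions.

enable_cmd() {
    # $1 = site name of the primary
    echo "hdbnsutil -sr_enable --name=$1"
}

register_cmd() {
    # $1 = primary host, $2 = instance number, $3 = site name of the secondary
    echo "hdbnsutil -sr_register --remoteHost=$1 --remoteInstance=$2 --replicationMode=sync --name=$3"
}

echo "# on the primary (database online):"
enable_cmd SITE1
echo "# on the secondary (database stopped, e.g. with 'HDB stop'):"
register_cmd hha-db-0 03 SITE2
echo "# on the secondary, start HANA again, then check the state:"
echo "HDB start"
echo "hdbnsutil -sr_state"
```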

The secondary system appears as operational, but you won’t be able to connect to it (you will receive a message that the Database Connection is not available). This is the expected behavior.

HANA System Replication is now set up, so let’s go back to the SLES cluster configuration. In the next step, we will set up the basic cluster configuration by importing default values. You can decide what action should be performed when a node stops responding (stonith-action). In our case, the VM will be deallocated.

Create a new file called crm-defaults.txt on the first node and enter the following configuration:

property $id="cib-bootstrap-options" \
  no-quorum-policy="ignore" \
  stonith-enabled="true" \
  stonith-action="off" \
  stonith-timeout="150s"
rsc_defaults $id="rsc-options" \
  resource-stickiness="1000" \
  migration-threshold="5000"
op_defaults $id="op-options" \
  timeout="600"

(source: microsoft.com)

Now import the new configuration with the following command:

sudo crm configure load update crm-defaults.txt

The STONITH device stops a failed node through the Azure API, so we need to authorize it to perform operations in the Azure subscription.

Go to Azure portal and add a new application in the Azure Active Directory:

The name and Sign-On URL are not important, just choose Web app / API as the Application Type. Now, select the new app and choose Keys in the menu. Create a new entry with a name of your choice and select Never Expire in the second column. Remember to copy the Value after saving, because it cannot be retrieved later.

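Next, assign a role to the application in the Access control (IAM) settings of the virtual machine. The chosen role should be Owner to allow the application to start and stop the VM.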

Execute this step for both VMs.
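
As an alternative to the portal, the same role assignment could be scripted with the Azure CLI. The sketch below prints one `az role assignment create` command per VM. The subscription ID and application ID are placeholders, hha-db-1 is an assumed name for the second VM, and `vm_scope` is an illustrative helper that builds the resource ID of a VM.

```shell
#!/bin/sh
# Build the Azure resource ID of a VM, used as the scope of a role assignment.
vm_scope() {
    # $1 = subscription ID, $2 = resource group, $3 = VM name
    echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Compute/virtualMachines/$3"
}

SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"  # placeholder
APP_ID="11111111-1111-1111-1111-111111111111"           # placeholder Application ID
RESOURCE_GROUP="HANA_HA"

# hha-db-1 is an assumed name for the second VM.
for vm in hha-db-0 hha-db-1; do
    echo "az role assignment create --assignee $APP_ID --role Owner --scope $(vm_scope "$SUBSCRIPTION_ID" "$RESOURCE_GROUP" "$vm")"
done
```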

The following script configures the fencing mechanism. Save it as crm-fencing.txt and replace the placeholder values in quotes with the proper values from the table below.

Name in the file   Name in Azure     Where to get it
Subscription ID    Subscription ID   Subscription blade
Resource Group     Resource Group    Virtual Machine blade
Tenant ID          Directory ID      Azure Active Directory blade -> Properties
Login ID           Application ID    Azure Active Directory blade -> App Registration
Password           Key Value         Can be retrieved only during key creation

primitive rsc_st_azure_1 stonith:fence_azure_arm \
    params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="login ID" passwd="password"

primitive rsc_st_azure_2 stonith:fence_azure_arm \
    params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="login ID" passwd="password"

colocation col_st_azure -2000: rsc_st_azure_1:Started rsc_st_azure_2:Started

(source: microsoft.com)

Load the configuration with the following command:

sudo crm configure load update crm-fencing.txt

Two more scripts provided by Microsoft are required to create the SAP HANA resources. Load each of them with sudo crm configure load update, in the same way as the previous files:

a) crm-saphanatop.txt

SAP HANA Topology is a resource agent that monitors and analyzes the HANA landscape and communicates the status between the two nodes. The description of each parameter can be checked by running the man ocf_suse_SAPHanaTopology command.

primitive rsc_SAPHanaTopology_HHA_HDB03 ocf:suse:SAPHanaTopology \
    operations $id="rsc_sap2_HHA_HDB03-operations" \
    op monitor interval="10" timeout="600" \
    op start interval="0" timeout="600" \
    op stop interval="0" timeout="300" \
    params SID="HHA" InstanceNumber="03"

clone cln_SAPHanaTopology_HHA_HDB03 rsc_SAPHanaTopology_HHA_HDB03 \
    meta is-managed="true" clone-node-max="1" target-role="Started" interleave="true"
(source: microsoft.com)

b) crm-saphana.txt

This file defines the resources in the cluster together with the Virtual IP, which is assigned to the Azure Load Balancer. You need to adjust the system ID and instance number.

primitive rsc_SAPHana_HHA_HDB03 ocf:suse:SAPHana \
    operations $id="rsc_sap_HHA_HDB03-operations" \
    op start interval="0" timeout="3600" \
    op stop interval="0" timeout="3600" \
    op promote interval="0" timeout="3600" \
    op monitor interval="60" role="Master" timeout="700" \
    op monitor interval="61" role="Slave" timeout="700" \
    params SID="HHA" InstanceNumber="03" PREFER_SITE_TAKEOVER="true" \
    DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"

ms msl_SAPHana_HHA_HDB03 rsc_SAPHana_HHA_HDB03 \
    meta is-managed="true" notify="true" clone-max="2" clone-node-max="1" \
    target-role="Started" interleave="true"

primitive rsc_ip_HHA_HDB03 ocf:heartbeat:IPaddr2 \
    meta target-role="Started" is-managed="true" \
    operations $id="rsc_ip_HHA_HDB03-operations" \
    op monitor interval="10s" timeout="20s" \
    params ip="10.0.0.4"
primitive rsc_nc_HHA_HDB03 anything \
    params binfile="/usr/bin/nc" cmdline_options="-l -k 62503" \
    op monitor timeout=20s interval=10 depth=0
group g_ip_HHA_HDB03 rsc_ip_HHA_HDB03 rsc_nc_HHA_HDB03

colocation col_saphana_ip_HHA_HDB03 2000: g_ip_HHA_HDB03:Started \
    msl_SAPHana_HHA_HDB03:Master
order ord_SAPHana_HHA_HDB03 2000: cln_SAPHanaTopology_HHA_HDB03 \
    msl_SAPHana_HHA_HDB03
(source: microsoft.com)

MONITORING

There are various tools that assist us with cluster monitoring.

crm_mon

This tool shows information about the SLES cluster, including the resources and the status of each node.

SAPHanaSR-showAttr

Displays information about the current status of SAP HANA System Replication. We are interested in the sync_state column. When replication is working fine, the value should be PRIM for the primary node and SOK for the secondary.

SAP HANA Studio

General information about the system replication status. We need to ensure the replication status is ACTIVE for all volumes.

TESTING

It’s time to verify our solution. In a production environment, proper testing of the HA solution is crucial. For the purpose of this blog, we will simulate a loss of connectivity.
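
One possible way to simulate the loss of connectivity is to bring down the network interface on the current primary node and watch the cluster from the other node. The sketch below only prints the commands. The interface name eth0 is an assumption (check yours with `ip addr`), and `NIC` and `test_plan` are illustrative names.

```shell
#!/bin/sh
# Print the commands for a simple failover test; nothing is executed here.
# eth0 is an assumed interface name, check yours with "ip addr".
NIC="${NIC:-eth0}"

test_plan() {
    echo "# on the current primary: drop the network connection"
    echo "sudo ifdown $NIC"
    echo "# on the other node: watch the cluster react"
    echo "sudo crm_mon -r"
    echo "SAPHanaSR-showAttr"
}

test_plan
```

Keep in mind that with AUTOMATED_REGISTER="false" in the cluster configuration, the former primary is not registered as the new secondary automatically after the takeover.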

Expected results:

1. The HANA operations are automatically switched to the secondary node
2. The first node will shut down
3. SAP NetWeaver will continue to work

Actual results:

1. The takeover took place and operations continued on the secondary node.

2. The primary node is stopped and deallocated.

3. I don’t have a good way to show you that NetWeaver was still running, so you will have to believe me. There was a delay of a few seconds, but operations continued without any problems!

Thanks for reading my blog! I hope you didn’t run into any issues while configuring SAP HANA System Replication with automatic failover. See you soon. The next blog will describe how to create your backup environment in Microsoft Azure.