Friday, 4 May 2018

HANA Spatial Demos: Geocoding, Clustering, Aggregation

Business story


Let’s imagine a purely fictional story to back-up our applications:

◈ Our customer is an insurance company specialised in house insurance.
◈ Currently, they have all their data in Excel files.
◈ They want to move it into a dedicated system capable of grouping houses in the same area together.

Looking back, this scenario might not be that far from reality after all.

Objectives


To fulfil our customer’s needs, we will do the following steps:

◈ Take the data from the Excel file (.csv) and import it in HANA. This data contains:
     ◈ The insurance policy number (unique business key for a policy).
     ◈ The policy holder name.
     ◈ The insured object’s address.
     ◈ And the total sum insured.
◈ Geocode the house addresses to get their coordinates.
◈ Build apps for visualising, aggregating and clustering the houses.

Technical setup


We will do all of this on a HANA Trial MDC, which we create following the steps described in the “Setup your trial SAP HANA MDC instance” SAP developer tutorial. The development itself can be done directly in the Web-based Workbench.

The Google Geocoding API is a fairly good tool for obtaining coordinates from textual addresses. To use it, we also need an API Key for it. In addition, HANA needs to “trust” the SSL certificate of Google, so we need to create a new trust store for it.

Data import


First we need to create a CDS entity for storing the input data:

    // the main table, should store all the necessary data
    entity eInsurance {
        key policyNumber:       String(32);
            personName:         String(128) not null;
            objectAddress:      String(256) not null;
            insuredSum:         Decimal(16,2) not null;
            objectLocation:     hana.ST_POINT(4326);
            objectGeocoded:     hana.TINYINT default 0;
    };

Then we can use a table import file to move the data from the CSV to the table itself:

import = [
    {
        cdstable  = "workspace.master.data::ctxSpatial.eInsurance";
        schema = "WORKSPACE_MASTER_SPATIAL";
        file = "workspace.master.data:eHouses.csv";
        header = false;
        delimField = ",";
        delimEnclosing="\"";
    }
];

Geocoding


Normally, we could use HANA’s built-in functionality: the GEOCODE INDEX with a user-defined geocoding provider. Back when I wrote these apps, on an MDC I was not able to actually change the system configuration to use my own provider, so I decided to just call my function through an XS Job:

{
    "description": "Perform geocoding",
    "action": "workspace.master.service:job.xsjs::batch",
    "schedules": [{
        "xscron": "* * * * * * */5"
    }]
}

Roughly speaking, the following code is responsible for calling the Google API:

/**
 * Calls the google geocoding API.
 * @param   {string}    sApiKey     The Google Maps API key.
 * @param   {string}    sAddress    The address which should be geocoded.
 * @returns {object|null}   The response body returned by the service.
 * Null if the request failed.
 */
function callGeocodingApi(sApiKey, sAddress) {
    try {
        var sUrl = "/geocode/json?address=" + encodeURIComponent(sAddress) + "&key=" + sApiKey,
            oDestination = $.net.http.readDestination("workspace.master.service", "googleApi"),
            oClient = new $.net.http.Client(),
            oRequest = new $.net.http.Request($.net.http.GET, sUrl),
            oResponse = oClient.request(oRequest, oDestination).getResponse(),
            oData = JSON.parse(oResponse.body.asString());
        return oData;
    }
    catch (e) {
        $.trace.error(e);
        return null;
    }
}

To track which addresses were already processed, I also added a objectGeocoded flag to the CDS entity.

Visualisation


To view the points on a simple Google Map, we need to decompose the ST_POINT objects into the latitude and longitude components and expose them through an OData service.

For this we can simply create a simple CDS view:

    // the expanded view (expands the objectLocation into coords)
    // contains only the points which were geocoded succesfully
    view vExpandedInsurance as select from eInsurance{
        policyNumber,
        personName,
        insuredSum,
        objectAddress,
        objectLocation.ST_Y() as objectLatitude,
        objectLocation.ST_X() as objectLongitude
    } where eInsurance.objectGeocoded = 1;

And then add it to the xsodata service:

"workspace.master.data::ctxSpatial.vExpandedInsurance" as "ExpandedInsurances"
   keys ("policyNumber")
   create forbidden
   update forbidden
   delete forbidden;

Using a simple UI5 app, we get a fairly nice view of the points:

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio

As a bonus, if we click on a point, we get more details about the insurance policy behind it.

Data acquisition


We don’t want our customer to always use the CSV import. It should just be there for the initial “migration” of the data.

When new policies are created, we would like to help our customer in getting decent-quality data, so we build a small UI which shows exactly where an address is when creating the policy:

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio

Technically, this UI just reuses the existing geocoding functionality from before.

Clustering


One very interesting use case would be to cluster up the insured objects. Think about natural disasters; they have the potential of affecting a large number of houses located in relative close proximity. For an insurance company, having a large lump of risk in the same area can easily spell disaster.

Because we are not sure how the customer would like to have it, we build three different prototypes supporting this.

UI-based clustering


The first one is simply using Google’s Marker Clustering functionality. We don’t really like it though, because it will surely not scale and it doesn’t really use anything from HANA. Also, we don’t get to know exactly what’s the total amount of sum insured in each cluster (just the number of houses).

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio

Static clustering


The second one assumes that the user does not need to input the number of clusters, but we can decide on it. Truth be told, this variant only exists because, back when I created the applications, using a calculation view input parameter in place of the hard-coded cluster count would result in an error.

We write a fairly simple calculation view for this and then expose it through OData:

var_out = SELECT ST_ClusterID() AS "id", SUM("insuredSum") AS "totalSum", 
COUNT("insuredSum") AS "objectCount", ST_ClusterCentroid().ST_Y() AS "centerLatitude",
ST_ClusterCentroid().ST_X() AS "centerLongitude"
FROM (
SELECT "insuredSum", "objectLocation".ST_Transform(1000004326) AS "objectLocation" 
FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance" 
WHERE "objectLocation" IS NOT NULL
)
GROUP CLUSTER BY "objectLocation"
USING KMEANS CLUSTERS 5;

Dynamic clustering


We don’t really like this either, because we hard-coded the cluster count. To work around the calculation view activation error, we move the code from the view into a simple XSJS script (with basically the same content, but with the cluster count parameterized). In the end, we obtain something that looks like this:

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio

Aggregation


The customer might also want to select manually how to group the points together. To support this kind of scenario, we also create two different prototypes. For both prototypes, we want to:

◈ Split all the points into two groups based on some kind of user input.
◈ Display which points are in each group.
◈ Aggregate the total sum insured per group.

Radius-based aggregation


In this prototype, we split the points into two groups based on their distance to a user selected point. If they are inside a circle of given radius and center, then they are in the “inside” group, otherwise they are “outside”.

For determining this relationship, we use the following query that makes use of the ST_DISTANCE method:

 var_out = SELECT "policyNumber", 
"objectLocation".ST_Y() AS "objectLatitude", 
"objectLocation".ST_X() AS "objectLongitude", 
CASE WHEN "objectLocation".ST_DISTANCE(NEW ST_POINT(
    'POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') < :iv_radius 
THEN 1 ELSE 0 END AS "group"
FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"
WHERE "objectLocation" IS NOT NULL;

Similarly, we build a view for the aggregated values. Here we can take into consideration the fact that we will always have two groups (“inside” and “outside”):

var_out = SELECT 1 AS "group", IFNULL(SUM("insuredSum"), 0) AS "totalSum", COUNT("policyNumber") AS "objectCount"
FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"
WHERE "objectLocation" IS NOT NULL AND "objectLocation".ST_DISTANCE(
NEW ST_POINT('POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') < :iv_radius
UNION ALL
SELECT 0 AS "group", IFNULL(SUM("insuredSum"), 0) AS "totalSum", COUNT("policyNumber") AS "objectCount"
FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance" 
WHERE "objectLocation" IS NOT NULL AND "objectLocation".ST_DISTANCE(
NEW ST_POINT('POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') >= :iv_radius; 

Then we just display these two sets of data on a simple “picker” UI:

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio

Region-based aggregation


The second version is to actually let the user draw the region as a polygon. Thankfully, we can use the built-in ST_CoveredBy predicate to work this out:

var_out = SELECT "policyNumber", "objectLocation".ST_Y() AS "objectLatitude", 
"objectLocation".ST_X() AS "objectLongitude",
"objectLocation".ST_CoveredBy(NEW ST_POLYGON(:iv_polygon, 4326)) AS "group"
FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"
WHERE "objectLocation" IS NOT NULL;

The polygon itself is an input variable to this calculation view. The map includes a set of drawing tools which also support polygons. Using this functionality we can build a simple UI for this prototype as well:

SAP HANA Tutorials and Materials, SAP HANA Guides, SAP HANA Learning, SAP HANA Certifications, SAP HANA Studio