Monday, 29 October 2018

Solving the Knight’s Tour Problem with HANA Graph

The problem is best summarised as the following: “find a succession of moves of a knight piece on a chess board such that the knight visits each square exactly once”.
SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

Graph Theory



We can model the chess board like a graph with the squares as the vertices and all the possible moves as the edges.

By using this abstraction, the problem actually boils down to a special case of the more general Hamiltonian Path problem, which is basically the search of a path traversing all vertices of a graph exactly once.


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

Graph Engine



HANA has built-in support for processing graphs. As of now, it has the following main features:

◈ Storing and representing graphs (basically you need a vertex and an edge table and then you can create a so-called “workspace” out of them).
◈ Running some simple algorithms on graphs (neighbourhood, shortest path, connected components, etc).
◈ Performing queries based on the openCypher query language.
◈ Writing stored procedures using a dedicated GraphScript language.

To solve this problem, I used the stored procedures together with the graph representation.

Chess Board


The first step would be to generate the chess board as a graph. We will label the rows from 1 to 8 and the columns from A to H:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

To store the graph, I defined the data model as follows:


CREATE TABLE nodes (
label VARCHAR(2),
row_number INT NOT NULL,
col_number INT NOT NULL,
PRIMARY KEY (label)
);

CREATE TABLE edges(
label VARCHAR(5) NOT NULL,
from_node VARCHAR(2) NOT NULL,
to_node VARCHAR(2) NOT NULL,
PRIMARY KEY (label)
);

CREATE GRAPH WORKSPACE board
  EDGE TABLE edges SOURCE COLUMN from_node 
    TARGET COLUMN to_node KEY COLUMN label
  VERTEX TABLE nodes KEY COLUMN label;

I did not want to manually define the board, so I wrote a procedure for generating it for me based on an input size:

CREATE OR REPLACE PROCEDURE generate_board_for_knights(
    IN size INT, 
    OUT nodes TABLE(
        label VARCHAR(2),
        row_number INT,
        col_number INT
    ),
    OUT edges TABLE(
        label VARCHAR(5),
        from_node VARCHAR(2),
        to_node VARCHAR(2)
    )
) LANGUAGE SQLSCRIPT READS SQL DATA AS BEGIN
    nodes = SELECT col_label || row_label AS label, 
                row_number, col_number
        FROM (
           -- generate row numbers from 1 to 8
           SELECT GENERATED_PERIOD_START AS row_number, 
              TO_VARCHAR(GENERATED_PERIOD_START) AS row_label 
              FROM SERIES_GENERATE_INTEGER(1, 1, 9)
        ) AS row_table
        CROSS JOIN (
           -- generate column numbers from 1 to 8
           -- and column labels from A to H
           SELECT GENERATED_PERIOD_START AS col_number, 
              NCHAR(ASCII('A') + GENERATED_PERIOD_START - 1) AS col_label 
              FROM SERIES_GENERATE_INTEGER(1, 1, 9)
        ) AS col_table; 
           -- the edge label is simply the node labels
           -- joined by a hyphen (e.g. A1-C2)
    edges = SELECT A.label || '-' || B.label AS label, 
              A.label AS from_node, B.label AS to_node 
              FROM :nodes AS A CROSS JOIN :nodes AS B
            WHERE  (ABS(A.row_number - B.row_number) = 2 
              AND ABS(A.col_number - B.col_number) = 1)
              OR (ABS(A.row_number - B.row_number) = 1 
              AND ABS(A.col_number - B.col_number) = 2);
END;

After creating the procedure, I was able to simply call it to fill my tables:

CALL "GENERATE_BOARD_FOR_KNIGHTS"(8, nodes, edges) WITH OVERVIEW;

WebIDE has a built-in graph viewer which provides a pretty nice visualisation of the chess board:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

Lastly, I defined a table type for representing paths:


CREATE TYPE tt_path AS TABLE(label VARCHAR(2));

Naive Approach


The first idea that came to my mind was to try to solve it via backtracking. I decided to not do it, because of the fact that it would be simply too slow.

Vanilla backtracking has an exponential asymptotic time complexity. In our graph each node has potentially 8 neighbours and we have to find a 64-node long path, so it would need to do (slightly less than) 8 ^ 64 steps to finish the whole algorithm.

Of course, if we would stop when we find the first solution it will be faster, but I still decided that there must be a better solution.

Greedy Algorithm


After thinking on it a while, I decided that it should be possible to find a solution using a greedy algorithm with an appropriate selection heuristic.

The first heuristic that I tried was to always try to go towards the bottom of the board and slowly fill in from there. After fighting a little with the GraphScript language, I wrote the following procedure:

CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH" (
  IN iv_origin VARCHAR(2),
  OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
  Graph lo_graph = Graph("BOARD");
  BigInt lv_size = Count(Vertices(:lo_graph));

  -- retrieve the start point from the graph based on the origin label
  Vertex lo_current = Vertex(:lo_graph, :iv_origin);
  Sequence<Vertex> lt_nodes = [:lo_current];
  ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
  
  WHILE (COUNT(:lt_nodes) <= :lv_size) {
    lo_current."VISITED" = true;
    Boolean lf_found = false;
    Int lv_row = -1;
    Int lv_col = -1;
    
    FOREACH lo_neighbor IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
      -- first try to go on lower rows
      -- if the row is the same, go towards the left corner
      IF (NOT :lo_neighbor."VISITED" 
          AND (:lv_row < :lo_neighbor."ROW_NUMBER" 
          OR (:lv_row == :lo_neighbor."ROW_NUMBER" 
          AND :lv_col < :lo_neighbor."COL_NUMBER"))) {
        lv_row = :lo_neighbor."ROW_NUMBER";
        lv_col = :lo_neighbor."COL_NUMBER";
        lf_found = true;
        lo_current = :lo_neighbor;
      }
    }
    
    -- stop the loop if we did not find a next node 
    -- this avoids infinite loops which do not do anything
    IF (NOT :lf_found) {
      BREAK;
    }
    
    -- append the next node to the path
    lt_nodes = :lt_nodes || [:lo_current];
  }
  
  ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;

Unfortunately, this approach failed miserably, yielding mediocre results:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

The problem looked to me like once the bottom half of the board was almost full, the knight got into a dead end. To prevent this, I tried applying a different heuristic: to always go as close as possible to the center of the board.


CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH" (
  IN iv_origin VARCHAR(2), 
  OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS 
BEGIN
  Graph lo_graph = Graph("BOARD");

  -- get the size of the graph and use it throughout the algorithm
  BigInt lv_size = Count(Vertices(:lo_graph));

  -- retrieve the start point from the graph based on the origin label
  Vertex lo_current = Vertex(:lo_graph, :iv_origin);
  Sequence<Vertex> lt_nodes = [:lo_current];
  ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
  
  WHILE (COUNT(:lt_nodes) <= :lv_size) {
    lo_current."VISITED" = true;
    Boolean lf_found = false;
    BigInt lv_score = 4L * :lv_size;
    
    FOREACH lo_neighbor IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
      IF (NOT :lo_neighbor."VISITED") {
        -- compute the delta (4.5 would be the middle of the board, but we
        -- multiply everything by 2 to be able to use integers)
        BigInt lv_row_delta = 2L * BIGINT(:lo_neighbor."ROW_NUMBER") - 9L;

        -- do an ABS, because GraphScript does not support math functions
        IF (:lv_row_delta < 0L) {
          lv_row_delta = :lv_row_delta * -1L;
        }

        BigInt lv_col_delta = 2L * BIGINT(:lo_neighbor."COL_NUMBER") - 9L;
        IF (:lv_col_delta < 0L) {
          lv_col_delta = :lv_col_delta * -1L;
        }

        IF (:lv_score > :lv_col_delta + :lv_row_delta) {
          lv_score = :lv_col_delta + :lv_row_delta;
          lo_current = :lo_neighbor;
          lf_found = true;
        }
      }
    }
    
    IF (NOT :lf_found) {
      BREAK;
    }
    
    -- append the next node to the path
    lt_nodes = :lt_nodes || [:lo_current];
  }
  
  ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;

I envisioned that the knight will spin around the center of the board. What I did not realise is that this approach would basically cut off the corners of the board leading the knight to be stuck in one of the corners:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

Seeing that my heuristics are not working, I decided to read a little literature on the topic.


Warnsdorff Rule



After reading some papers, I found out that I was very close to a correct solution. The heuristic that I was searching for was to always select the next node to be the one with the smallest degree. That is, we select the node which has the smallest amount of not-yet-visited neighbours.

CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF" (
  IN iv_origin VARCHAR(2), 
  OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS 
BEGIN
  Graph lo_graph = Graph("BOARD");

  -- get the size of the graph and use it throughout the algorithm
  BigInt lv_size = Count(Vertices(:lo_graph));

  -- retrieve the start point from the graph based on the origin label
  Vertex lo_current = Vertex(:lo_graph, :iv_origin);

  Sequence<Vertex> lt_nodes = [:lo_current];
  ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
  
  WHILE (COUNT(:lt_nodes) <= :lv_size) {
    lo_current.visited = true;
    Boolean lf_found = false;
    BigInt lv_min = :lv_size;
    
    FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
      IF (NOT :lo_first.visited) {
        BigInt lv_first_count = 0L;

        -- compute the degree of each neighbour
        FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
          IF (NOT :lo_second.visited AND :lo_second != :lo_current) {
            lv_first_count = :lv_first_count + 1L;
          }
        }
        
        -- go for the minimal degree
        IF (:lv_first_count < :lv_min) {
          lo_current = :lo_first;
          lv_min = :lv_first_count;
          lf_found = true;
        }
      }
    }
    
    IF (NOT :lf_found) {
      BREAK;
    }
    
    -- append the next node to the path
    lt_nodes = :lt_nodes || [:lo_current];
  }
  
  ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;

Basically, this would yield the exact opposite result as the image above: first visit the margins of the board, then the center:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

I finally got a solution! But what I don’t like at this heuristic is that the, when we have to decide on the next node and there are two nodes with the exact same score, we always take the first one. In the literature, they usually take one at random. Sometimes this heuristic may not be able to yield a result (although I never encountered this situation), so I wanted to improve it further.


Pohl’s Tie Breaking



One possible way of breaking ties would be to repeatedly apply the selection mechanism on the candidate nodes. Because I cannot really write a recursive call using GraphScript, I decided to go for a single level:

◈ First find all candidate nodes based on the Warnsdorff rule.
◈ If there is only one candidate, select that one.
◈ Otherwise, compute the minimal neighbour degree for each candidate and select the one with the smallest value. At this point, we could again reach a tie, so we just select the first one (this would be solved via recursion – except for the perfectly symmetrical arrangements).

CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF_AND_POHL" (
  IN iv_origin VARCHAR(2), 
  OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS 
BEGIN
  Graph lo_graph = Graph("BOARD");

  -- get the size of the graph and use it throughout the algorithm
  BigInt lv_size = Count(Vertices(:lo_graph));

  -- retrieve the start point from the graph based on the origin label
  Vertex lo_current = Vertex(:lo_graph, :iv_origin);

  Sequence<Vertex> lt_nodes = [:lo_current];
  ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
  
  WHILE (COUNT(:lt_nodes) <= :lv_size) {
    lo_current.visited = true;
    BigInt lv_min = :lv_size;
    
    -- we will use this sequence to store candidate nodes
    Sequence<Vertex> lt_candidates = Sequence<Vertex>(:lo_graph);
    
    FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
      IF (NOT :lo_first.visited) {
        BigInt lv_first_count = 0L;

        -- compute the degree of each candidate
        FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
          IF (NOT :lo_second.visited) {
            lv_first_count = :lv_first_count + 1L;
          }
        }

        -- add the candidate to the list if needed
        IF (:lv_first_count < :lv_min) {
          lv_min = :lv_first_count;
          lt_candidates = [ :lo_first ];
        } ELSE {
          IF (:lv_first_count == :lv_min) {
            lt_candidates = :lt_candidates || [ :lo_first ];
          }
        }
      }
    }
    
    -- terminate the algorithm if no candidates were found
    IF (COUNT(:lt_candidates) == 0L) {
      BREAK;
    } ELSE {

      -- select the single candidate if only one was found
      IF (COUNT(:lt_candidates) == 1L) {
        lo_current = :lt_candidates[1L];

      -- otherwise, apply the tie-breaking heuristic
      } ELSE {
        lv_min = :lv_size;
        FOREACH lo_candidate IN :lt_candidates {
          BigInt lv_candidate_min = :lv_size;
          
          -- find the minimum neighbour degree for each candidate
          FOREACH lo_first IN NEIGHBORS(:lo_graph, {:lo_candidate}, 1, 1) {
            BigInt lv_first_count = 0L;
            IF (NOT :lo_first.visited) {
              FOREACH lo_second IN NEIGHBORS(:lo_graph, {:lo_first}, 1, 1) {
                IF (NOT :lo_second.visited AND :lo_second != :lo_candidate) {
                  lv_first_count = :lv_first_count + 1L;
                }
              }
            }
            IF (:lv_first_count < :lv_candidate_min) {
              lv_candidate_min = :lv_first_count;
            }
          }

          -- select the candidate with the smallest minimum neighbour degree
          IF (:lv_candidate_min <= :lv_min) {
            lo_current = :lo_candidate;
            lv_min = :lv_candidate_min;
          }
        }
      }
    }
    
    -- add the next node to the path
    lt_nodes = :lt_nodes || [:lo_current];
  }
  
  ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;

Running it seems to generate the expected result:


SAP HANA Tutorial and Materials, SAP HANA Certification, SAP HANA Guides, SAP HANA Learning

I still don’t like this solution, because even for asymmetrical arrangements, because of the lack of recursion, we still have to take a random choice.


Roth’s Tie Breaking



Another idea for the tie breaking is to select from the candidate nodes based on the distance from the center of the board (take the node with the maximal distance to the center of the board):

CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF_AND_ROTH" (
  IN iv_origin VARCHAR(2), 
  OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS 
BEGIN
  Graph lo_graph = Graph("BOARD");

  -- get the size of the graph and use it throughout the algorithm
  BigInt lv_size = Count(Vertices(:lo_graph));

  -- retrieve the start point from the graph based on the origin label
  Vertex lo_current = Vertex(:lo_graph, :iv_origin);

  Sequence<Vertex> lt_nodes = [:lo_current];
  ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
  
  WHILE (COUNT(:lt_nodes) <= :lv_size) {
    lo_current.visited = true;
    BigInt lv_min = :lv_size;

    -- we will use this sequence to store candidate nodes
    Sequence<Vertex> lt_candidates = Sequence<Vertex>(:lo_graph);
    
    FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
      IF (NOT :lo_first.visited) {
        BigInt lv_first_count = 0L;

        -- compute the degree of each candidate
        FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
          IF (NOT :lo_second.visited) {
            lv_first_count = :lv_first_count + 1L;
          }
        }

        -- add the candidate to the list if needed
        IF (:lv_first_count < :lv_min) {
          lv_min = :lv_first_count;
          lt_candidates = [ :lo_first ];
        } ELSE {
          IF (:lv_first_count == :lv_min) {
            lt_candidates = :lt_candidates || [ :lo_first ];
          }
        }
      }
    }
    
    -- terminate the algorithm if no candidates were found
    IF (COUNT(:lt_candidates) == 0L) {
      BREAK;
    } ELSE {

      -- select the single candidate if only one was found
      IF (COUNT(:lt_candidates) == 1L) {
        lo_current = :lt_candidates[1L];

      -- otherwise, apply the tie-breaking heuristic
      } ELSE {
        Double lv_max_distance = -1.0;
        FOREACH lo_candidate IN :lt_candidates {
          -- compute the distance between the vertex and the board center
          Double lv_distance = (DOUBLE(:lo_candidate."ROW_NUMBER") - 4.5) 
             * (DOUBLE(:lo_candidate."ROW_NUMBER") - 4.5) 
             + (DOUBLE(:lo_candidate."COL_NUMBER") - 4.5) 
             * (DOUBLE(:lo_candidate."COL_NUMBER") - 4.5);
          -- select the candidate with maximal distance
          IF (:lv_distance > :lv_max_distance ) {
            lo_current = :lo_candidate;
            lv_max_distance = :lv_distance;
          }
        }
      }
    }
    
    -- add the next node to the path
    lt_nodes = :lt_nodes || [:lo_current];
  }
  
  ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;

Surprisingly, running this for the same starting point generated the same result most of the time as Pohl’s tie-breaking.

Bloopers


I don’t want to give a false impression and make people think that everything went smoothly. I actually encountered a few bugs and limitations which made me waste a few hours of my time:

◈ HANA Express is too large for my computer. I had to clean up 40GB of my laptop to be able to run it. I wanted to use HXE to have access to the latest service pack, but I really started thinking to use an older HANA to which I have access…

◈ I am not very happy with the fact that GraphScript is a completely different language than SQLScript, especially because of the fact that most functions available in SQLScript are not available in GraphScript. For example, there is no ABS in GraphScript.

◈ I did not find a way to pass a Graph object through the procedure interface (e.g. to create a recursive procedure).

◈ The WebIDE is always showing a ton of errors as if all the code is completely broken. To be honest, I would prefer to not have any kind of syntax check and just have syntax highlighting.

◈ Once I introduced the first IF clause, the WebIDE “Run” button would not anymore execute my “CREATE OR REPLACE PROCEDURE” statement. I had to resort to manually sending the code via an jQuery.ajax call from the development console.

jQuery.ajax({
   method: "POST", 
   url: "/sap/hana/cst/api/v2/sessions(...)/connections(...)/sqlExecute()", 
   headers: {"X-CSRF-Token": "..."}, 
   contentType: "application/json", 
   data: JSON.stringify({
      "statements": [{"statement": code, "type": "UPDATE"}],
      "maxResultSize": 1000, 
      "limitLOBColumnSize": 1024, 
      "includePosColumn": true,
      "bExeWithoutPrepare": false,
      "bStopOnError": true,
      "bRunAsBackgroundActivity": false
   })
});