- Matt Smith

# Infilling Missing Elevation Data for Network Modelling

Hydraulic network modelling is a powerful tool with the potential to provide great insight into the behaviour of water networks and how their operation can be improved. When constructing models, we concern ourselves with ‘links’ through which we can solve for flow rates and the ‘nodes’ between them at which we solve for pressures. To do this we must specify the elevation of each node.

Some of these elevations are used to determine the driving heads in the system, and so their accuracy is critically important to the model’s validity and should ideally come from accurate measurements i.e. GPS survey data.

However, there are many nodes in the model which merely represent the junction between two lengths of pipe, or at which it’s otherwise not usually necessary to specify the elevation with a high degree of accuracy. The elevations of these nodes are often missing from GIS data, but to run the model, we must infill them with suitable values.

We were recently faced with this problem while designing a system to construct DMA network models from a client’s GIS data, where roughly 2.5% of nodes across the areas of interest did not have elevations. This post will outline how the problem was solved and should hopefully prove informative to others looking to address similar gaps in their data.

Figure 1: Bilinear Interpolation from a rectilinear grid to estimate the value at point P from known values at points Q11, Q12, Q21 & Q22 (Source: Wikipedia)

A standard approach to solving the problem is to use bilinear interpolation (Figure 1) to interpolate the unknown elevations from a ground-level grid, such as the Ordnance Survey ‘Terrain 5’ dataset, which covers Great Britain at five-metre intervals. However, access to such datasets is subject to licensing costs. We might be able to get access through the client, but this normally takes time. Keen to maximise the benefit of our existing datasets, we got to thinking. Shouldn’t it be possible to interpolate these missing points from the 97.5% of known elevations surrounding them?

Unlike in the OS dataset our known elevations are not given at regular intervals and so call for a different interpolation method. By considering that the coordinates of the known elevations form a triangular unstructured grid, we can apply barycentric linear interpolation to approximate the elevation at any point in between. This is weighted based on a coordinate system which expresses the position of a point in terms of three ratios which represent its distance from the triangles ‘centre of mass’ with respect to each vertex (i.e. point of known elevation). This is shown in Figure 2a, where the point (1/3,1/3,1/3) is at the centre of mass of each triangle. By weighting our interpolation with these ratios, we can essentially approximate the elevation of our unknown points as if they lay on a mesh surface, as shown in Figure 2b. This should provide a much more accurate approximation than simply using nearest neighbour infilling.

Figure 2: Application of barycentric coordinates to facilitate linear interpolation on a triangular grid (Source: Wikipedia)

Python’s SciPy library provides an implementation of this interpolation method with the *griddata* function in *scipy.interpolate* (Figure 3a shown below). Figure 3b shows an example result, where the elevation of the green node has been infilled.

Figure 3: Use of *scipy.interpolate.griddata* functionality to infill elevations, with example result

The drawback of this approach is that its accuracy relies on having good coverage of known elevations from which to interpolate. This means that while it is effective for filling gaps in a mostly complete dataset, for large-scale infilling of unsurveyed points it would be more appropriate to use a ground-level grid.

The major strength of the approach is that it can infill elevations to a degree of accuracy appropriate for network modelling purposes without relying on access to any external dataset. Combined with the convenience of the Python function, this has allowed us to incorporate the technique into our automated GIS data import and cleansing processes.

It was mentioned earlier that for some types of network analysis, the main driver to infill missing elevations is simply to get the model to run, where accurate simulation of the pressure at these nodes is not important. Which prompts the question: Why bother to pursue more sophisticated interpolation techniques, rather than just copying the elevation from the nearest neighbouring node?

One good reason is to enable accurate simulation of pressure-driven demand, where leakage is proportional to pressure and so responds realistically to changes in the network, which can prove especially useful for evaluating potential leakage savings of pressure management schemes.