Demand insights using open source data

Recently we posted a short update on how Paradigm has been able to identify unmetered allotments in our clients’ DMAs. We were able to home in on these users thanks to our confidence in the ‘primary’ components that we have set up in Paradigm, which enable us to create accurate expected profiles throughout the year. By understanding typical consumption within a DMA using specific attributes, Paradigm can account for the majority of water passing through the network. This gives us the opportunity to focus on what is left, the ‘unaccounted for water’ (UFW); we can do this either at a daily level, or by looking at how the UFW changes over an extended period.

The allotments were identified using the second approach. Analysing the UFW over a full year showed that net flow varied seasonally in some DMAs, with notable increases of 1-2 l/s between February and July (fig. i). Most of the DMAs investigated were in urban areas, which ruled out agricultural or market garden demand as the cause of this shift. On further investigation we found allotment plots of varying size within these DMAs and set actions to confirm that they were the cause of the UFW. The allotment in figure i was previously unknown to our client but has now been metered, and its consumption is being measured.

Figure i. Observed vs. expected DMA net flow
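
The seasonal check described above can be sketched in a few lines of Python. This is a minimal illustration rather than Paradigm's implementation: the function names, the choice of October to December as a baseline and the 1 l/s threshold are all assumptions, and the flow records are synthetic.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_ufw(daily_records):
    """Mean unaccounted-for water (observed minus expected net flow) per month.

    daily_records: iterable of (date, observed_l_s, expected_l_s) tuples.
    Returns {month_number: mean UFW in l/s}.
    """
    by_month = defaultdict(list)
    for day, observed, expected in daily_records:
        by_month[day.month].append(observed - expected)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def seasonal_months(monthly, baseline_months, threshold_l_s=1.0):
    """Months whose mean UFW exceeds the baseline-month mean by the threshold."""
    baseline = mean(monthly[m] for m in baseline_months)
    return [m for m, ufw in monthly.items() if ufw - baseline >= threshold_l_s]


# Synthetic, illustrative data only: a flat 5 l/s expected profile, 0.2 l/s of
# background UFW, and an extra 1.5 l/s of unexplained demand from February to July.
records = []
day = date(2021, 1, 1)
while day.year == 2021:
    extra = 1.5 if 2 <= day.month <= 7 else 0.0
    records.append((day, 5.0 + 0.2 + extra, 5.0))
    day += timedelta(days=1)

monthly = monthly_ufw(records)
flagged = seasonal_months(monthly, baseline_months=[10, 11, 12])
print(flagged)  # months with a seasonal uplift in UFW
```

Comparing against a baseline period, rather than against zero, keeps a DMA with steady background leakage from being flagged as seasonal.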

This approach shows great results when analysing individual DMAs, but the question is: can we develop a component within Paradigm that automatically adjusts the expected net flow profile and accounts for typical allotment consumption without having to conduct in-depth investigations? This blog will focus on whether open-source data can be used to understand the relationship between allotment size and typical daily consumption values.

Allotment size/consumption relationship

To investigate the relationship between allotment area and typical daily consumption, a data source for each attribute is required. Consumption values and address data can be sourced from our clients, but unfortunately spatial data such as area is not maintained in the same way. To solve this issue, we have used Overpass Turbo, a web-based data mining tool for OpenStreetMap. This tool allows the user to enter key words within a chosen boundary and returns all points of interest or polygons containing those key words. The results can then be exported as GeoJSON and projected as a layer in QGIS (fig. ii). From here the area of each polygon can be calculated, and the resulting attributes exported into Excel. Going forward, this could be automated using a geospatial database.

Figure ii. Projected allotment polygon from Overpass Turbo
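
As a rough idea of how the area step could be automated, the sketch below reads a GeoJSON export and estimates each polygon's area in pure Python. It is not the QGIS calculation used in this post: it assumes the export contains simple Polygon features in longitude/latitude (for example, features tagged `landuse=allotments` in OpenStreetMap), and it uses a local flat-earth approximation that is adequate for site-sized polygons but less precise than an ellipsoidal calculation. The sample feature and its name are made up.

```python
import json
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def ring_area_m2(ring):
    """Approximate area of a closed lon/lat ring: project to local metres
    (equirectangular, centred on the ring's mean latitude), then apply the
    shoelace formula."""
    lat0 = math.radians(sum(p[1] for p in ring) / len(ring))
    pts = [
        (EARTH_RADIUS_M * math.radians(p[0]) * math.cos(lat0),
         EARTH_RADIUS_M * math.radians(p[1]))
        for p in ring
    ]
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
    )
    return abs(twice_area) / 2.0


def polygon_area_m2(geometry):
    """Area of a GeoJSON Polygon: outer ring minus any holes."""
    outer, *holes = geometry["coordinates"]
    return ring_area_m2(outer) - sum(ring_area_m2(h) for h in holes)


def allotment_areas(geojson_text):
    """Return {feature name or id: area in m²} for every Polygon feature."""
    areas = {}
    for feature in json.loads(geojson_text)["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # points and multipolygons are skipped in this sketch
        name = (feature.get("properties") or {}).get("name") or feature.get("id")
        areas[name] = polygon_area_m2(geometry)
    return areas


# Made-up example: a 0.001° x 0.001° square at roughly 52° N.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "id": "way/1",
        "properties": {"name": "Example allotment", "landuse": "allotments"},
        "geometry": {"type": "Polygon", "coordinates": [[
            [-1.000, 52.000], [-0.999, 52.000], [-0.999, 52.001],
            [-1.000, 52.001], [-1.000, 52.000],
        ]]},
    }],
})

areas = allotment_areas(sample)
print(areas)
```

For production use, a geospatial library or database with a proper equal-area or ellipsoidal calculation would be the safer choice.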

Unfortunately, because allotments are typically tucked away and difficult to track down, they are often a neglected customer type in water company records, which made it difficult to accurately match the two data sets. However, when the relationship between area and daily consumption is plotted, a trend can be observed despite the small sample size (fig. iii).

Figure iii. Relationship between allotment size and billed average daily consumption

The scatter plot in figure iii shows a positive linear relationship, indicating that typical daily consumption can be expected to increase with allotment size. Many of the allotments have a total area below 10,000 m², and of these a handful have lower than expected consumption values; it may be that these properties are no longer in use or have meter issues.

Those highlighted in orange are clear outliers, but possible issues at these sites may offer insights that could benefit the development of Paradigm as well as helping leakage teams. By identifying allotments that do not follow the expected trend it is possible to hypothesise the causes. For the allotment displaying higher than expected consumption for its size, it would be beneficial to check the site for possible customer-side leakage, or for any users that are creating increased demand. At the opposite end of the scale, the allotment that has an average daily consumption of ~0.5 m³/d despite being the largest in the sample could have a second feed that is currently unknown or unmetered, or a meter fault that is causing it to under-record. Investigating these allotments could yield benefits by helping to account for UFW within the DMAs and reduce reported leakage.

Without the use of the Overpass Turbo mined data it would be very difficult to understand which allotments were using more or less water than expected. If the average daily consumption was analysed alone, it would highlight those at the upper end of the scale as being outliers, when in reality they fit well within the expected trend.
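
One simple way to express "more or less water than expected for its size" in code is to compare each site's consumption per square metre with the sample median. This is a swapped-in, deliberately crude stand-in for the fitted trend line in figure iii, not the method used to produce it: the factor-of-two threshold is an assumption, and the site values below are invented for illustration.

```python
from statistics import median


def flag_against_trend(sites, factor=2.0):
    """Flag sites whose consumption per m² differs from the sample median
    by more than `factor` in either direction.

    sites: {name: (area_m2, billed_m3_per_day)}
    Returns (median m³/d per m², {name: label}) for the flagged sites only.
    """
    ratios = {name: use / area for name, (area, use) in sites.items()}
    typical = median(ratios.values())
    flags = {}
    for name, ratio in ratios.items():
        if ratio > typical * factor:
            flags[name] = "higher than expected"
        elif ratio < typical / factor:
            flags[name] = "lower than expected"
    return typical, flags


# Invented sample, not client data: (area in m², billed consumption in m³/d).
sites = {
    "A": (10_000, 0.25),
    "B": (20_000, 0.50),
    "C": (40_000, 1.10),
    "D": (60_000, 1.40),
    "E": (30_000, 2.40),   # high use for its size
    "F": (120_000, 0.50),  # largest site, very low use
}

typical, flags = flag_against_trend(sites)
print(flags)
```

In this invented sample, ranking on consumption alone would put site D near the top even though it sits on the trend, while site F would look unremarkable; normalising by area is what separates the two.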

By applying this relationship to the allotment found within the DMA in figure i, the expected average daily consumption value would be as denoted by the green highlighted point in figure iii. Using the OpenStreetMap data, this allotment has a calculated area of ~90,000 m², which, when compared with other matched allotments, would give an expected consumption of approximately 2.25 m³/d. It will be interesting to see whether the consumption now being measured at the site is comparable with the expected value once enough flow data has been recorded.

Can open-source data benefit Paradigm?

Whilst this investigation has been on a relatively small scale, it seems that open-source data, accessed through tools such as Overpass Turbo, could prove invaluable for building hypotheses and components for Paradigm. The tool itself is user-friendly and returns data quickly, often with extra attributes such as property owners and facilities on site. The ability to set designated boundaries also allows analysis to be done in specific locations, which means that the scope could be extended to other water companies, increasing the sample size greatly.

Useful initial insights into the allotment size/consumption relationship have been observed, and the spatial data itself could also be useful to water companies when trying to identify previously unknown allotments. There is clearly a gap in customer data quality, so any tools that could assist in finding and accounting for allotment consumption could provide benefits before using Paradigm to investigate remaining UFW.

Furthermore, losing 1 m³/day to unaccounted-for water equates to a revenue loss of approximately £1,000/year for a typical water and sewerage company. Taking the examples shown in figure iii, expected losses for the allotment that is under-recording and for the site that is not currently being billed would be approximately £2,000/year per property. Quantifying this potential benefit demonstrates how resolving these issues could deliver a large benefit in the short term for water companies, as well as providing a means of prioritising allotment investigations. Increasing the number of billed allotment sites would further improve the sample for deriving an expected allotment consumption component in Paradigm.
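
The arithmetic behind these figures, and how it might be used to rank sites for investigation, can be sketched as follows. The £1,000 per m³/day rate, the ~2.25 m³/d expected value and the ~0.5 m³/d billed value are taken from this post; the expected value for the under-recording site and the third site are hypothetical placeholders, as are the function names.

```python
# Approximate revenue loss in GBP/year for each 1 m³/day that goes unbilled
# (the rough figure quoted in this post).
REVENUE_LOSS_PER_M3_DAY = 1000.0


def annual_revenue_loss(expected_m3_d, billed_m3_d, rate=REVENUE_LOSS_PER_M3_DAY):
    """Estimated yearly revenue lost on the gap between expected and billed use."""
    shortfall = max(expected_m3_d - billed_m3_d, 0.0)
    return shortfall * rate


def prioritise(sites):
    """Rank sites by estimated loss, largest first.

    sites: {name: (expected_m3_d, billed_m3_d)}
    """
    losses = {
        name: annual_revenue_loss(expected, billed)
        for name, (expected, billed) in sites.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


sites = {
    # Expected ~2.25 m³/d from its ~90,000 m² area; not billed before metering.
    "unbilled allotment (fig. i)": (2.25, 0.0),
    # Billed ~0.5 m³/d; the 2.5 m³/d expected value is a hypothetical placeholder.
    "under-recording allotment": (2.5, 0.5),
    # Hypothetical site already in line with the trend.
    "site on trend": (1.0, 1.0),
}

ranking = prioritise(sites)
for name, loss in ranking:
    print(f"{name}: ~£{loss:,.0f}/year")
```

Both flagged sites come out at roughly £2,000/year, in line with the estimate above, while a site already on the trend drops to the bottom of the list.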

To conclude, I feel that utilising this type of data will be beneficial to the development of Paradigm and could have a wide range of applications in the future as more hypotheses and requirements are added to the model.
