• Raasheed Pakwashi

Exploring Auto Regressive models to help predict DMA daily flow

In time series analysis, an Auto Regressive (AR) model is a simple linear regression model that predicts the current value from previous values. Together with the simple Moving Average (MA) model, it is a fundamental building block of many more complex models, e.g. ARMA and ARIMA. When choosing a model to predict a time series, it is desirable to choose one that is parsimonious: a simple model is generally preferred to a more complex one, and a complex model should only be used if it provides a measurable improvement in predictive power. This report explores how well an AR model performs when used to model DMA daily flow.
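For reference, an AR model of order p, written AR(p), is usually stated as below. The symbols are notation assumed here rather than taken from the original analysis: y_t is the flow on day t, c is a constant, the phi_i are the lag coefficients and epsilon_t is the error term.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t
```

Setting p = 1 gives the AR(1) model, which uses only the previous day's value; p = 2 adds the value from two days before, and so on.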

A time series of total daily flow for a period of 2 years was used to derive an AR model for a DMA. The model was then tested on the following 6 months of recorded data. Figure 1 shows the data used to ‘train’ the model.

Figure 1

As the AR model uses the past to predict current values, it is important to understand how far back one should look to be able to predict new values successfully. The Auto Correlation Function (ACF) and Partial Auto Correlation Function (PACF) both help with this by providing some insight into the relationship between past and current values.

Figures 2 & 3

The ACF plot (Figure 2) suggests a statistically significant correlation (outside the blue envelope) between current values and lags up to almost 30 days back. The PACF plot (Figure 3), by contrast, suggests a statistically significant correlation (outside the blue envelope) for the previous 7 lags. The PACF shows the direct effect of each lagged value on the present, after accounting for the lags in between, and is extremely important in understanding how complex an AR model needs to be.
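As a rough illustration of how these two functions can be computed, here is a minimal sketch using only the Python standard library. It assumes the sample ACF, a PACF obtained from the Durbin-Levinson recursion, and the common approximate 95% band of ±1.96/√n for the envelope. The function names and the `daily_flow` series are hypothetical stand-ins, not the DMA record or the tooling used for the figures.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

def significant_lags(values, n):
    """Lags (>= 1) whose correlation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k >= 1 and abs(v) > bound]

# Hypothetical stand-in data with a repeating weekly pattern (NOT the DMA record).
daily_flow = [100, 102, 101, 103, 104, 90, 88] * 20
print("ACF lags outside band: ", significant_lags(acf(daily_flow, 14), len(daily_flow)))
print("PACF lags outside band:", significant_lags(pacf(daily_flow, 14), len(daily_flow)))
```

The lags reported by `significant_lags` correspond to the bars that would sit outside the envelope in plots like Figures 2 and 3.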

To determine which AR model fits the data best, we start with the simplest, AR(1), which uses only one previous value. If its coefficients are statistically significant, we then fit an AR(2), which uses the previous two values. If the AR(2) also returns statistically significant coefficients, the question arises: is the predictive power of the more complex AR(2) model statistically better than that of the simpler AR(1) model? A log-likelihood ratio test helps to determine this.

We can then continue in this fashion, fitting AR models with increasing lags and comparing each with the simpler model before it, to ensure the ‘cost’ of the more complex model is justified by a real gain in predictive power.
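The stepwise procedure can be sketched as follows. This is an illustration under stated assumptions, not the exact method used for the DMA data: coefficients are estimated with the Yule-Walker equations (Levinson-Durbin recursion), the log-likelihood is approximated from the Gaussian innovation variance, each test compares order p against order p-1 (one extra parameter, so a chi-square distribution with 1 degree of freedom), and the search stops at the first order that fails the test. All function names and the synthetic `demo` series are hypothetical.

```python
import math
import random

def autocovariances(series, max_lag):
    """Biased sample autocovariances for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def yule_walker_path(series, max_order):
    """[(coefficients, innovation variance)] for AR orders 0..max_order."""
    gamma = autocovariances(series, max_order)
    phi, var = [], gamma[0]
    fits = [(phi, var)]
    for k in range(1, max_order + 1):
        kappa = (gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))) / var
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        var = var * (1.0 - kappa * kappa)
        fits.append((phi, var))
    return fits

def lr_test_p_value(var_small, var_big, n):
    """p-value of the likelihood ratio test for one extra AR parameter."""
    stat = n * math.log(var_small / var_big)          # 2 * (logL_big - logL_small)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square(1) upper tail

def select_order(series, max_order, alpha=0.05):
    """Increase the order until the extra lag is no longer significant."""
    fits = yule_walker_path(series, max_order)
    order = 0
    for p in range(1, max_order + 1):
        if lr_test_p_value(fits[p - 1][1], fits[p][1], len(series)) >= alpha:
            break
        order = p
    return order, fits[order][0]

# Synthetic AR(1) series as a stand-in (NOT the DMA record).
rng = random.Random(0)
demo = [0.0]
for _ in range(499):
    demo.append(0.8 * demo[-1] + rng.gauss(0.0, 1.0))
demo_order, demo_coeffs = select_order(demo, max_order=10)
print(demo_order, demo_coeffs)
```

Stopping at the first non-significant lag is only one possible rule. With strong weekly behaviour, an intermediate lag can fail the test while a later one passes, so comparing every candidate order against the current best model is a reasonable alternative.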

Applying this method to the DMA daily flow returns an AR(13) model as the simplest model with the highest predictive power. To check that we are happy with the returned model, we analyse the residuals at each timestep to make sure they resemble ‘white noise’ (consistently random), which confirms there are no more patterns of behaviour to extract and incorporate into our model.
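One common way to make the white-noise check quantitative is the Ljung-Box test, which asks whether the residual autocorrelations up to a chosen lag are jointly indistinguishable from zero. The original analysis does not say which check was used, so the sketch below is an assumed stand-in. The function names are hypothetical, and `chi2_sf` is a small series-based helper that is adequate only for the moderate statistics and degrees of freedom typical here.

```python
import math

def ar_residuals(series, coeffs):
    """One-step-ahead residuals of a mean-centred AR model."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    p = len(coeffs)
    return [dev[t] - sum(coeffs[j] * dev[t - 1 - j] for j in range(p))
            for t in range(p, len(dev))]

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (incomplete gamma series)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def ljung_box(resid, lags, fitted_params=0):
    """Ljung-Box Q statistic and p-value; requires lags > fitted_params."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, lags - fitted_params)
```

For residuals from an AR(13) fit, one might call `ljung_box(resid, lags=20, fitted_params=13)`. A small p-value would suggest that structure remains in the residuals, while a large one is consistent with white noise.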

A plot of the residuals for the AR(13) model, shown in Figure 4, broadly resembles white noise. Any day with an unusually large spike can be attributed to ‘anomalous’ network behaviour, e.g. a burst or abnormally high demand.

Figure 4
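Picking out the days with unusually large residuals can be automated with a simple threshold. The sketch below flags residuals more than k standard deviations from the mean residual. The function name and the default k = 3 are assumptions for illustration, and the rule flags spikes in both directions.

```python
import statistics

def flag_spikes(residuals, k=3.0):
    """Indices whose residual lies more than k standard deviations from the mean."""
    mean = statistics.mean(residuals)
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mean) > k * spread]
```

Because a large spike inflates the standard deviation it is measured against, a robust spread such as the median absolute deviation may be a better choice when bursts are frequent.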

So, how well does the derived AR(13) model work in practice? Figure 5 shows predicted values (red) derived from the AR(13) model plotted against actual recorded data (blue).

Figure 5
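A comparison like the one in Figure 5 can be summarised with an error metric. The sketch below assumes one-step-ahead forecasts, where each prediction uses the actual values from the preceding days; the original does not state whether the predictions were one-step or multi-step. The function names are hypothetical, `mean` is the training-period mean, and `history` must contain at least as many values as there are coefficients.

```python
import math

def one_step_forecasts(history, test, coeffs, mean):
    """Predict each test value from the actual values that precede it."""
    data = list(history) + list(test)
    start = len(history)
    return [mean + sum(c * (data[t - 1 - j] - mean) for j, c in enumerate(coeffs))
            for t in range(start, len(data))]

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Computing the same metrics for a naive forecast, such as repeating the previous day's value, gives a baseline against which the AR model's error can be judged.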

The AR(13) model clearly holds some predictive capability, but there is also clear room for improvement. AR models perform better with stationary data, and the DMA daily flow above does not come from a stationary process. A stationary process has a constant mean and variance over time, and the covariance between two values depends only on the lag between them, not on when they occur. Furthermore, the AR model does not take into account any past errors and is therefore strongly affected by sudden changes in flow. It also doesn’t take into account any seasonal or cyclical behaviour, which can be clearly seen in the ‘Actual’ values shown in Figure 5.
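A quick, informal way to see whether a series departs from stationarity is to compare the mean and variance across consecutive windows. The sketch below is a rough diagnostic rather than a formal test; the function name and the window length are assumptions.

```python
import statistics

def window_stats(series, window):
    """(mean, variance) of each consecutive, non-overlapping window."""
    return [(statistics.mean(series[i:i + window]),
             statistics.pvariance(series[i:i + window]))
            for i in range(0, len(series) - window + 1, window)]
```

For daily data, calling `window_stats(daily_flow, 30)` on a hypothetical `daily_flow` list gives roughly monthly summaries. Means or variances that drift from window to window point towards a non-stationary process.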

In conclusion, simple AR models demonstrate limited capability in predicting DMA daily flows.

Our next steps

  • Look at simple Moving Average (MA) models for comparison with a simple AR model.

  • Look at more complex ARMA, ARIMA, ARIMAX and SARIMAX models for improved predictions.

  • Look at other DMAs to see how they differ and whether they require different models.

  • Look at DMA flow with different frequencies (e.g. hourly or monthly) to investigate what other benefits and insights can be derived from them.