The Effect of Climate Change

Modeling Flood Time Series Data


When first modeling time series data, the natural starting point is an autoregressive (AR) or moving average (MA) model. These AR/MA models help explain the structure of the data and forecast future values from the historical record. Before fitting them, the time series must be stationary, meaning its mean and variance do not change over time. The most common cause of nonstationarity is a trend, where values slowly increase over time. A first check is the ACF (autocorrelation function): autocorrelations that decay slowly across many lags suggest a trend, while autocorrelations that drop off quickly are consistent with a stationary series. Below is the ACF plot for the flood data explored and visualized in previous sections.


Flood ACF Plot
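
As a rough illustration of what an ACF plot displays, the sketch below computes sample autocorrelations and the approximate 95% white-noise bound of 1.96 / sqrt(n), which is what the dashed lines on such plots usually represent. It is written in plain Python rather than the R used for the original analysis. The function names and the toy series are assumptions for illustration only, not the flood data.

```python
import math


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    c0 = sum(d * d for d in deviations)  # lag-0 sum of squared deviations
    acf = []
    for k in range(max_lag + 1):
        ck = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf


def significance_bound(n):
    """Approximate 95% bound for the ACF of white noise with n observations."""
    return 1.96 / math.sqrt(n)


def significant_lags(series, max_lag):
    """Lags (excluding lag 0) whose autocorrelation falls outside the bound."""
    bound = significance_bound(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]


# Toy series (hypothetical, not the flood data): a pure linear trend.
trend = [float(t) for t in range(50)]
print(sample_acf(trend, 3))
print(significant_lags(trend, 5))
```

For a trending series like the toy one, the autocorrelations stay large over many lags, which is the slow decay that signals nonstationarity.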

Per the figure above, the series already appears stationary, and an Augmented Dickey-Fuller test confirms this. Since the series is stationary, the next step is to model the data with ARMA. Together, the ACF and PACF plots suggest bounds for the model's parameters, namely the orders read off from the significant lags in each plot. Since there was no need to difference the data, the model built is an ARMA (autoregressive moving average) model. The parameter p is the number of autoregressive lags and can be estimated by counting the significant lags in the PACF plot, that is, the spikes that extend beyond the blue significance bounds. The parameter q is the order of the moving average part and can be estimated the same way from the ACF plot. Finally, the parameter d is the degree of differencing, which for an ARMA model is 0. From the ACF and PACF plots below, the other two parameters can be determined.
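
To make the PACF less of a black box, here is a minimal sketch of the Durbin-Levinson recursion, one standard way to turn autocorrelations into partial autocorrelations. It is plain Python, not the R code behind the plots, and the function name `pacf_from_acf` is hypothetical. The input is a list of autocorrelations with the lag-0 value (equal to 1) first.

```python
def pacf_from_acf(acf, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson.

    acf[0] must be 1 and acf[k] is the lag-k autocorrelation.
    """
    pacf = []
    prev = []  # prev[j - 1] holds the coefficient phi_{k-1, j}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = acf[1]
            current = [phi_kk]
        else:
            num = acf[k] - sum(prev[j - 1] * acf[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * acf[j] for j in range(1, k))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: r_k = 0.5 ** k.
ar1_acf = [0.5 ** k for k in range(6)]
print(pacf_from_acf(ar1_acf, 4))
```

With the theoretical AR(1) autocorrelations used here, only the first partial autocorrelation is nonzero. That cutoff after lag p is why the PACF is read to choose p.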



From the above visualization, the ACF plot reveals at least one significant lag, so for the moving average part of the model the parameter q should range between 1 and 2. The PACF plot shows one significant lag before the plot settles down, so the parameter p should also range between 1 and 2. Additionally, even though the data was not differenced, for completeness' sake the d parameter should range from 0 to 1. This means an ARIMA model will be used unless the results show that a difference of 0 is better than a difference of 1. With these ranges chosen, the next step is to fit a number of candidate models and choose the one with the lowest AIC/BIC. The following table shows the results of this experiment.
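
The ranking step can be sketched as follows, assuming each candidate model has already been fitted and its maximized log-likelihood is available. The formulas are the standard AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The log-likelihood values, the sample size, and the parameter-counting convention below are placeholders invented to exercise the code. They are not results from the flood data.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def rank_models(fits, n_obs):
    """Rank candidate orders by AIC.

    fits maps (p, d, q) -> maximized log-likelihood.
    Returns a list of (order, aic, bic) tuples, lowest AIC first.
    """
    rows = []
    for order, loglik in fits.items():
        p, _d, q = order
        # Assumed convention: AR terms + MA terms + innovation variance.
        k = p + q + 1
        rows.append((order, aic(loglik, k), bic(loglik, k, n_obs)))
    return sorted(rows, key=lambda row: row[1])


# Placeholder log-likelihoods (hypothetical, for illustration only).
placeholder_fits = {(1, 0, 1): -120.0, (2, 0, 1): -119.5, (1, 0, 2): -118.0}
for order, a, b in rank_models(placeholder_fits, n_obs=100):
    print(order, round(a, 2), round(b, 2))
```

Both criteria reward a higher likelihood and penalize extra parameters. BIC penalizes them more heavily once n exceeds about 7, so it tends to prefer smaller models.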

Sorting the table by AIC gives a best model with parameters (p, d, q) = 0,1,2, and sorting by BIC selects the same 0,1,2 model. R also provides a function for automatic ARIMA selection, auto.arima, and it likewise selects 0,1,2. Since all three approaches point to the 0,1,2 model, that is the model used here.

Next, the model should undergo some diagnostics to make sure the fit looks good. Below are some basic diagnostics run using the 0,1,2 model.


Flood Diagnostics Plot

Given the figure above, the following observations can be made. First, the standardized residuals should have a mean of about 0 and a variance of about 1. This is generally true here: the mean is close to 0, although the variance is probably slightly more than 1. A mean or variance far from these values would signal a poor fit, but that is not the case here. Next, the ACF of the residuals shows no significant lags, which is encouraging because it suggests the model has captured the autocorrelation in the series. The Q-Q plot shows only weak signs of normality, with a bit of skew. Finally, the Ljung-Box p-values are all greater than 0.05, so there is no evidence of remaining autocorrelation in the residuals, another encouraging sign for the model. With the diagnostics complete, the model above can be used for forecasting. The 0,1,2 model can be written as the equation below.
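
For reference, the Ljung-Box statistic behind those p-values can be sketched in a few lines. The statistic is Q = n(n + 2) * sum over k of r_k^2 / (n - k), where r_k is the lag-k autocorrelation of the residuals. This plain-Python version only computes Q. The p-value is the upper tail of a chi-square distribution, which a statistics library would normally supply. The function name and the toy residuals are assumptions for illustration.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag of a residual sequence."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation of the residuals.
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Toy residuals (hypothetical): a perfectly alternating pattern, which is
# strongly autocorrelated and therefore yields a large Q.
alternating = [1.0, -1.0] * 25
print(ljung_box_q(alternating, 1))
```

A large Q relative to the chi-square reference distribution gives a small p-value and indicates leftover autocorrelation. The p-values above 0.05 reported for the flood model correspond to small Q values.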



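$$(1 - B)\,x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2}$$

where B is the backshift operator (so that (1 - B)x_t = x_t - x_{t-1}) and w_t is the white-noise error term.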




Next, it would be prudent to run some methods to forecast future values. Below are two methods that show the ARIMA model attempting to forecast future values.



As the forecast plots show, the variance of the data is still difficult to gauge, so both plots have wide prediction intervals. In the future forecast plot, the short-term forecast is more definite, while the long-term forecast appears to settle near the mean with a large variance.
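
The shape of these forecasts follows from the model structure. For an ARIMA(0,1,2) model without drift, x_t = x_{t-1} + w_t + θ1 w_{t-1} + θ2 w_{t-2}, the point forecast changes only over the first two steps and is flat afterwards, while the forecast-error variance keeps growing with the horizon. The sketch below illustrates this in plain Python. The function name, coefficients, and starting values are hypothetical and are not the fitted flood model.

```python
def arima012_forecast(last_value, last_two_shocks, theta1, theta2, sigma2, horizon):
    """Point forecasts and error variances for ARIMA(0,1,2) without drift.

    last_two_shocks is (w_n, w_{n-1}), the two most recent residuals.
    Returns (forecasts, variances), each of length `horizon`.
    """
    w_n, w_prev = last_two_shocks
    forecasts, variances = [], []
    level = last_value
    psi_sq_sum = 0.0
    for h in range(1, horizon + 1):
        if h == 1:
            level += theta1 * w_n + theta2 * w_prev
        elif h == 2:
            level += theta2 * w_n
        # For h >= 3 no known shocks remain, so the forecast stays flat.
        forecasts.append(level)
        # Psi weights: 1, then 1 + theta1, then 1 + theta1 + theta2 onwards.
        j = h - 1
        if j == 0:
            psi = 1.0
        elif j == 1:
            psi = 1.0 + theta1
        else:
            psi = 1.0 + theta1 + theta2
        psi_sq_sum += psi * psi
        variances.append(sigma2 * psi_sq_sum)
    return forecasts, variances


# Hypothetical inputs chosen for easy arithmetic.
points, variances = arima012_forecast(10.0, (1.0, 2.0), -0.5, -0.25, 1.0, 4)
print(points)
print(variances)
```

The steadily increasing variance is what produces the widening bands in forecast plots of this kind.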

Finally, before completely committing to the ARIMA model derived above, this model should be compared to other models such as the random walk forecast model, naive model, and mean forecast model. By utilizing the time series data, each model can be tested and the prediction errors can be calculated. Below are the results.

Per the above table, the ARIMA model has the lowest mean absolute error and mean squared error. This is a good indication that the chosen model is a reasonable choice.
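
The benchmark comparison described above can be sketched as follows, in plain Python rather than the R used for the original analysis. Each simple method produces forecasts for a held-out period, and the errors are summarized with MAE and MSE. The random walk is written here with a drift term so that it differs from the naive forecast. Whether the original comparison used drift is not stated, so that choice, the function names, and the toy data are all assumptions.

```python
def mean_forecast(train, horizon):
    """Forecast every future value as the training mean."""
    avg = sum(train) / len(train)
    return [avg] * horizon


def naive_forecast(train, horizon):
    """Forecast every future value as the last observed value."""
    return [train[-1]] * horizon


def drift_forecast(train, horizon):
    """Random walk with drift: extend the line from first to last observation."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy data (hypothetical): train on the first four values, test on the rest.
train, test = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0]
for name, method in [("mean", mean_forecast), ("naive", naive_forecast),
                     ("drift", drift_forecast)]:
    predicted = method(train, len(test))
    print(name, mae(test, predicted), mse(test, predicted))
```

A candidate ARIMA model earns its place only if its held-out errors beat these simple baselines, which is the comparison the table summarizes.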