Sam Pastoriza

The Effect of Climate Change

Modeling Flood Time Series Data

When first modeling time series data, a natural starting point is an autoregressive (AR) or moving average (MA) model. These models help to understand the data better and to forecast future values from the historical record. Before they can be used, the time series must be stationary, meaning its mean and variance do not change over time. The most common cause of non-stationarity is a trend, where values drift steadily upward or downward over time. A first check is the ACF (autocorrelation function) plot: autocorrelations that decay very slowly suggest a non-stationary series, while autocorrelations that die out quickly are consistent with stationarity. Below is the ACF plot for the flood data explored and visualized in previous sections.

[Figure: ACF plot of the flood data]
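As a rough illustration of what an ACF plot displays, the sample autocorrelations and the approximate 95% significance bounds can be computed directly. This is a minimal Python sketch, not the code behind the figure; the series `trend` is a made-up example with a steady upward trend, and the function names are hypothetical.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (the usual biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def significance_bound(n, z=1.96):
    """Approximate 95% white-noise band, the dashed lines on an ACF plot."""
    return z / math.sqrt(n)

# Hypothetical toy series with an upward trend (NOT the flood data).
trend = [0.5 * t for t in range(40)]
acf = sample_acf(trend, max_lag=10)
bound = significance_bound(len(trend))

# True when lags 1..5 all sit above the band, i.e. the ACF decays slowly.
slow_decay = all(r > bound for r in acf[1:6])
print([round(r, 3) for r in acf[:6]], round(bound, 3), slow_decay)
```

For a trending series like this one, the autocorrelations stay well outside the bounds for many lags, which is the slow decay that signals non-stationarity. A stationary series would instead fall inside the bounds after a few lags.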

Per the ACF plot, the series already appears stationary, and an Augmented Dickey-Fuller test confirms this. Since the series is stationary, the next steps towards modeling the data with ARMA can be taken. Together, the ACF and PACF plots provide bounds for the parameters of the ARMA model, which are read off from the significant lags in each plot. Since there was no need to difference the data, the model built is an ARMA (autoregressive moving average) model. The parameter p is the number of lags in the autoregressive part of the model and can be found by counting the significant lags, those extending beyond the blue significance bounds, in the PACF plot. The parameter q is the order of the moving average part, which can be counted the same way from the ACF plot. Finally, the parameter d is the degree of differencing, which for an ARMA model is 0. From the ACF and PACF plots below, the other two parameters can be determined.

[Figure: ACF and PACF plots of the flood data]
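To make the lag-counting rule concrete, here is a minimal Python sketch that derives partial autocorrelations from autocorrelations with the Durbin-Levinson recursion and counts the leading significant lags. The autocorrelations are the theoretical values of a hypothetical AR(1) process with coefficient 0.6, and the sample size `n_obs = 180` is an assumption made for the illustration; neither comes from the flood data.

```python
import math

def pacf_from_acf(rho):
    """Durbin-Levinson recursion: rho[0..K] -> partial autocorrelations [0..K]."""
    pacf = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def count_leading_significant(values, bound):
    """Number of consecutive lags, starting at lag 1, that exceed the band."""
    count = 0
    for v in values[1:]:
        if abs(v) <= bound:
            break
        count += 1
    return count

# Hypothetical AR(1) with coefficient 0.6: rho_k = 0.6 ** k.
rho = [0.6 ** k for k in range(6)]
n_obs = 180  # assumed sample size, for the significance band only
bound = 1.96 / math.sqrt(n_obs)

pacf = pacf_from_acf(rho)
p_candidate = count_leading_significant(pacf, bound)
print([round(v, 3) for v in pacf], p_candidate)
```

For an AR(1) process the PACF cuts off after lag 1, so the count suggests p = 1, while the ACF of the same process decays gradually. On real data the counts are only a starting range for the search, not a final answer.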

From the ACF and PACF plots, the ACF reveals at least one significant lag, so for the moving average part of the model the parameter q should range between 1 and 2. The PACF shows one significant lag before the plot settles down, so the parameter p should range between 0 and 1, which matches the values fitted in the table below. Additionally, even though the data was not differenced, for completeness' sake the d parameter should range from 0 to 1. This means an ARIMA model will be used unless the results show that a difference of 0 is better than a difference of 1. Once these ranges have been chosen, the next step is to fit every combination and choose the model with the lowest AIC and BIC. The following table shows the results of this experiment.

ARMA Results

P	D	Q	AIC	BIC	AICc
0	0	1	1801.84634111853	1811.4252116712	1801.98270475489
0	1	1	1813.83125776421	1820.20602937589	1813.89943958239
0	0	2	1803.40353476786	1816.17536217142	1803.63210619643
0	1	2	1793.29999201661	1802.86214943413	1793.43713487375
1	0	1	1803.35373893363	1816.12556633719	1803.5823103622
1	1	1	1795.67581558069	1805.23797299821	1795.81295843783
1	0	2	1804.86367612854	1820.82846038299	1805.20850371475
1	1	2	1795.25070411073	1808.0002473341	1795.4805891682
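The three criteria in the table differ only in how they penalize model size. The Python sketch below uses the standard definitions; the values `k = 3` (estimated parameters, including the noise variance) and `n = 180` (observations) are assumptions chosen because they reproduce the BIC and AICc columns of the first row, and are not stated in the text.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return (AIC, BIC, AICc) for k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    bic = -2.0 * log_likelihood + k * math.log(n)
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    return aic, bic, aicc

# First table row, (p, d, q) = (0, 0, 1). Assumed: k = 3, n = 180.
aic_001 = 1801.84634111853
loglik_001 = (2 * 3 - aic_001) / 2.0  # invert AIC = -2 logL + 2k

aic, bic, aicc = information_criteria(loglik_001, k=3, n=180)
print(round(aic, 4), round(bic, 4), round(aicc, 4))
```

Under these assumptions the function returns the BIC and AICc reported for that row. The d = 1 rows appear consistent with one fewer observation, as expected after differencing, which is worth keeping in mind when comparing criteria across different values of d.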

Sorting the table by AIC or by BIC gives the same answer: the smallest value belongs to the model with parameters (p, d, q) = (0, 1, 2). R also provides a function for selecting an ARIMA model automatically, and auto.arima likewise returns (0, 1, 2). Since all signs point towards the (0, 1, 2) model, that is the one used.
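The selection step amounts to taking the minimum of each criterion column. A small Python sketch using the values from the table above (the helper name `best_by` is hypothetical):

```python
# Rows copied from the results table: (p, d, q, AIC, BIC, AICc).
results = [
    (0, 0, 1, 1801.84634111853, 1811.4252116712, 1801.98270475489),
    (0, 1, 1, 1813.83125776421, 1820.20602937589, 1813.89943958239),
    (0, 0, 2, 1803.40353476786, 1816.17536217142, 1803.63210619643),
    (0, 1, 2, 1793.29999201661, 1802.86214943413, 1793.43713487375),
    (1, 0, 1, 1803.35373893363, 1816.12556633719, 1803.5823103622),
    (1, 1, 1, 1795.67581558069, 1805.23797299821, 1795.81295843783),
    (1, 0, 2, 1804.86367612854, 1820.82846038299, 1805.20850371475),
    (1, 1, 2, 1795.25070411073, 1808.0002473341, 1795.4805891682),
]

def best_by(rows, column):
    """Return the (p, d, q) of the row with the smallest value in `column`."""
    return min(rows, key=lambda row: row[column])[:3]

best_aic = best_by(results, 3)
best_bic = best_by(results, 4)
best_aicc = best_by(results, 5)
print(best_aic, best_bic, best_aicc)
```

All three criteria select the same order here, which is not guaranteed in general, since BIC penalizes additional parameters more heavily than AIC.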

Next, the model should undergo some diagnostics to make sure the fit looks good. Below are some basic diagnostics run using the 0,1,2 model.

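[Figure: diagnostic plots for the ARIMA(0,1,2) model: standardized residuals, ACF of residuals, normal Q-Q plot, and Ljung-Box p-values]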
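Among standard residual diagnostics, the Ljung-Box test is the least visual, so a sketch of the computation may help. The Python below computes the Q statistic and its p-value with only the standard library; it is an illustration under stated assumptions, not the routine that produced the diagnostics. The `residuals` series is a deliberately autocorrelated toy example, and `fitted_params` (the number of ARMA coefficients subtracted from the degrees of freedom) is a hypothetical argument name.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete gamma series.

    Adequate for moderate x; not tuned for extreme arguments.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / math.gamma(a + 1.0)
    total, n = term, 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(-z + a * math.log(z)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def ljung_box(residuals, max_lag, fitted_params=0):
    """Return (Q, p_value) for the Ljung-Box test up to `max_lag`."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, max_lag - fitted_params)

# Toy residuals that alternate in sign, so they are strongly autocorrelated.
residuals = [1.0, -1.0] * 50
q_stat, p_value = ljung_box(residuals, max_lag=5, fitted_params=2)
print(round(q_stat, 1), p_value < 0.05)
```

A p-value below 0.05, as in this toy case, indicates autocorrelation left in the residuals. P-values above 0.05 mean the test finds no evidence of remaining autocorrelation.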

Given the figure above, the following observations can be made. First, in the plot of standardized residuals, the mean should be around 0 and the variance about 1. This is generally true here: the mean is close to 0, although the variance is probably slightly more than 1. A mean or variance far from these values would be a sign of a poor model fit, but that is not the case here. Next, the ACF of the residuals shows no significant lags, which is encouraging because it suggests the model has captured the autocorrelation in the data. In the Q-Q plot, the hope is to see the residuals follow the normal line. Here the signs of normality are weak, with a bit of skew. Finally, the p-values for the Ljung-Box test are greater than 0.05, so there is no evidence of remaining autocorrelation in the residuals, another encouraging sign for the model. Having passed these diagnostics, the model can be used for forecasting. The model for 0,1,2 can be written as the equation below.

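$$(1 - B)\,x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2}$$

Here B is the backshift operator, so (1 - B)x_t = x_t - x_{t-1}, and w_t is white noise.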
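An ARIMA(0,1,2) process can be unrolled as the recursion x_t = x_{t-1} + w_t + θ₁w_{t-1} + θ₂w_{t-2}. The Python sketch below applies that recursion to a given noise sequence. The coefficients 0.5 and 0.25 are placeholder values chosen for easy arithmetic, not the estimates from the fitted model, and the function name is hypothetical.

```python
def arima_012_path(noise, theta1, theta2, x0=0.0):
    """Build x_t = x_{t-1} + w_t + theta1 * w_{t-1} + theta2 * w_{t-2}."""
    path = []
    prev_x = x0
    w1 = w2 = 0.0  # w_{t-1} and w_{t-2}, taken as zero before the first step
    for w in noise:
        x = prev_x + w + theta1 * w1 + theta2 * w2
        path.append(x)
        prev_x, w2, w1 = x, w1, w
    return path

# Response to a single unit shock, with placeholder coefficients.
impulse = arima_012_path([1.0, 0.0, 0.0, 0.0], theta1=0.5, theta2=0.25)
print(impulse)  # [1.0, 1.5, 1.75, 1.75]
```

The unit shock keeps feeding into the level for two further steps and then persists indefinitely, which is the effect of the single difference (d = 1).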

Next, the model can be used to forecast future values. Below are forecasts from the ARIMA model produced with two different methods.

[Figure: forecasts of the flood data from the ARIMA(0,1,2) model, produced with two methods]

As the forecast plots show, the variance of the data is still difficult to pin down, so both plots leave a lot of room for error. In the future forecast plot, the short-term forecast is more definitive, while the long-term forecast settles at a roughly constant, mean-like level with a wide prediction interval.
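The widening of the intervals follows from the model structure. For an ARIMA(0,1,2) process, the h-step forecast error variance is σ² times the sum of squared ψ-weights, where ψ₀ = 1, ψ₁ = 1 + θ₁, and ψⱼ = 1 + θ₁ + θ₂ for j ≥ 2. The Python sketch below computes the resulting standard errors; the coefficients and σ² are placeholder values, not the fitted estimates.

```python
import math

def forecast_std(theta1, theta2, sigma2, horizons):
    """Forecast standard errors for steps 1..horizons of an ARIMA(0,1,2)."""
    psi_sum_sq = 0.0
    stds = []
    for h in range(1, horizons + 1):
        j = h - 1
        if j == 0:
            psi = 1.0
        elif j == 1:
            psi = 1.0 + theta1
        else:
            psi = 1.0 + theta1 + theta2
        psi_sum_sq += psi * psi
        stds.append(math.sqrt(sigma2 * psi_sum_sq))
    return stds

# Placeholder parameters, for illustration only.
stds = forecast_std(theta1=-0.5, theta2=-0.3, sigma2=1.0, horizons=6)
half_widths = [1.96 * s for s in stds]  # approximate 95% interval half-widths
print([round(s, 3) for s in stds])
```

Because every ψ-weight beyond the second is the same nonzero constant, the variance keeps growing with the horizon, while the point forecast itself is constant from the third step onward.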

Finally, before fully committing to the ARIMA model derived above, it should be compared against simpler benchmarks: the random walk forecast, the naive forecast, and the mean forecast. Each model is used to predict the time series data, and the prediction errors are calculated. Below are the results.

Model Comparison

Model	Mean Absolute Error	Mean Squared Error
ARIMA	4.26112565842232	3364.71907708549
Mean Forecast	15.36	3511.11182222222
Naive	31.1333333333333	4244.46666666667
Random Walk Forecast	31.1333333333333	4244.46666666667

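Per the table above, the ARIMA model has the lowest mean absolute error and the lowest mean squared error of the four models, although its mean squared error is only modestly below that of the mean forecast. The naive and random walk forecasts have identical errors, which is expected when the random walk is fitted without drift, since both then simply repeat the last observed value. Overall, this is a good indication that the chosen model was a reasonable choice.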
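For reference, the two simplest benchmarks and both error measures take only a few lines. This Python sketch uses a tiny made-up train/test split rather than the flood data, and all function names are hypothetical.

```python
def mean_forecast(train, horizon):
    """Predict the training mean at every future step."""
    level = sum(train) / len(train)
    return [level] * horizon

def naive_forecast(train, horizon):
    """Predict the last observed value at every future step."""
    return [train[-1]] * horizon

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up data, for illustration only.
train, test = [1.0, 2.0, 3.0], [4.0, 6.0]

scores = {}
for name, method in (("Mean Forecast", mean_forecast), ("Naive", naive_forecast)):
    predicted = method(train, len(test))
    scores[name] = (mean_absolute_error(test, predicted),
                    mean_squared_error(test, predicted))
print(scores)
```

A candidate model is worth keeping only if it beats benchmarks like these on held-out data. Mean squared error weights large misses more heavily than mean absolute error, so the two measures can rank models differently.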