Financial Time Series Models (ARCH/GARCH)
As a slight detour in this project, this section fits time series data with statistical models of the variance in the data. These are autoregressive conditional heteroskedasticity (ARCH) models, used when the error variance depends on past squared errors, that is, when the variance itself has an autoregressive structure. They are typically applied to time series that exhibit volatility clustering, which is most common in financial data. For this section, a set of stock data is retrieved and analyzed using a combination of ARIMA and ARCH models.
For this section, the stock of choice is Chevron, which is quite volatile and, as an oil and gas company, connected to climate change. After gathering the adjusted closing prices from Yahoo Finance, the visualization below shows the price trend over 20+ years.
From the above figure, we can see spikes of volatility in the price, especially around points where the price drops and rises dramatically (see around mid-2020). This is a good first indicator that an ARCH/GARCH component will be needed when fitting the data. Another way to look at this data is to plot price against volume. This gives a different perspective on the price, but it doesn't necessarily provide more insight.
Again, the above figure shows us more information on the price trends versus the volume, but the overall conclusion is the same. However, the next figure shows the volatility better as the log of the returns is taken and plotted.
The above figure shows the log of the returns, which gives a much better sense of the volatility of the returns for Chevron. Since there is clear volatility, an ARCH/GARCH model will be needed to fit the data. Before fitting an ARCH model, however, it is important to inspect the ACF/PACF plots to see whether an ARIMA model should be fit first, with the ARCH model then fit on its residuals. First, since we took the log of the returns, we should plot the ACF to check whether the data needs differencing.
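As a minimal sketch of the transformation discussed here, the snippet below computes log returns from a list of prices using only the Python standard library. The original analysis was done in R; the function name `log_returns` and the price values are hypothetical and chosen purely for illustration (they are not Chevron data).

```python
import math

def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for each pair of consecutive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical adjusted closing prices, for illustration only.
prices = [100.0, 102.0, 99.0, 99.0, 105.0]
returns = log_returns(prices)

for r in returns:
    print(f"{r:+.4f}")
```

One convenient property of log returns is that they add up over time: the sum of the series equals the log of the last price divided by the first.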
From the above figure, it is clear that the data must be differenced once to remove the correlation. After differencing, the ACF and PACF plots are created and shown below.
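The effect of differencing on the ACF can be illustrated with a small standard-library Python sketch. It uses a simulated random walk rather than the Chevron series, and the helper names `difference` and `acf` are assumptions made for this example, not functions from the original R analysis.

```python
import random

def difference(series):
    """First difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Simulated random walk (illustrative stand-in for a non-stationary series).
random.seed(0)
level = [0.0]
for _ in range(2000):
    level.append(level[-1] + random.gauss(0.0, 1.0))

acf_level = acf(level, 5)
acf_diff = acf(difference(level), 5)

# Approximate 95% band for the sample ACF of white noise.
band = 1.96 / len(level) ** 0.5

print("lag-1 ACF, level:      ", round(acf_level[1], 3))
print("lag-1 ACF, differenced:", round(acf_diff[1], 3))
print("approx. 95% band:      ", round(band, 3))
```

A slowly decaying ACF in the level series is the usual visual cue that differencing is needed, while the differenced series should show autocorrelations close to the white-noise band.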
Now that the data is differenced, the ACF and PACF plots clearly show a need for an ARIMA model to fit the data. Once an ARIMA model is fit, an ARCH model will probably be needed as well, but that depends on how the ARIMA residuals look. Let's fit an ARIMA model first. From the above ACF and PACF plots, it is appropriate to check all ARIMA models with the AR order p ranging from 1 to 4 and the MA order q ranging from 1 to 3. The differencing order d is set to 1, as we differenced the data once. From that, the following results were achieved.
By sorting the table, the model with the smallest AIC has parameters 4,1,1, while the model with the smallest BIC has parameters 1,1,1. Next, the auto.arima function in R selects an ARIMA model automatically, and it suggests parameters of 0,1,1. However, the AIC and BIC of the auto.arima model are both larger than those of the two models found manually, so it is set aside. Between the two remaining candidates, the simpler model is preferred. Here, the 1,1,1 model makes the most sense.
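To see why AIC and BIC can disagree in this way, the following Python sketch computes both criteria from their standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The log-likelihoods, parameter counts, and sample size are hypothetical numbers invented for illustration; they are not the values from the table above.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical values, for illustration only.
n_obs = 5000
candidates = {
    (1, 1, 1): {"log_lik": 13000.0, "k": 3},  # AR1, MA1, innovation variance
    (4, 1, 1): {"log_lik": 13006.0, "k": 6},  # AR1..AR4, MA1, innovation variance
}

scores = {
    order: (aic(c["log_lik"], c["k"]), bic(c["log_lik"], c["k"], n_obs))
    for order, c in candidates.items()
}

best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])

print("best by AIC:", best_by_aic)
print("best by BIC:", best_by_bic)
```

Because the BIC penalty per parameter is ln(n) rather than 2, it penalizes extra parameters more heavily on long series, which is why it tends to favor the smaller model.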
Next, the model should undergo some diagnostics to make sure the fit looks good. Below are some basic diagnostics run using the 1,1,1 model.
Given the figure above, the following observations can be made. First, in the plot of standardized residuals, the mean should be around 0 and the variance about 1. This is generally true here, with the mean close to 0. However, the variance is probably slightly more than 1 and there are clusters of higher variance, which indicates that a second model should be fit for the errors. Generally, a mean or variance that differs significantly from what is expected would be a sign of a poor model fit, but we don't really see that here. Next, the ACF of the residuals shows no significant lags, which is a very encouraging sign. The Q-Q plot should ideally show signs of normality. Here, the residuals look roughly normal, but with a bit of skew. Finally, many of the p-values for the Ljung-Box test are above 0.05, which is a decent sign of a good model.
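For readers who want to see what the Ljung-Box test measures, here is a hedged standard-library Python sketch of the statistic Q = n(n+2) Σ ρ_k² / (n-k). The helper names are assumptions for this example. The p-value helper uses a closed form that only holds for even degrees of freedom, and it does not apply the reduction in degrees of freedom that is usually made when testing residuals from a fitted ARMA model.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    rho = acf(residuals, max_lag)
    return n * (n + 2) * sum(rho[k] ** 2 / (n - k) for k in range(1, max_lag + 1))

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Simulated series for illustration: white noise and a strongly correlated AR(1).
rng = random.Random(42)
white = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ar1 = [0.0]
for _ in range(999):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))

q_white = ljung_box_q(white, 10)
q_ar1 = ljung_box_q(ar1, 10)
p_white = chi2_sf_even(q_white, 10)
p_ar1 = chi2_sf_even(q_ar1, 10)

print("white noise: Q =", round(q_white, 2), "p =", round(p_white, 4))
print("AR(1):       Q =", round(q_ar1, 2), "p =", round(p_ar1, 4))
```

A small p-value means the autocorrelations are jointly too large to be white noise, so p-values above 0.05 on model residuals are what a well-fitting model should produce.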
Next, we can do an initial check on the absolute returns and squared returns to check if an ARCH model is needed. Below is a set of visualizations for the ACF and PACF plots for the absolute and squared returns.
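The idea behind this check can be sketched in standard-library Python: in a series with ARCH effects, the values themselves may be nearly uncorrelated while their squares and absolute values are clearly autocorrelated. The series below is simulated from an ARCH(1) process with hypothetical coefficients; it is an illustration, not the Chevron data, and the helper names are assumptions for this example.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def simulate_arch1(n, alpha0, alpha1, seed):
    """Simulate y_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * y_{t-1}^2."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * prev ** 2
        prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

# Hypothetical ARCH(1) coefficients, for illustration only.
y = simulate_arch1(5000, alpha0=1e-4, alpha1=0.3, seed=1)

acf_raw = acf(y, 5)
acf_sq = acf([v * v for v in y], 5)
acf_abs = acf([abs(v) for v in y], 5)

print("lag-1 ACF, returns:         ", round(acf_raw[1], 3))
print("lag-1 ACF, squared returns: ", round(acf_sq[1], 3))
print("lag-1 ACF, absolute returns:", round(acf_abs[1], 3))
```

Significant spikes in the ACF and PACF of the squared series are the signal that a conditional variance model is worth fitting, and the PACF of the squared series is a common guide to the ARCH order.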
From the above plots and the previous diagnostics, it is pretty clear that an ARCH model is needed to fit the residuals of the ARIMA model. Based on the above plots, the ARCH order p should range from 1 to 7. Then, using a technique similar to the one used for the ARIMA model, a set of ARCH models can be fit and evaluated to find the best fit for the residuals. Below is another table that shows the fitted ARCH models with their associated AICs.
By sorting the table, the model with the smallest AIC has parameters 7,0. Therefore, we should fit the residuals with a 7,0 ARCH model. Now that we have fit the data with a combined ARIMA and ARCH model, the next step is to check the fit. One option is to check the residuals. First, we can plot the residuals and evaluate the results. Additionally, we can check whether the residuals are normal.
From above, we can see a couple of important diagnostic plots. The first tab contains three plots: the residuals, the ACF of the residuals, and the distribution of the residuals. According to the first plot, the residuals now seem to have a constant mean and variance. Next, the ACF shows no significant lags, a strong sign that the residuals are stationary. Finally, the distribution of the residuals is relatively normal, which the second tab, a qqnorm plot of the residuals, confirms. These plots all point towards the model being a good fit for the data. In addition, a Ljung-Box test outputs a p-value greater than 0.05, so the null hypothesis of no autocorrelation is not rejected, which is consistent with independent residuals and supports the chosen model. With these diagnostics done, the model above can be used for forecasting. The ARIMA(1,1,1) + ARCH(7,0) model can be written as the equations below.
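$$\Phi(B)(1 - B)x_t = \delta + \Theta(B)y_t$$
where $B$ is the backshift operator, $\Phi(B) = 1 - \Phi_1 B$, and $\Theta(B) = 1 - \Theta_1 B$.
$$y_t = \sigma_t \varepsilon_t$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 y_{t-1}^2 + \alpha_2 y_{t-2}^2 + \alpha_3 y_{t-3}^2 + \alpha_4 y_{t-4}^2 + \alpha_5 y_{t-5}^2 + \alpha_6 y_{t-6}^2 + \alpha_7 y_{t-7}^2$$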
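The variance equation above can be evaluated directly once the coefficients are known. The Python sketch below does this for a seven-lag ARCH model; the coefficient values are hypothetical placeholders (the fitted estimates are not reported in this section), and the function names are assumptions made for the example.

```python
def arch_conditional_variance(past_y, alpha0, alphas):
    """sigma_t^2 = alpha0 + sum_i alphas[i-1] * y_{t-i}^2.

    past_y[0] is y_{t-1}, past_y[1] is y_{t-2}, and so on.
    """
    if len(past_y) < len(alphas):
        raise ValueError("need at least as many past values as ARCH lags")
    return alpha0 + sum(a * y * y for a, y in zip(alphas, past_y))

def unconditional_variance(alpha0, alphas):
    """Long-run variance alpha0 / (1 - sum(alphas)); requires sum(alphas) < 1."""
    total = sum(alphas)
    if total >= 1:
        raise ValueError("sum of ARCH coefficients must be below 1")
    return alpha0 / (1 - total)

# Hypothetical coefficients, for illustration only (not fitted values).
alpha0 = 0.0001
alphas = [0.10, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]

calm = [0.005] * 7            # seven small recent shocks
shock = [0.08] + [0.005] * 6  # one large shock in the most recent period

var_calm = arch_conditional_variance(calm, alpha0, alphas)
var_shock = arch_conditional_variance(shock, alpha0, alphas)
var_long_run = unconditional_variance(alpha0, alphas)

print("conditional variance, calm: ", var_calm)
print("conditional variance, shock:", var_shock)
print("long-run variance:          ", var_long_run)
```

A single large recent shock raises the next period's conditional variance well above the calm case, which is how the model reproduces volatility clustering.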
Using this model, the data can be modeled effectively and future values can, in principle, be forecast. However, stock prices are notoriously difficult to forecast, so any such forecast should be treated with caution. What we can say is that this model does a decent job of modeling the data and its variance.