Deep Learning for Time Series
There is often a tradeoff between flexibility and interpretability when modeling data with deep learning techniques such as neural networks. As a model becomes more flexible, it usually becomes less interpretable, and it gets harder to understand how the model arrives at its decisions. When modeling time-series data, specifically temperature data, interpretability is less critical: how the model works matters less than whether it gives accurate predictions. Thus, deep learning techniques such as different types of neural networks are a viable option for time series modeling.
In this analysis, the temperature data is used as input to three different types of neural networks: a simple recurrent neural network (RNN), a gated recurrent unit (GRU) network, and a long short-term memory (LSTM) network. Each network is trained, its training and testing errors are calculated, and forecasts are then made for temperatures in the near future. These models are most often used for sequential data such as text, and since time series are also sequential, they are good choices to begin with.
Time Series Plot
First, it is helpful to recall what the time series looks like. Below is a plot of the temperature data, averaged for the state of California.
Deep Learning Models
Next, the time series data is split into a training set and a testing set. Because the observations are ordered in time, the split must be chronological rather than random: the earlier observations form the training set and the most recent ones form the testing set. Here, a 90/10 split between training and testing is used. Each of the three models is constructed and trained on the training data. Below are the results for these models.
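A chronological split of this kind can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the function name chronological_split and the toy series are assumptions, and only the 90/10 proportion comes from the text.

```python
def chronological_split(series, train_fraction=0.9):
    """Split an ordered series into train/test sets without shuffling.

    The oldest `train_fraction` of the observations become the training
    set; the most recent observations become the testing set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cutoff = int(len(series) * train_fraction)
    return series[:cutoff], series[cutoff:]


# Hypothetical stand-in for the temperature series: 20 ordered values.
toy_series = [50.0 + 0.5 * t for t in range(20)]
train, test = chronological_split(toy_series)
print(len(train), len(test))
```

Because nothing is shuffled, every value in the testing set comes later in time than every value in the training set, so the model is never evaluated on data older than what it was trained on.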
Recurrent Neural Networks
A simple recurrent neural network is constructed and trained using TensorFlow and Keras. This is a deep neural network with three hidden layers and one dense layer, using the hyperbolic tangent as the activation function. Next, a second recurrent neural network is constructed with exactly the same parameters, except that it uses kernel regularization, which penalizes large weights to reduce the risk of overfitting. If the regularization works, the non-regularized network may have the lower training error, but the regularized network should have the lower testing error. Below are the validation loss plots for each neural network, one regularized and one not.
From the plots above, regularization seems to have a significant effect: the training loss drops at a much faster pace. It will be interesting to see whether this affects the training or testing error. Those values can be seen in the table further below.
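To make the two ingredients above concrete, here is a small NumPy sketch of a tanh recurrent layer and of a kernel penalty added to the loss. It is an illustration under stated assumptions, not the Keras model from this analysis: the text does not say which kind of kernel regularization was used, so an L2 penalty is assumed, and the layer sizes, weights, input window, and loss value are all made up.

```python
import numpy as np


def simple_rnn_forward(inputs, w_in, w_rec, bias):
    """Run a tanh recurrent layer over a sequence and return the final state.

    inputs: array of shape (timesteps, n_features)
    w_in:   input-to-hidden kernel, shape (n_features, n_hidden)
    w_rec:  hidden-to-hidden kernel, shape (n_hidden, n_hidden)
    bias:   shape (n_hidden,)
    """
    hidden = np.zeros(w_rec.shape[0])
    for x_t in inputs:
        # The previous hidden state feeds back into the current step.
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec + bias)
    return hidden


def l2_kernel_penalty(kernel, strength):
    """Assumed L2 penalty: strength times the sum of squared weights."""
    return strength * np.sum(kernel ** 2)


rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.5, size=(1, 4))
w_rec = rng.normal(scale=0.5, size=(4, 4))
bias = np.zeros(4)
window = np.array([[0.1], [0.2], [0.3]])  # hypothetical scaled temperatures

final_state = simple_rnn_forward(window, w_in, w_rec, bias)
data_loss = 0.25  # placeholder standing in for the prediction error
regularized_loss = data_loss + l2_kernel_penalty(w_in, strength=0.01)
print(final_state.shape, regularized_loss >= data_loss)
```

The penalty grows with the size of the weights, so minimizing the regularized loss pushes the optimizer toward smaller weights, which is the mechanism by which regularization is meant to limit overfitting.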
Gated Recurrent Unit (GRU) Neural Networks
Using TensorFlow and Keras, a GRU neural network is constructed and trained. A gated recurrent unit network is similar to a long short-term memory network (discussed in the next section), except that it has fewer parameters than an LSTM and lacks an output gate. According to Wikipedia, "GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets". Since this dataset has temperature values dating back to 1895, it is not that small, so it isn't clear that this model will be effective. However, this type of model is reported to give performance comparable to an LSTM while being far more efficient. Again, two models were trained, one regularized and one not, to explore the effects of regularization. Below are the results of the training sequence and the training loss over time.
From the plots above, the GRU model without regularization looks phenomenal, with an extremely low training error compared to the other models. This immediately raises the suspicion of overfitting. With regularization, the loss is worse, and it is not as good as that of the RNN model above. The training and testing RMSE values are again shown in a table further below.
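The claim about parameter counts can be checked with a little arithmetic. The sketch below uses the textbook formulation, in which each gate or candidate transformation has an input kernel, a recurrent kernel, and a bias; a simple RNN has one such set, a GRU three, and an LSTM four. The layer size of 32 units is an arbitrary assumption, and some library implementations add extra bias terms, so their reported counts can differ slightly.

```python
def recurrent_param_count(n_inputs, n_units, n_weight_sets):
    """Count trainable parameters of a recurrent layer (textbook form).

    Each weight set holds an input kernel (n_inputs x n_units), a
    recurrent kernel (n_units x n_units), and a bias (n_units).
    """
    per_set = n_units * (n_inputs + n_units) + n_units
    return n_weight_sets * per_set


# Number of weight sets per cell type in the textbook formulation.
WEIGHT_SETS = {"simple_rnn": 1, "gru": 3, "lstm": 4}

n_inputs, n_units = 1, 32  # hypothetical sizes: one feature, 32 units
counts = {
    name: recurrent_param_count(n_inputs, n_units, sets)
    for name, sets in WEIGHT_SETS.items()
}
print(counts)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same width, which is where its efficiency advantage comes from.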
Long Short Term Memory (LSTM) Neural Networks
Using TensorFlow and Keras, an LSTM neural network is constructed and trained. An LSTM network is a type of recurrent neural network: unlike standard feedforward neural networks, it has feedback connections. LSTMs are well suited to predicting sequential data, including time series, so using one here to predict temperature over time makes good sense. Again, two models were trained, one regularized and one not, to explore the effects of regularization. Below are the results of the training sequence and the training loss over time.
From the plots above, the LSTM model performs similarly in training to the simple recurrent neural network, and overall the model appears to train well. Interestingly, the LSTM model with regularization is not significantly different from the model without it, though the testing errors may still differ.
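For readers who want to see what the feedback connections look like, the following NumPy sketch implements one step of a standard LSTM cell. It is an illustration only, not the Keras layer used here: the function names, the packing of the four weight sets into single arrays, and the gate ordering (input, forget, candidate, output) are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, kernel, recurrent_kernel, bias):
    """Advance a standard LSTM cell by one time step.

    kernel:           shape (n_features, 4 * n_units)
    recurrent_kernel: shape (n_units, 4 * n_units)
    bias:             shape (4 * n_units,)
    """
    # h_prev is the feedback connection from the previous time step.
    z = x_t @ kernel + h_prev @ recurrent_kernel + bias
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell values
    c = f * c_prev + i * g                        # updated cell state
    h = o * np.tanh(c)                            # output gate scales the state
    return h, c


n_features, n_units = 1, 3
h0, c0 = np.zeros(n_units), np.ones(n_units)
zero_kernel = np.zeros((n_features, 4 * n_units))
zero_recurrent = np.zeros((n_units, 4 * n_units))
zero_bias = np.zeros(4 * n_units)

h1, c1 = lstm_step(np.array([0.7]), h0, c0, zero_kernel, zero_recurrent, zero_bias)
print(h1, c1)
```

The output gate in the last line of the cell is the piece that a GRU does not have.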
Forecasting
Next, predictions are made against the testing data, and comparisons are plotted and examined for each trained network. Using these predictions, the models can forecast at least three time periods ahead, which is reasonable considering that weather predictions are generally only accurate up to about five time periods ahead. The following plots were created once the models were trained and predictions were made. There are two separate plots, one for the networks that were not regularized and another for the networks that were.
The most immediate observation is the abysmal performance of the GRU model. This was seen in the training loss plot above and is seen again here. Across multiple training sessions, the GRU model is not always this poor, but approximately 50% of the time it fails to train at all: half of the time the non-regularized model looks like the regularized version, and half of the time it looks like this. By comparison, the simple RNN and the LSTM model do a decent job of predicting the temperatures over time. At a glance, the LSTM model appears to slightly outperform the simple RNN, which is expected, and a table of the RMSE values should confirm this.
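One common way to forecast several periods ahead with a one-step model is to feed each prediction back in as input for the next step. The text does not state which strategy was used here, so the sketch below is an assumption for illustration only; the names recursive_forecast and mean_of_window are hypothetical, and the averaging function merely stands in for a trained network's one-step prediction.

```python
def recursive_forecast(predict_next, history, window, steps=3):
    """Forecast `steps` periods ahead by feeding predictions back in.

    predict_next: function mapping the last `window` values to the next one
    history:      ordered list of observed values
    """
    recent = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(recent)
        forecasts.append(next_value)
        # Slide the window forward: drop the oldest value, add the forecast.
        recent = recent[1:] + [next_value]
    return forecasts


def mean_of_window(values):
    """Stand-in for a trained model: predicts the mean of its inputs."""
    return sum(values) / len(values)


history = [60.0, 62.0, 64.0, 66.0]  # hypothetical temperatures
forecast = recursive_forecast(mean_of_window, history, window=2, steps=3)
print(forecast)
```

Since each step builds on earlier forecasts rather than on observations, errors compound as the horizon grows, which is one reason forecasts are usually limited to a few periods ahead.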
As seen in the raw RMSE results above, the LSTM model is the best model for this data. It produced a decent RMSE value, and the plots indicate that its forecasts are close to the observed values. Compared to the SARIMA model for this data, which had an MSE of 3.274, or an RMSE of approximately 1.8, the neural network is far better at forecasting. On the other hand, the ARIMA model is more straightforward than a neural network: it has far fewer parameters, is easier to set up, and is easier to understand. An ARIMA model should therefore still be considered, especially when interpretability matters, but the LSTM neural network is the better choice when the goal is to produce the best possible forecasts.
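The conversion between the two error measures quoted above is a square root, as the short sketch below shows. The rmse helper is a generic illustration written for this example rather than the evaluation code from the analysis; only the MSE value of 3.274 comes from the text.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


sarima_mse = 3.274  # MSE of the SARIMA model, as reported in the text
sarima_rmse = math.sqrt(sarima_mse)
print(round(sarima_rmse, 2))
```

Because RMSE is expressed in the same units as the temperatures themselves, it is the easier of the two measures to compare across models.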