Predicting Levels of Earthquake Damage: A Comparison of Classification Models


Analysis and Statistical Methods


Workflow

To build the best predictive model, several classification models were trained and tuned in the following order; each step is detailed in the sections below.

  1. Train nine classification models on the subset of training data.
  2. Tune the top six models on the subset of training data.
  3. Re-train and tune the top two models on the entire training data set.
  4. Utilize feature engineering to attempt improvement over the top two models.

Train Nine Classification Models

Nine classification models were trained on the subset training data set using 10-fold cross validation. The nine algorithms were: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, AdaBoost, Bagging, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and a Neural Network (MLP). The nine models were then run on the subset testing data set, and their resulting accuracies are shown in Figure 9.
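The comparison step above can be sketched as follows. This is a minimal illustration, not the study's actual code: the variable names `X_sub` and `y_sub` and all hyperparameter values are assumptions, and each model is left at (or near) its scikit-learn defaults.

```python
# Sketch: compare nine classifiers by mean 10-fold cross-validation accuracy.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              BaggingClassifier)
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neural_network import MLPClassifier

MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}

def compare_models(models, X, y, cv=10):
    """Return each model's mean cross-validated accuracy."""
    return {name: cross_val_score(m, X, y, cv=cv).mean()
            for name, m in models.items()}
```

A call such as `compare_models(MODELS, X_sub, y_sub)` would yield the accuracies plotted in Figure 9.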


Figure 9: Accuracy of Nine Classification Models

Figure 9 provides a visualization of how the models performed in their prediction of damage levels. It's seen that Naive Bayes and QDA performed much worse than the other seven classifiers and that Random Forest and Bagging appear to have performed the best. Considering these results, the study continued as follows.

Tune the Top Six Models

As seen in Figure 9, it's clear that Naive Bayes and QDA perform much worse than the other seven classifiers, so we decided not to move forward with tuning these models. Additionally, since a Random Forest is itself an ensemble of many decision trees, we chose not to tune the standalone Decision Tree algorithm. The remaining six classifiers were therefore tuned and re-assessed by predicting the labels of the subset testing data. The models' accuracies are shown in Figure 10.
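The tuning step for one of the six models can be sketched with a grid search. This is a hedged illustration: the parameter grid values are placeholders, not the grids actually used in the study, and `X_sub`/`y_sub` are assumed names for the subset training data.

```python
# Sketch: tune a Random Forest with 10-fold cross-validated grid search.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def tune_random_forest(X, y, cv=10):
    """Grid-search a Random Forest over an illustrative parameter grid."""
    grid = {
        "n_estimators": [100, 300],   # placeholder values
        "max_depth": [None, 10, 20],  # placeholder values
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                          cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_estimator_, search.best_score_
```

An analogous grid (e.g. over `n_estimators` and `learning_rate`) would apply to AdaBoost, and similarly for the other four tuned models.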


Figure 10: Accuracy of Top Six Classification Models

Figure 10 shows that the tuned Random Forest and AdaBoost models clearly performed the best of the six tuned models. However, the range in accuracy spans only about eleven percentage points.

To investigate these models further, a receiver operating characteristic (ROC) curve was produced showing the true positive rate versus false positive rate of each of the six models in predicting "Low" versus "Medium" or "High" damage levels. This framing was chosen because of the distribution of labels in the training data: since the data is somewhat imbalanced (see Figure 2), we wanted to examine how well the models predicted one of the lesser-represented labels ("Low") relative to the dominant label ("Medium"). To present the results in a single plot, "High" was grouped with "Medium", so each curve shows the true and false positive rates of predicting "Low" against either of the other two labels. The resulting ROC plot is shown in Figure 11.
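The "Low" vs. "Medium"/"High" binarization described above can be sketched as follows, assuming string damage-level labels and a fitted classifier that exposes `predict_proba` (all variable names hypothetical).

```python
# Sketch: ROC for predicting the "Low" class against all other damage levels.
import numpy as np
from sklearn.metrics import roc_curve, auc

def low_vs_rest_roc(model, X_test, y_test):
    """Return (fpr, tpr, auc) for the one-vs-rest "Low" damage prediction."""
    # Probability the model assigns to the "Low" class for each point.
    low_idx = list(model.classes_).index("Low")
    low_prob = model.predict_proba(X_test)[:, low_idx]
    # Binarize: 1 if the true label is "Low", 0 if "Medium" or "High".
    y_binary = (np.asarray(y_test) == "Low").astype(int)
    fpr, tpr, _ = roc_curve(y_binary, low_prob)
    return fpr, tpr, auc(fpr, tpr)
```

Running this for each of the six tuned models and plotting the (fpr, tpr) pairs would reproduce a curve like Figure 11.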


Figure 11: Receiver Operating Characteristic (ROC) Curve of Six Tuned Classification Models, Damage Level "Low" vs "Med"/"High"

Figure 11's ROC plot solidifies the finding that the Random Forest and AdaBoost models performed the best. In this plot, the best models are those that "hug" the top-left corner, where higher true positive rates are achieved at lower false positive rates. This is the desired result because it indicates that the model accurately predicts labels at high rates while mislabeling few data points. By this criterion, the Random Forest (orange line) and AdaBoost (green line) performed the best of the six models.

Next, since the Random Forest and AdaBoost models performed the best, the confusion matrices of their predictions can be examined for a better understanding of their performance. These are shown below; the confusion matrices for the other models are shown in the remaining tabs.


Figure 12: Confusion Matrix for Random Forest

The confusion matrices of the Random Forest and AdaBoost classification models shown in Figure 12 indicate that the two models truly did perform very similarly: the proportion of data falling in each cell is nearly identical between the two. Almost half of the testing data points were accurately predicted to be of "Medium" damage level. However, of the data with an actual label of "High", both models predicted roughly half to be "Medium" and the other half to be "High".
This may suggest that these damage levels exhibit similar patterns in the variables. Alternatively, it could be a result of the non-uniform distribution of labels across the training and testing data, such that the models did not learn to distinguish between the majority of points labeled "Medium" and the fewer points labeled "High". Overall, these confusion matrices echo the results above regarding the similarity of the two models' performance, but clearly there is still room for improvement.
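The proportion-per-cell reading of the confusion matrices can be sketched as below; this assumes string damage-level labels, and the label ordering is an assumption to match the figure.

```python
# Sketch: confusion matrix normalized to fractions of all test points.
from sklearn.metrics import confusion_matrix

def proportion_confusion(y_true, y_pred, labels=("Low", "Medium", "High")):
    """Confusion matrix where each cell is a fraction of the whole test set."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return cm / cm.sum()
```

Normalizing by the grand total (rather than per row) is what makes cells directly comparable between the Random Forest and AdaBoost matrices.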

Train and Tune the Top Two Models on the Entire Training Data Set

Finally, the top two models (Random Forest and AdaBoost) were re-trained and tuned using the entire training dataset (as opposed to the subset). The resulting two models were used to predict the entire testing dataset, and the results of this are discussed in the Results section.
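This final step can be sketched as a simple re-fit-and-predict loop; the variable names are hypothetical, and `models` would hold the two tuned estimators from the previous step.

```python
# Sketch: re-train each tuned model on the full training set and
# generate predictions for the full testing set.
def fit_and_predict(models, X_train_full, y_train_full, X_test_full):
    """Return a dict mapping model name to its test-set predictions."""
    preds = {}
    for name, model in models.items():
        model.fit(X_train_full, y_train_full)
        preds[name] = model.predict(X_test_full)
    return preds
```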