A Practical Approach to using Supervised Machine Learning Models to Classify Aviation Safety Occurrences
Bryan Y. Siow
TL;DR
This study investigates a practical supervised ML pipeline that classifies aviation safety occurrences as either Incident or Serious Incident, addressing inconsistencies in ICAO Annex 13 definitions. It compares five models (Support Vector Machine, Logistic Regression, Random Forest, XGBoost, K-Nearest Neighbors) on a dataset of 475 labeled reports. Random Forest achieves the best performance (accuracy 0.77, F1 0.78, MCC 0.51), and SMOTE offers little to no benefit. The approach is implemented as an ML web application and benchmarked against human predictions in ECAC workshop case studies, showing general alignment with expert judgments while illustrating interpretive differences. The work demonstrates a feasible, deployable tool to support aviation safety investigators and provides a foundation for further AI-enabled enhancements in safety classification.
Abstract
This paper describes a practical approach to using supervised machine learning (ML) models to assist safety investigators in classifying aviation occurrences as either incidents or serious incidents. Our implementation, currently deployed as an ML web application, is trained on a labelled dataset derived from publicly available aviation investigation reports. Five supervised learning models (Support Vector Machine, Logistic Regression, Random Forest Classifier, XGBoost and K-Nearest Neighbors) were evaluated. The best-performing algorithm was the Random Forest Classifier, with accuracy = 0.77, F1 score = 0.78 and Matthews correlation coefficient (MCC) = 0.51, each averaged over 100 sample runs. The study also explored the effect of applying the Synthetic Minority Over-sampling Technique (SMOTE) to the imbalanced dataset. Depending on the model, the SMOTE adjustment either had no significant effect or substantially degraded performance.
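For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, F1 and MCC can be computed for a binary incident / serious incident classifier and then averaged over repeated runs. It is an illustration only and not the paper's implementation: the function names (`binary_metrics`, `average_metrics`), the label encoding (1 = serious incident, 0 = incident) and the use of the positive-class F1 are all assumptions; the paper may use a different F1 averaging scheme.

```python
import math
from statistics import mean


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F1 and MCC for one run (hypothetical helper).

    Assumes binary labels, with `positive` marking the serious incident class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)

    # F1 for the positive class: 2TP / (2TP + FP + FN); 0.0 if undefined.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four confusion-matrix cells, which makes it informative
    # on imbalanced data; by convention it is 0.0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


def average_metrics(runs):
    """Average each metric over several (y_true, y_pred) runs."""
    results = [binary_metrics(y_true, y_pred) for y_true, y_pred in runs]
    return {name: mean(r[name] for r in results) for name in results[0]}
```

In use, each run would supply the held-out labels and the model's predictions for one train/test split, and `average_metrics` would be called on the list of 100 such pairs.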
