Hybrid Efficient Unsupervised Anomaly Detection for Early Pandemic Case Identification
Ghazal Ghajari, Mithun Kumar PK, Fathi Amsaad
TL;DR
The paper tackles the challenge of identifying early pandemic cases when labeled data are scarce by proposing a hybrid unsupervised anomaly-detection method that fuses distance and density signals. It constructs a complete pairwise distance matrix via an unsupervised random forest, builds a k-nearest-neighbor (KNN) graph from it, and applies graph clustering to derive cluster centers. Per-point density and distance features are then computed relative to those centers and combined into an anomaly score, which is log-transformed before thresholding. On COVID-19 chest X-ray features, the method achieves a mean AUC of 77.43%, outperforming Isolation Forest and KNN, and demonstrates robustness across varying data sizes without requiring labeled training data. The approach is positioned as broadly applicable across domains, with potential extensions to NLP, weak supervision, parallelization, and diverse anomaly-detection tasks in healthcare, security, finance, and engineering.
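The pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: scikit-learn's `RandomTreesEmbedding` stands in for the unsupervised random forest, connected components of the KNN graph stand in for the graph-clustering step, the densest point of each component is taken as its center, and the score formula and the mean-plus-two-standard-deviations cutoff are illustrative choices. The function names `forest_distance`, `knn_neighbors`, and `anomaly_scores` are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.ensemble import RandomTreesEmbedding


def forest_distance(X, n_trees=200, seed=0):
    """Distance = 1 - fraction of random trees in which two points share a leaf."""
    emb = RandomTreesEmbedding(n_estimators=n_trees, max_depth=5, random_state=seed)
    leaves = emb.fit_transform(X)  # sparse one-hot leaf indicators, one leaf per tree
    proximity = (leaves @ leaves.T).toarray() / n_trees
    dist = 1.0 - proximity
    np.fill_diagonal(dist, 0.0)
    return dist


def knn_neighbors(dist, k):
    """Indices of the k nearest neighbors of each point, excluding the point itself."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # keep self out even when distances tie at zero
    return np.argsort(d, axis=1)[:, :k]


def anomaly_scores(dist, k=10):
    """Combine a density feature and a distance-to-center feature (assumed formula)."""
    n = dist.shape[0]
    nbrs = knn_neighbors(dist, k)
    # Density feature: mean distance to the k nearest neighbors (large = sparse region).
    mean_knn = dist[np.arange(n)[:, None], nbrs].mean(axis=1)

    # Stand-in for graph clustering: connected components of the undirected KNN graph.
    rows = np.repeat(np.arange(n), k)
    adj = csr_matrix((np.ones(n * k), (rows, nbrs.ravel())), shape=(n, n))
    _, labels = connected_components(adj, directed=False)

    # Distance feature: distance to the densest point (assumed center) of the cluster.
    center_dist = np.zeros(n)
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        center = members[np.argmin(mean_knn[members])]
        center_dist[members] = dist[members, center]

    # Log-transform the raw score, then threshold (the cutoff rule is an assumption).
    log_score = np.log1p(center_dist * mean_knn)
    threshold = log_score.mean() + 2.0 * log_score.std()
    return log_score, log_score > threshold


# Small synthetic demo: a dense group plus a few distant points.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(6, 1, (5, 5))])
scores_demo, flags_demo = anomaly_scores(forest_distance(X_demo), k=10)
```

No labels are used at any step, which is the property the method relies on; a real application would replace the synthetic array with extracted image features and substitute whichever graph-clustering algorithm and score definition the full paper specifies.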
Abstract
Unsupervised anomaly detection is a promising technique for identifying unusual patterns in data without the need for labeled training examples. This approach is particularly valuable for early case detection in epidemic management, especially when early-stage data are scarce. This research introduces a novel hybrid method for anomaly detection that combines distance and density measures, enhancing its applicability across various infectious diseases. Our method is especially relevant in pandemic situations, as demonstrated during the COVID-19 crisis, where traditional supervised classification methods fall short due to limited labeled data. The efficacy of our method is evaluated using COVID-19 chest X-ray data, where it significantly outperforms established unsupervised techniques: it achieves an average AUC of 77.43%, compared with 73.66% for Isolation Forest and 52.93% for KNN. These results highlight the potential of our hybrid anomaly detection method to improve early detection capabilities in diverse epidemic scenarios, thereby facilitating more effective and timely responses.
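For readers who want to see how an average AUC for the two baselines might be computed, the sketch below may help. It assumes synthetic data in place of the chest X-ray features, uses labels only for scoring and never for fitting, and takes the distance to the k-th nearest neighbor as the KNN anomaly score. The function name `mean_auc`, the data generator, and all parameter values are hypothetical, and the numbers it produces have no relation to the results reported in the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors


def mean_auc(n_runs=5, k=5):
    """Average AUC of two unsupervised baselines over several synthetic runs."""
    aucs = {"isolation_forest": [], "knn": []}
    for seed in range(n_runs):
        rng = np.random.default_rng(seed)
        # Synthetic stand-in data: 300 normal points and 15 scattered anomalies.
        X = np.vstack([rng.normal(0, 1, (300, 8)), rng.uniform(-6, 6, (15, 8))])
        y = np.r_[np.zeros(300), np.ones(15)]  # used for evaluation only

        # Isolation Forest: score_samples is lower for anomalies, so negate it.
        iso = IsolationForest(random_state=seed).fit(X)
        aucs["isolation_forest"].append(roc_auc_score(y, -iso.score_samples(X)))

        # KNN baseline: distance to the k-th neighbor (column 0 is the point itself).
        dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        aucs["knn"].append(roc_auc_score(y, dists[:, -1]))
    return {name: float(np.mean(vals)) for name, vals in aucs.items()}


results = mean_auc()
```

Averaging over several runs, as done here with different seeds, reduces the influence of any single random split or forest initialization on the reported figure.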
