Rare Event Detection in Imbalanced Multi-Class Datasets Using an Optimal MIP-Based Ensemble Weighting Approach
Georgios Tertytchny, Georgios L. Stavrinides, Maria K. Michael
TL;DR
This work tackles rare event detection in imbalanced multi-class datasets from cyber-physical systems (CPS) by introducing an optimal mixed-integer programming (MIP) ensemble weighting scheme. The scheme assigns a separate weight to each classifier-class pair and automatically selects a fixed number $K$ of classifiers from a given set. Elastic net regularization is integrated into the objective to improve generalization and robustness, yielding sparse and stable weight distributions. Across four diverse CPS datasets, the proposed method consistently outperforms six established weighting schemes in balanced accuracy and macro-averaged metrics, with notable gains as ensemble size grows, while remaining computationally efficient. The approach offers a scalable, high-performance solution for robust rare event detection in resource-constrained CPS settings.
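To make "per-class weights" and "selecting $K$ classifiers" concrete, the sketch below shows how such an ensemble could combine predictions once weights are known. It is a minimal illustration under stated assumptions, not the authors' method: it does not solve the MIP. Instead, it takes a hypothetical weight matrix `weights[j][c]` for classifier $j$ and class $c$, and a binary selection vector `selected` with exactly $K$ ones, both assumed to come from an optimizer. It also assumes each classifier outputs per-class scores, and it treats the elastic net penalty as the standard form $\lambda_1\lVert w\rVert_1 + \lambda_2\lVert w\rVert_2^2$ over all weights. All function and parameter names are illustrative.

```python
from typing import Sequence


def weighted_ensemble_predict(
    probs: Sequence[Sequence[float]],    # probs[j][c]: score of classifier j for class c
    weights: Sequence[Sequence[float]],  # weights[j][c]: hypothetical per-class weight
    selected: Sequence[int],             # selected[j] in {0, 1}: is classifier j in the ensemble?
    k: int,                              # required ensemble size K
) -> int:
    """Return the class with the highest weighted score over the selected classifiers."""
    if sum(selected) != k:
        raise ValueError("exactly k classifiers must be selected")
    n_classes = len(probs[0])
    scores = [0.0] * n_classes
    for p_j, w_j, z_j in zip(probs, weights, selected):
        if not z_j:
            continue  # unselected classifiers contribute nothing
        for c in range(n_classes):
            # Each classifier-class pair has its own weight, so a classifier
            # can be trusted for some classes and down-weighted for others.
            scores[c] += w_j[c] * p_j[c]
    return max(range(n_classes), key=lambda c: scores[c])


def elastic_net_penalty(weights: Sequence[Sequence[float]], lam1: float, lam2: float) -> float:
    """Assumed standard elastic net term: lam1 * L1 norm + lam2 * squared L2 norm."""
    l1 = sum(abs(w) for row in weights for w in row)
    l2 = sum(w * w for row in weights for w in row)
    return lam1 * l1 + lam2 * l2


if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.2, 0.3, 0.5]]
    weights = [[1.0, 0.2, 0.1], [0.1, 1.0, 0.3], [0.2, 0.1, 2.0]]
    print(weighted_ensemble_predict(probs, weights, [1, 0, 1], k=2))  # 2
    print(weighted_ensemble_predict(probs, weights, [1, 1, 0], k=2))  # 0
    print(round(elastic_net_penalty(weights, 0.1, 0.01), 6))          # 0.562
```

Changing which $K$ classifiers are selected changes the prediction even with the same weights. This is why the abstract frames selection and weighting as a single joint optimization rather than two separate steps.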
Abstract
To address the challenges of imbalanced multi-class datasets typically used for rare event detection in critical cyber-physical systems, we propose an optimal, efficient, and adaptable mixed-integer programming (MIP) ensemble weighting scheme. Our approach exploits the diverse capabilities of the classifier ensemble at a granular, per-class level, optimizing the weights of classifier-class pairs with elastic net regularization for improved robustness and generalization. Additionally, it seamlessly and optimally selects a predefined number of classifiers from a given set. We evaluate our MIP-based method against six well-established weighting schemes, using representative datasets and suitable metrics, across various ensemble sizes. The experimental results show that the MIP-based scheme outperforms all compared approaches, improving balanced accuracy by 0.99% to 7.31%, with an overall average of 4.53% across all datasets and ensemble sizes. Furthermore, it attains overall average increases of 4.63%, 4.60%, and 4.61% in macro-averaged precision, recall, and F1-score, respectively, while maintaining computational efficiency.
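Plain accuracy can look high on imbalanced data even when rare classes are never detected, which is why the evaluation relies on balanced accuracy and macro-averaged metrics. The sketch below computes these metrics from scratch as a hedged illustration. It assumes the common conventions that balanced accuracy is the unweighted mean of per-class recall, that macro F1 is the mean of per-class F1 scores (rather than the harmonic mean of macro precision and recall), and that undefined ratios such as 0/0 count as 0.0. The function name `macro_metrics` is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def macro_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> Dict[str, float]:
    """Accuracy, balanced accuracy, and macro-averaged precision/recall/F1."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Assumed convention: an undefined ratio (0/0) counts as 0.0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    macro_recall = sum(recalls) / n
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        # Balanced accuracy equals macro-averaged recall: every class counts equally.
        "balanced_accuracy": macro_recall,
        "macro_precision": sum(precisions) / n,
        "macro_recall": macro_recall,
        "macro_f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Class 0 is common; classes 1 and 2 are rare events that are never predicted.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
    y_pred = [0] * 10
    m = macro_metrics(y_true, y_pred, classes=[0, 1, 2])
    print(m["accuracy"])                     # 0.8
    print(round(m["balanced_accuracy"], 4))  # 0.3333
```

In this example, a classifier that always predicts the majority class reaches 80% accuracy but only about 33% balanced accuracy. The gap is exactly the failure mode that rare event detection must avoid.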
