Quantum SMOTE with Angular Outliers: Redefining Minority Class Handling
Nishikanta Mohanty, Bikash K. Behera, Christopher Ferrie
TL;DR
This work introduces Quantum-SMOTEV2, a centroid-based, quantum-enhanced method for mitigating class imbalance that exploits the angular distribution of minority data points and the concept of Angular Outliers (AOL). It replaces clustering with a single-centroid approach and uses compact swap tests with low-depth quantum circuits to generate synthetic minority samples, followed by targeted outlier boosting. On the public Cell-to-Cell Telecom churn dataset, RF, KNN, and NN classifiers show substantial gains in accuracy, ROC AUC, and precision-recall metrics at moderate SMOTE levels (30–36%), where the original Quantum-SMOTE required up to 50%. The method keeps the core hyperparameters of the prior Quantum-SMOTE variant (rotation angle, minority percentage, and splitting factor) and scales to a large number of features, making it a practical enhancement for edge-case classification on imbalanced data. Overall, Quantum-SMOTEV2 with AOL offers a quantum-assisted way to improve minority-class predictions while keeping circuits compact and shallow.
Abstract
This paper introduces Quantum-SMOTEV2, an advanced variant of the Quantum-SMOTE method that leverages quantum computing to address class imbalance in machine learning datasets without K-Means clustering. Quantum-SMOTEV2 synthesizes data samples using swap tests and quantum rotation centered around a single data centroid, concentrating on the angular distribution of minority data points and the concept of angular outliers (AOL). Experimental results show significant improvements in model performance metrics at moderate SMOTE levels (30–36%), where the original method required up to 50%. Quantum-SMOTEV2 retains the essential hyperparameters of its predecessor (arXiv:2402.17398), namely rotation angle, minority percentage, and splitting factor, allowing it to be tailored to specific dataset needs. The method is scalable, using compact swap tests and low-depth quantum circuits to accommodate a large number of features. Evaluation on the public Cell-to-Cell Telecom dataset with Random Forest (RF), K-Nearest Neighbours (KNN), and Neural Network (NN) classifiers shows that integrating angular outliers modestly boosts classification metrics such as accuracy, F1 score, AUC-ROC, and AUC-PR across different proportions of synthetic data, highlighting the effectiveness of Quantum-SMOTEV2 in enhancing model performance for edge cases.
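To make the swap-test step concrete, the following is a minimal classical sketch, not the paper's implementation. It relies on the standard swap-test identity P(0) = 1/2 + 1/2 |⟨a|b⟩|² for amplitude-encoded unit vectors and simulates that probability with NumPy rather than running a quantum circuit. The function names and the quantile rule used to flag angular outliers are assumptions made for illustration; the abstract does not specify how AOL are selected.

```python
import numpy as np

def swap_test_p0(a, b):
    """Ideal probability of reading the ancilla as 0 in a swap test.

    Both vectors are normalised first (amplitude encoding), so they
    must be nonzero. Uses P(0) = 1/2 + 1/2 * |<a|b>|^2.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    overlap = float(np.dot(a, b))
    return 0.5 + 0.5 * overlap ** 2

def angle_from_p0(p0):
    """Invert the swap-test identity to recover an angle in [0, pi/2].

    The swap test only exposes |<a|b>|, so the sign of the overlap
    (and hence angles beyond pi/2) cannot be recovered.
    """
    cos_sq = np.clip(2.0 * p0 - 1.0, 0.0, 1.0)
    return float(np.arccos(np.sqrt(cos_sq)))

def angular_outliers(minority, quantile=0.9):
    """Flag minority points whose angle to the single centroid is large.

    ASSUMPTION: the quantile threshold is a hypothetical selection rule
    chosen for this sketch, not the criterion used in the paper.
    """
    minority = np.asarray(minority, dtype=float)
    centroid = minority.mean(axis=0)  # single centroid, no clustering
    angles = np.array(
        [angle_from_p0(swap_test_p0(x, centroid)) for x in minority]
    )
    threshold = np.quantile(angles, quantile)
    return angles, angles >= threshold

if __name__ == "__main__":
    points = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
    angles, mask = angular_outliers(points)
    print(np.round(angles, 3), mask)
```

On hardware, P(0) would be estimated from repeated measurements of the ancilla qubit, so the recovered angles would carry sampling noise that this idealised simulation ignores.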
