Identifying Flaky Tests in Quantum Code: A Machine Learning Approach
Khushdeep Kaur, Dongchan Kim, Ainaz Jamshidi, Lei Zhang
TL;DR
This work tackles the problem of flaky tests in quantum software by building a feature-based machine learning (ML) platform that detects flaky tests in quantum programs from their Python code. It expands an existing quantum flakiness dataset and evaluates five classifiers (XGBoost, Decision Tree (DT), Random Forest, KNN, and SVM) on both balanced and imbalanced data, using SMOTE and threshold tuning to address class imbalance. The results show that tree-based models, especially XGBoost and DT, deliver the strongest performance, and that expanding the dataset and applying balancing techniques improves reliability. The study lays the groundwork for future unsupervised approaches and for integration into continuous integration (CI) pipelines, with the goal of making quantum software testing more reliable and robust.
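SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by creating new minority-class samples instead of duplicating existing ones. The sketch below shows the core interpolation step in plain Python, assuming numeric feature vectors and treating flaky tests as the minority class; the function name `smote` and its parameters are hypothetical, and this is not the authors' pipeline. In practice a library implementation such as `SMOTE` from imbalanced-learn would typically be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from the
    minority class (hypothetically, flaky tests). Each synthetic point lies on
    the segment between a randomly chosen minority sample and one of its k
    nearest minority neighbours under Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample
        # and keep the k closest as interpolation partners.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical two-feature vectors for three flaky tests.
flaky = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_points = smote(flaky, n_new=4, k=2)
```

Oversampling like this is normally applied only to the training split, so that evaluation metrics are still computed on real, unaltered test samples.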
Abstract
Testing and debugging quantum software pose significant challenges due to the inherent complexities of quantum mechanics, such as superposition and entanglement. One such challenge is indeterminacy, a fundamental characteristic of quantum systems, which increases the likelihood of flaky tests in quantum programs. To the best of our knowledge, the existing literature lacks comprehensive studies of quantum flakiness. In this paper, we present a novel machine learning platform that leverages multiple machine learning models to automatically detect flaky tests in quantum programs. Our evaluation shows that the extreme gradient boosting (XGBoost) and decision tree-based models outperform the other models (i.e., random forest, k-nearest neighbors, and support vector machine), achieving the highest F1 score and Matthews Correlation Coefficient (MCC) on a balanced dataset and an imbalanced dataset, respectively. Furthermore, we expand the currently limited dataset available to researchers studying quantum flaky tests. In future work, we plan to explore unsupervised learning techniques to detect and classify quantum flaky tests more effectively. These efforts aim to improve the reliability and robustness of quantum software testing.
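The evaluation compares models by F1 score and the Matthews Correlation Coefficient (MCC), and the TL;DR mentions threshold tuning for class imbalance. The following dependency-free sketch shows how these fit together, assuming binary labels with 1 marking a flaky test and a classifier that outputs one score per test. The function names and the example scores are hypothetical, and the sweep shown here (choosing the cutoff that maximizes MCC) is one common form of threshold tuning, not necessarily the procedure used in this work.

```python
import math


def confusion(y_true, y_pred):
    """Count TP, FP, FN, TN, treating 1 (flaky) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 of the flaky class: 2TP / (2TP + FP + FN), 0.0 if undefined."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; 0.0 by convention if undefined."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def tune_threshold(y_true, scores, metric=mcc):
    """Return the (threshold, metric value) pair that maximizes `metric`.

    Candidate thresholds are the distinct scores; a test is labelled
    flaky when its score is >= the threshold.
    """
    best = (0.5, float("-inf"))
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        value = metric(y_true, y_pred)
        if value > best[1]:
            best = (t, value)
    return best


# Hypothetical scores on an imbalanced split: 2 flaky tests out of 8.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
default_pred = [1 if s >= 0.5 else 0 for s in scores]
print(f1_score(y_true, default_pred), mcc(y_true, default_pred))  # 0.0 0.0
print(tune_threshold(y_true, scores))  # (0.4, 1.0)
```

With only two flaky tests among eight, the default 0.5 cutoff labels every test as non-flaky and scores zero on both metrics, while a lower tuned cutoff separates the classes. In practice the threshold should be chosen on a validation split rather than the test set; otherwise the reported scores are optimistically biased.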
