Comparative Analysis of Quantum and Classical Support Vector Classifiers for Software Bug Prediction: An Exploratory Study
Md Nadim, Mohammad Hassan, Ashis Kumar Mandal, Chanchal K. Roy, Banani Roy, Kevin A. Schneider
TL;DR
The paper investigates Quantum Support Vector Classifiers (QSVC) and Pegasos QSVC (PQSVC) for detecting buggy software commits, comparing them to a classical SVC across 14 open-source projects (30,924 instances). It tackles scalability by splitting the data into 500-instance chunks, training one model per chunk, and aggregating the chunk models' predictions with a tuned threshold; an incremental testing approach mitigates quantum feature-mapping costs. Key findings: QSVC and PQSVC can be effective in small-data regimes (the Short-Term Activity Frame, STAF), but QSVC faces scalability bottlenecks on larger datasets, where aggregation into a Global QSVC yields notable improvements in several projects; PQSVC often underperforms SVC. The work highlights the promise and current limits of quantum machine learning for software defect prediction and offers a reproducible pipeline for further research in this domain.
Abstract
Purpose: Quantum computing promises to transform problem-solving across various domains with rapid and practical solutions. Within Software Evolution and Maintenance, Quantum Machine Learning (QML) remains largely underexplored, particularly for challenges such as detecting buggy software commits in code repositories. Methods: In this study, we investigate the practical application of Quantum Support Vector Classifiers (QSVC) for detecting buggy software commits across 14 open-source software projects of varying dataset sizes, comprising 30,924 data instances in total. We compare two QML algorithms, QSVC and Pegasos QSVC (PQSVC), against the classical Support Vector Classifier (SVC). To handle large datasets with QSVC algorithms, we divide them into smaller subsets and train a model on each. We propose and evaluate an aggregation method that combines the predictions of these models to classify the entire test dataset. We also introduce an incremental testing methodology to overcome the difficulties of quantum feature mapping during testing. Results: The study shows the effectiveness of QSVC and PQSVC in detecting buggy software commits. The aggregation technique successfully combines predictions from models trained on smaller data subsets, enhancing the overall detection accuracy for the entire test dataset. The incremental testing methodology effectively manages the challenges associated with quantum feature mapping during the testing process. Conclusion: We contribute to the advancement of QML algorithms in defect prediction, unveiling the potential for further research in this domain. The specific scenario of the Short-Term Activity Frame (STAF) highlights the early detection of buggy software commits during the initial development phases of software systems, particularly when dataset sizes remain insufficient to train machine learning models.
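
The chunk-and-aggregate idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `CentroidClassifier` stand-in (used here in place of SVC, QSVC, or PQSVC so that the example runs with the standard library only), and the aggregation rule (label a commit buggy when the fraction of chunk models voting "buggy" reaches the threshold) are all assumptions made for illustration; the paper's exact rule and threshold tuning may differ.

```python
def split_into_chunks(X, y, chunk_size=500):
    """Split training data into consecutive chunks of at most chunk_size rows."""
    return [(X[i:i + chunk_size], y[i:i + chunk_size])
            for i in range(0, len(X), chunk_size)]


class CentroidClassifier:
    """Hypothetical stand-in for SVC/QSVC: predicts the nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of all rows that carry this label.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]


def train_chunk_models(X, y, make_model, chunk_size=500):
    """Train one independent model per chunk."""
    return [make_model().fit(Xc, yc)
            for Xc, yc in split_into_chunks(X, y, chunk_size)]


def aggregate_predictions(models, X_test, threshold=0.5):
    """Assumed rule: predict 1 (buggy) when the share of models voting 1
    is at least `threshold`, otherwise predict 0 (clean)."""
    votes = [m.predict(X_test) for m in models]
    n_models = len(models)
    return [1 if sum(col) / n_models >= threshold else 0 for col in zip(*votes)]


# Synthetic demo data (not from the paper): clean commits cluster near (0, 0),
# buggy commits near (5, 5). 1,200 rows yield chunks of 500, 500, and 200.
y_train = [i % 2 for i in range(1200)]
X_train = [[label * 5 + (i % 3) * 0.1, label * 5 + (i % 5) * 0.1]
           for i, label in enumerate(y_train)]

models = train_chunk_models(X_train, y_train, CentroidClassifier, chunk_size=500)
preds = aggregate_predictions(models, [[0.1, 0.2], [5.1, 5.0]], threshold=0.5)
print(len(models), preds)
```

Any classifier exposing `fit` and `predict` could be passed as `make_model` in place of the stand-in. In this sketch the threshold is a plain parameter; in practice it would be tuned on held-out data.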
