Influence of Data Dimensionality Reduction Methods on the Effectiveness of Quantum Machine Learning Models
Aakash Ravindra Shinde, Jukka K. Nurminen
TL;DR
The paper analyzes how data dimensionality reduction methods influence the performance of quantum machine learning models, specifically quantum neural networks (QNN) and quantum kernel methods (QSVC). Using generated linear, non-linear, and image-like datasets, the study compares PCA, truncated SVD, autoencoders, and t-SNE as preprocessing steps under fixed qubit constraints. It finds that reduction can skew performance metrics: QNNs often benefit from reduction, while QSVC performance tends to decline because of its classical SVC components, and results depend heavily on the choice of embedding and ansatz. The work highlights scalability concerns, calls for quantum approaches that depend less on data reduction, and provides open-source code to support reproducibility and further investigation.
Abstract
Data dimensionality reduction techniques are often used when implementing Quantum Machine Learning (QML) models to address two significant issues: the constraints of NISQ quantum devices, which are noisy and offer a limited number of qubits, and the difficulty of simulating a large number of qubits on classical devices. This reliance also raises concerns about the scalability of these approaches, as dimensionality reduction methods are slow to adapt to large datasets. In this article, we analyze how data reduction methods affect different QML models. We conduct the experiments over several generated datasets, quantum machine learning algorithms, quantum data encoding methods, and data reduction methods. All models were evaluated on performance metrics such as accuracy, precision, recall, and F1 score. Our findings lead us to conclude that data dimensionality reduction skews performance metric values, which leads to incorrect estimates of the actual performance of quantum machine learning models. Several factors besides the reduction method itself worsen this problem, such as the characteristics of the datasets, the classical-to-quantum information embedding method, the percentage of feature reduction, the classical components associated with quantum models, and the structure of the quantum machine learning models. We consistently observed accuracy differences in the range of 14% to 48% between models that used data reduction and those that did not. In addition, our observations show that some data reduction methods tend to perform better with specific data embedding methods and ansatz constructions.
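To make the comparison described in the abstract concrete, the following is a minimal sketch of how the four reported metrics and the accuracy gap between a run with reduction and a run without it could be computed for a binary task. It is not the authors' code: the function names `binary_metrics` and `accuracy_gap` and the label vectors are hypothetical, chosen only for illustration, and the predictions are made-up placeholders rather than results from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def accuracy_gap(metrics_a: Dict[str, float], metrics_b: Dict[str, float]) -> float:
    """Absolute accuracy difference between two runs, in percentage points."""
    return abs(metrics_a["accuracy"] - metrics_b["accuracy"]) * 100.0


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    pred_without_reduction = [1, 0, 0, 1, 0, 1, 1, 0]
    pred_with_reduction = [1, 0, 1, 1, 0, 0, 1, 1]

    full = binary_metrics(y_true, pred_without_reduction)
    reduced = binary_metrics(y_true, pred_with_reduction)
    print("without reduction:", full)
    print("with reduction:   ", reduced)
    print("accuracy gap (percentage points):", accuracy_gap(full, reduced))
```

Reporting the gap in percentage points, rather than as a relative change, matches how a 14% to 48% accuracy difference would most naturally be read, though the abstract does not state which convention is used.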
