How many qubits does a machine learning problem require?
Sydney Leither, Michael Kubal, Sonika Johri
TL;DR
The paper tackles the fundamental question of how many qubits a learning problem requires in a variational quantum setting. It introduces bit-bit encoding as a universal, efficiently compressive data representation and a concrete resource metric, $Q_{\text{dataset}}(x)$, to estimate qubit needs for target accuracy. Through theory and experiments on MNIST and benchmark datasets, it shows medium-sized classical datasets typically require around $27$ qubits under bit-bit encoding, while larger biological datasets may demand more, especially when batched processing is used. The work provides a principled foundation for benchmarking quantum advantages in machine learning and guides future work toward datasets and encodings where quantum resources could yield a practical edge.
Abstract
For a machine learning paradigm to be generally applicable, it should have the property of universal approximation, that is, it should be able to approximate any target function to any desired degree of accuracy. In variational quantum machine learning, the class of functions that can be learned depend on both the data encoding scheme as well as the architecture of the optimizable part of the model. Here, we show that the property of universal approximation is constructively and efficiently realized by the recently proposed bit-bit data encoding scheme. Further, we show that this construction allows us to calculate the number of qubits required to solve a learning problem on a dataset to a target accuracy, giving rise to the first resource estimation framework for variational quantum machine learning. We apply bit-bit encoding to a number of medium-sized classical benchmark datasets and find that they require only 27 qubits on average for encoding. We extend the basic bit-bit encoding scheme to a variant that efficiently supports batched processing of large datasets. As a demonstration, we apply this new scheme to subsets of a giga-scale transcriptomic dataset. This work establishes bit-bit encoding not only as a universally expressive quantum data representation, but also as a practical foundation for resource estimation and benchmarking in quantum machine learning.
