Benchmarking data encoding methods in Quantum Machine Learning
Orlane Zang, Grégoire Barrué, Tony Quertier
TL;DR
The paper benchmarks five quantum data-encoding methods for quantum neural networks (QNNs) on malware, WDBC, and MNIST datasets to understand how the choice of encoding affects performance on NISQ-scale hardware. It systematically compares Simple Angle, π/4 Angle, Entangled Angle, Amplitude, and IQP embeddings using data-reuploading QNNs with 4, 6, or 8 features and 2 or 4 layers. The findings indicate that Amplitude encoding excels on large datasets with a suitable feature count (often 8), while angle-based encodings handle smaller datasets better; π/4 Angle and Entangled Angle often perform well with a moderate number of features and deeper models, and Simple Angle is generally outperformed but can do well with very few data points. These results inform practical encoding selection based on data size, feature availability, and the available quantum resource budget, contributing to optimized QML pipelines on current quantum hardware. As future work, the authors suggest testing additional QNN architectures and gate sets to generalize the encoding guidelines further.
Abstract
Data encoding plays a fundamental and distinctive role in Quantum Machine Learning (QML). While classical approaches process data directly as vectors, QML may require transforming classical data into quantum states through encoding circuits, known as quantum feature maps or quantum embeddings. This step leverages the inherently high-dimensional and non-linear nature of Hilbert space, enabling more efficient data separation in complex feature spaces that may be inaccessible to classical methods. The encoding step significantly affects the performance of the QML model, so it is important to choose an encoding method suited to the dataset at hand. However, this choice is generally arbitrary, since there is no "universal" rule for selecting an encoding for a given dataset. A variety of encoding methods currently exist, built from different quantum logic gates. We studied the most commonly used encoding methods and benchmarked them on different datasets.
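To make the notion of an encoding circuit concrete, the sketch below builds the state vectors that two of the benchmarked families would produce for a small feature vector. It is an illustrative NumPy simulation, not the paper's implementation: the function names `angle_encode` and `amplitude_encode` are hypothetical, the angle variant assumes one common convention (an RY rotation by each feature on its own qubit), and the amplitude variant assumes zero-padding to the next power of two. The gate choices and preprocessing actually used in the benchmark may differ.

```python
import numpy as np


def angle_encode(x):
    """Angle encoding sketch: one qubit per feature.

    Assumes each feature x_i is loaded with RY(x_i)|0>, which gives the
    single-qubit state [cos(x_i / 2), sin(x_i / 2)]. The full register is
    the tensor (Kronecker) product of these single-qubit states.
    """
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2.0), np.sin(xi / 2.0)])
        state = np.kron(state, qubit)
    return state


def amplitude_encode(x):
    """Amplitude encoding sketch: features become state amplitudes.

    Assumes the vector is zero-padded to the next power of two and then
    L2-normalized, so n features need about log2(n) qubits.
    """
    x = np.asarray(x, dtype=float)
    dim = 1 << (len(x) - 1).bit_length()  # next power of two >= len(x)
    padded = np.zeros(dim)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0.0:
        raise ValueError("cannot amplitude-encode an all-zero vector")
    return padded / norm


# Hypothetical 4-feature sample, chosen only for illustration.
features = np.array([0.1, 0.5, 0.9, 1.3])
angle_state = angle_encode(features)    # 4 qubits, 2**4 amplitudes
amp_state = amplitude_encode(features)  # 2 qubits, 2**2 amplitudes
```

The contrast in register size is the main point: under these assumptions, angle encoding uses as many qubits as features, while amplitude encoding packs the same features into a logarithmic number of qubits at the cost of a more involved state-preparation circuit.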
