Benchmarking data encoding methods in Quantum Machine Learning

Orlane Zang, Grégoire Barrué, Tony Quertier

TL;DR

The paper benchmarks five quantum data-encoding methods for quantum neural networks (QNNs) on malware, WDBC, and MNIST datasets to understand how the choice of encoding affects performance on NISQ-scale hardware. It compares Simple Angle, π/4 Angle, Entangled Angle, Amplitude, and IQP embeddings using data-reuploading QNNs with 4, 6, or 8 features and 2 or 4 layers. The findings indicate that Amplitude encoding excels on large datasets with a suitable feature count (often 8), while angle-based encodings handle smaller datasets better; π/4 Angle and Entangled Angle often perform well with a moderate number of features and deeper models, and Simple Angle is generally outperformed but can do well with very few data points. These results inform practical encoding selection based on data size, feature availability, and the available quantum resource budget, which helps in tuning QML pipelines for current quantum hardware. As future work, the authors suggest testing additional QNN architectures and gate sets to generalize these encoding guidelines.

Abstract

Data encoding plays a fundamental and distinctive role in Quantum Machine Learning (QML). While classical approaches process data directly as vectors, QML may require transforming classical data into quantum states through encoding circuits, known as quantum feature maps or quantum embeddings. This step leverages the inherently high-dimensional and non-linear nature of Hilbert space, enabling more efficient data separation in complex feature spaces that may be inaccessible to classical methods. The encoding step significantly affects the performance of the QML model, so it is important to choose an encoding method suited to the dataset. However, this choice is generally arbitrary, since there is no "universal" rule for selecting an encoding for a specific dataset. A variety of encoding methods currently exist, using different quantum logic gates. We studied the most commonly used encoding methods and benchmarked them on different datasets.
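
To make the idea of turning a classical vector into a quantum state concrete, here is a minimal NumPy sketch of two of the encoding families named above. It is an illustration under stated assumptions, not the circuits used in the paper: the angle encoding is assumed to apply an RY rotation per qubit (the paper's exact gate choice is not given here), the function names `angle_encode` and `amplitude_encode` are hypothetical, and the feature values are made up.

```python
import numpy as np

def angle_encode(x):
    """Angle encoding sketch: one feature per qubit.

    Assumes each qubit is prepared as RY(x_i)|0> = [cos(x_i/2), sin(x_i/2)];
    the full register state is the tensor product of the single-qubit states.
    """
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])
        state = np.kron(state, qubit)
    return state

def amplitude_encode(x):
    """Amplitude encoding sketch: features become the state's amplitudes.

    The vector is zero-padded to the next power of two and L2-normalised,
    so N features need only ceil(log2(N)) qubits.
    """
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, (len(x) - 1).bit_length())
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return padded / norm

# Illustrative (made-up) feature vector with 8 entries.
features = np.array([0.1, 0.5, 0.9, 1.3, 0.2, 0.4, 0.6, 0.8])

angle_state = angle_encode(features[:4])  # 4 features -> 4 qubits -> 16 amplitudes
amp_state = amplitude_encode(features)    # 8 features -> 3 qubits -> 8 amplitudes
```

The sketch highlights the resource trade-off between the two families: angle encoding uses one qubit per feature, whereas amplitude encoding packs 8 features into 3 qubits at the cost of a more involved state-preparation circuit.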

Paper Structure

This paper contains 14 sections, 17 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: A layer of the QML model used to carry out our simulations and the measurement part. Only the embedding part varies. The classification and measurement parts are the same for all the encodings tested. We use data re-uploading. Thus, for each model formed with a given encoding, one layer of the model contains the encoding part and the classification part.
  • Figure 1.1: Simple Angle encoding (a) and $\frac{\pi}{4}$-Angle encoding (b).
  • Figure 1.2: Entangled Angle encoding.
  • Figure 1.3: Binary tree decomposition of our example.
  • Figure 1.4: Amplitude encoding with $N=8$.
  • ...and 10 more figures