Encoding Numerical Data for Generative Quantum Machine Learning

Michael Krebsbach, Florentin Reiter, Thomas Wellens, Hagen-Henrik Kowalski, Ali Abedi

Abstract

Generative quantum machine learning models are trained to deduce the probability distribution underlying a given dataset and to produce new, synthetic samples from it. The majority of such models proposed in the literature, like the Quantum Circuit Born Machine (QCBM), fundamentally work on a binary level. Real-world data, however, is often numeric, requiring the models to translate between binary and continuous representations. We analyze how this transition influences the performance of quantum models and show that it requires the models to learn correlations that are solely an artifact of the way the data is encoded and are not related to the data itself. At the same time, the structure of the original data can be obscured in the binary representation, hindering generalization. To mitigate these effects, we propose a strategy based on Gray codes that can be implemented with essentially no overhead, conserves structures in the data, and avoids artificial correlations in situations in which the standard approach creates them. Considering datasets drawn from various one-dimensional probability distributions, we verify that, in most cases, QCBMs using the reflected Gray code learn faster and more accurately than those using the standard binary code.
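
The encoding difference described in the abstract can be illustrated with a minimal sketch. It compares the standard binary code with the reflected Gray code by counting how many bits differ between the code words of neighboring integers. The function names are illustrative and do not come from the paper; the conversion `k ^ (k >> 1)` is the usual textbook construction of the reflected Gray code, and the choice of $n=4$ bits is only for readability.

```python
def to_reflected_gray(k: int) -> int:
    """Return the reflected Gray code word of integer k (as an integer)."""
    return k ^ (k >> 1)


def from_reflected_gray(g: int) -> int:
    """Invert to_reflected_gray by XOR-ing all right shifts of g."""
    k = 0
    while g:
        k ^= g
        g >>= 1
    return k


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


n = 4  # number of bits (illustrative choice)

# Bit flips between the code words of neighboring integers k and k + 1.
standard_flips = [hamming_distance(k, k + 1) for k in range(2**n - 1)]
gray_flips = [
    hamming_distance(to_reflected_gray(k), to_reflected_gray(k + 1))
    for k in range(2**n - 1)
]

for k in range(2**n):
    print(f"{k:2d}  standard={k:0{n}b}  gray={to_reflected_gray(k):0{n}b}")
print("max flips, standard code:", max(standard_flips))
print("max flips, reflected Gray code:", max(gray_flips))
```

In the standard code, stepping from 7 (`0111`) to 8 (`1000`) flips all four bits, whereas neighboring values in the reflected Gray code always differ in exactly one bit. This is one way to see why numerically close data points can end up far apart in the standard binary representation.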

Paper Structure (20 sections, 33 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Binary tree representation of the standard binary code (left) and the reflected Gray code (right) with $n=4$ bits. The bits are ordered from left ($b_0$, most significant) to right ($b_3$, least significant). A green line corresponds to $0$ and an orange line to $1$.
  • Figure 2: Schematic drawing of the circuit Ansatz. It comprises $n$ qubits in total, each initialized in the state $|0\rangle$, and the dashed box is repeated $L$ times. $Y_\theta$ represents an $R_Y$ rotation by the angle $\theta$, and each gate has its own parameter $\theta$.
  • Figure 3: Training curves of QCBMs with $n=8$ qubits using, from left to right, the random (gray), standard (blue), reflected Gray (orange), and monotone Gray code (green), trained on 256 samples drawn from a single Gaussian distribution with mean 0 and standard deviation $\nu = 0.03$. Top: History of the $\mathrm{MMD}^2$ loss function during training. Each line represents the average of ten independent runs, and the colored interval shows the standard error of the mean. For each binary code, circuits of depths $L = 0, \dots, 6$ have been trained (indicated by the lightness of the line). The markers at the end of each plot show the geometric mean of the loss, $Q_{\mathrm{MMD}^2}$, over all epochs, which indicates the model quality by combining the convergence speed and the achieved minimum in a single number. The gray dashed lines indicate the target reference value, calculated by drawing and discretizing 256 additional samples from the target distribution and comparing them to the training dataset. Bottom: The training data (black line) and synthetic data (bars) of the models with $L=6$ after 100 epochs. The histograms show the aggregate of the ten independent runs, for both the training and the synthetic data, resulting in a total of $256 \times 10 = 2560$ samples. This aggregation indicates which patterns are exhibited consistently across all ten independent runs. Note that the $x$-axis is zoomed in on the relevant part of the data space. Each bin corresponds to one of the $2^8$ representatives $x_j$ and shows precisely $\mathcal{D}_j$.
  • Figure 4: Various QCBMs trained on datasets drawn from three randomly placed Gaussian distributions with standard deviation $\nu = 0.03$. Top: The quality score $Q_{\mathrm{MMD}^2}$, see Eq. \ref{eq:quality}, computed for the first 100 epochs for models with different numbers of qubits $n=6, \dots, 16$ (x-axis), different numbers of layers $L=0, \dots, 6$ (marker and lightness, see legend on the right-hand side), and the four binary codes $f_{\mathrm{RC}}$, $f_{\mathrm{SC}}$, $f_{\mathrm{RGC}}$, and $f_{\mathrm{MGC}}$ (from left to right). Each point is the mean of 10 models trained on independently drawn datasets. For easier comparison, the markers for $f_{\mathrm{SC}}$ are also drawn in the $f_{\mathrm{RGC}}$ plot, and vice versa, as small markers with dashed connections. Bottom: 256 training datapoints (black line) and 256 synthetic datapoints (bars) for one model with $n=12$ qubits and $L=6$ layers trained for 100 epochs. For visibility, the data space $\mathcal{D}$ is discretized into $2^6$ (and not $2^{12}$) bins.
  • Figure 5: Simulation results for various QCBMs with 12 qubits, averaged over ten datasets drawn from different sawtooth distributions with randomly chosen means $\mu_i$. Top: The $x$-axis shows the width $\nu$ of the sawtooths, and the $y$-axis shows the $Q_{\mathrm{MMD}^2}$ scores achieved by the models. Models with binary codes $f_{\mathrm{RC}}$ (black), $f_{\mathrm{SC}}$ (blue), $f_{\mathrm{RGC}}$ (orange), and $f_{\mathrm{MGC}}$ (green) are displayed from left to right. As before, the markers and their lightness indicate the number of layers $L$. Bottom: 256 training datapoints (black line) and 256 synthetic datapoints (bars) for one model with 12 qubits, $L=6$ layers, and $\nu = 0.1$ trained for 100 epochs. For visibility, the data space $\mathcal{D}$ is discretized into $2^6$ (and not $2^{12}$) bins.
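
The captions of Figures 3 to 5 describe the quality score $Q_{\mathrm{MMD}^2}$ as the geometric mean of the loss over all epochs. The following is a minimal sketch of such a score, not the paper's own definition, which is given in its quality equation and is not reproduced here. The function name and the example loss histories are hypothetical, and the sketch assumes that every recorded loss value is strictly positive so that the logarithm is defined.

```python
import math
from typing import Sequence


def geometric_mean_quality(losses: Sequence[float]) -> float:
    """Geometric mean of a loss history (hypothetical helper).

    Assumes all losses are strictly positive; computed in log space
    for numerical stability.
    """
    if len(losses) == 0:
        raise ValueError("loss history is empty")
    if any(value <= 0.0 for value in losses):
        raise ValueError("geometric mean requires strictly positive losses")
    mean_log = sum(math.log(value) for value in losses) / len(losses)
    return math.exp(mean_log)


# Two made-up loss histories that reach the same final value.
fast_run = [1e-1, 1e-3, 1e-3, 1e-3]  # converges quickly
slow_run = [1e-1, 1e-1, 1e-1, 1e-3]  # converges slowly

print("fast run:", geometric_mean_quality(fast_run))
print("slow run:", geometric_mean_quality(slow_run))
```

Because the mean is taken over the logarithms of the loss, a run that reaches a low loss early receives a lower (better) score than a run that reaches the same final loss late. This matches the caption's remark that the score combines convergence speed and achieved minimum in a single number.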