Scaling Quantum Machine Learning without Tricks: High-Resolution and Diverse Image Generation

Jonas Jäger, Florian J. Kiwit, Carlos A. Riofrío

TL;DR

This work trains quantum Wasserstein GANs on the complete MNIST and Fashion-MNIST datasets, generating full-resolution images across all ten classes and establishing new state-of-the-art performance with a single end-to-end quantum generator without tricks. It also analyzes how the choice of variational circuit architecture introduces the inductive biases that unlock this performance.

Abstract

Quantum generative modeling is a rapidly evolving discipline at the intersection of quantum computing and machine learning. Contemporary quantum machine learning is generally limited to toy examples or heavily restricted datasets with few elements. This is not only due to the current limitations of available quantum hardware but also due to the absence of inductive biases arising from application-agnostic designs. Current quantum solutions must resort to tricks to scale down high-resolution images, such as relying heavily on dimensionality reduction or utilizing multiple quantum models for low-resolution image patches. Building on recent developments in classical image loading to quantum computers, we circumvent these limitations and train quantum Wasserstein GANs on the established classical MNIST and Fashion-MNIST datasets. Using the complete datasets, our system generates full-resolution images across all ten classes and establishes a new state-of-the-art performance with a single end-to-end quantum generator without tricks. As a proof-of-principle, we also demonstrate that our approach can be extended to color images, exemplified on the Street View House Numbers dataset. We analyze how the choice of variational circuit architecture introduces inductive biases, which crucially unlock this performance. Furthermore, enhanced noise input techniques enable highly diverse image generation while maintaining quality. Finally, we show promising results even under quantum shot noise conditions.

Paper Structure (39 sections, 14 equations, 16 figures, 1 table)


Figures (16)

  • Figure 1: Overview of the proposed QGAN generator and training workflow for a $4 \times 4$-pixel grayscale image. (1) Noise Sampling: a multimodal latent distribution is formed by uniformly sampling a discrete mode index $m \in \{1,2\}$ and drawing Gaussian noise $\varepsilon_a \sim \mathcal{N}(0,1)$. The learnable affine transformation $z_{m,\ell,a} = \mu_{m,\ell,a} + \sigma_{m,\ell,a}\,\varepsilon_a$ produces tuned noise inputs. (2) Quantum Generator: the generator circuit begins with Hadamard gates preparing an equal superposition (gray image). Each layer consists of noise uploading via parametrized $R_x$ rotations, entanglement across address qubits using alternating nearest-neighbor (N2) and next-nearest-neighbor (N3) two-qubit gates, and controlled $R_y$ rotations on the color qubit to encode pixel intensities. The decompositions of the N2 and N3 gates into $R_y$ rotations and CNOTs are shown below. (3) Discriminator: the quantum state is decoded into an image and passed to a classical CNN critic $D$, whose scalar Wasserstein score provides gradients for training both generator and discriminator.
  • Figure 2: Illustration of multimodal noise modeling (left to right). Quantum circuit perspective of implementing a bimodal mixture distribution via controlled rotations, sampling the classical bit $m$ uniformly and $\varepsilon$ normally (unimodal). $z_0$ and $z_1$ denote the tuned noise (shifted by $0$ and $\pi$, respectively). In this single-pixel example, noise is injected directly into the color qubit (no address qubits or layering), so the layer and qubit indices $\ell, a$ as in Eq. (\ref{eq:rotation_gates_noise}) are omitted. The noise separates the prepared states around $\ket{0}$ and $\ket{1}$ in the Bloch sphere. Measurements yield pixel values via the probability of $\ket{1}$, consistent with FRQI states in Eq. (\ref{eq:frqi_state}). As an example, the distribution resembles the bimodal statistics of the MNIST center pixel for handwritten digits 0 and 1, with peaks at 0 (black) and 1 (white) and vanishing probability in between, avoiding unrealistic gray pixels.
  • Figure 3: QGAN samples for (a) MNIST, (b) Fashion-MNIST, and (c) SVHN. For (a) and (b), one image is shown for each of the 40 noise modes used by the large QGANs (64 layers). For each mode, the displayed image is selected as closest to the mean of 500 samples in Euclidean distance. For (c), a 32-layer QGAN generates images restricted to containing the digit 0. The central digit is consistently a 0, while extra digits may occur on the sides, reflecting typical house number tags.
  • Figure 4: Ablation study highlighting the importance of task-specific model design choices. Panels (a) and (b) show images from the task-agnostic circuit using amplitude encoding and FRQI encoding, respectively. Panels (c) and (d) show images from the task-specific circuit using amplitude encoding and FRQI encoding, respectively. Task-specific modifications yield clearer, less distorted digit representations, and combining both proposed design choices gives the best results (d).
  • Figure 5: Comparison of noise inputs: (a) unimodal, (b) fixed multimodal, (c) tuned multimodal. Models were trained on MNIST classes 0--2, with 3 modes in the multimodal setups. Images are generated after 15000 training iterations and manually selected to highlight characteristic effects.
  • ...and 11 more figures
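
The noise sampling described in the captions of Figures 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a zero-based mode index, an array layout `(modes, layers, qubits)` for $\mu$ and $\sigma$, a hypothetical spread of $\sigma = 0.3$, and a single rotation acting on $\ket{0}$, for which the probability of $\ket{1}$ is $\sin^2(z/2)$ (the same for an $R_x$ or an $R_y$ rotation). The function names `sample_tuned_noise` and `pixel_value` are illustrative only.

```python
import numpy as np


def sample_tuned_noise(mu, sigma, rng):
    """Draw one noise input z[l, a] = mu[m, l, a] + sigma[m, l, a] * eps[a].

    mu, sigma: arrays of shape (n_modes, n_layers, n_qubits) (assumed layout).
    Returns the sampled mode index m and z with shape (n_layers, n_qubits).
    """
    n_modes, n_layers, n_qubits = mu.shape
    m = rng.integers(n_modes)             # uniform discrete mode index
    eps = rng.standard_normal(n_qubits)   # one Gaussian draw per qubit index a
    z = mu[m] + sigma[m] * eps            # eps is broadcast over the layer axis
    return m, z


def pixel_value(angle):
    """Probability of |1> after rotating |0> by `angle`: sin^2(angle / 2)."""
    return np.sin(angle / 2.0) ** 2


# Single-pixel example in the spirit of Figure 2: two modes shifted by 0 and pi,
# no layering and no address qubits (one layer, one qubit).
rng = np.random.default_rng(0)
mu = np.array([0.0, np.pi]).reshape(2, 1, 1)
sigma = np.full((2, 1, 1), 0.3)           # hypothetical spread, not from the paper

samples = [sample_tuned_noise(mu, sigma, rng) for _ in range(2000)]
modes = np.array([m for m, _ in samples])
pixels = np.array([pixel_value(z[0, 0]) for _, z in samples])

print("mean pixel, mode 0:", pixels[modes == 0].mean())
print("mean pixel, mode 1:", pixels[modes == 1].mean())
print("fraction of gray pixels:", np.mean((pixels > 0.25) & (pixels < 0.75)))
```

Under these assumptions the pixel values cluster near 0 for one mode and near 1 for the other, with almost no mass at intermediate gray values, which is the bimodal behavior the Figure 2 caption describes. In the trained model, $\mu$ and $\sigma$ would be learnable parameters rather than fixed arrays.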