IQP Born Machines under Data-dependent and Agnostic Initialization Strategies

Sacha Lerch, Joseph Bowles, Ricard Puig, Erik Armengol, Zoë Holmes, Supanut Thanasilp

Abstract

Quantum circuit Born machines based on instantaneous quantum polynomial-time (IQP) circuits are natural candidates for quantum generative modeling, both because of their probabilistic structure and because IQP sampling is provably classically hard in certain regimes. Recent proposals focus on training IQP-QCBMs using Maximum Mean Discrepancy (MMD) losses built from low-body Pauli-$Z$ correlators, but the effect of initialization on the resulting optimization landscape remains poorly understood. In this work, we address this gap by first proving that the MMD loss landscape suffers from barren plateaus for random full-angle-range initializations of IQP circuits. We then establish lower bounds on the loss variance for the identity initialization and for an unbiased data-agnostic initialization. We additionally consider a data-dependent initialization that is better aligned with the target distribution and, under suitable assumptions, yields provable gradients and generally converges more quickly to a good minimum (as indicated by our training of circuits with 150 qubits on genomic data). Finally, as a by-product, the variance lower-bound framework we develop applies to a general class of non-linear losses, offering a broader toolset for analyzing warm starts in quantum machine learning.
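
As a rough illustration of the kind of loss discussed in the abstract, the following sketch estimates a squared MMD between two sets of bit strings with a Gaussian kernel. It is a generic textbook (biased, V-statistic) estimator and not necessarily the low-body Pauli-$Z$ correlator form used in the paper. The function names, the sample sizes, and the specific bandwidth $\sigma=\sqrt{n}$ (one choice within the $\sigma\in\Theta(\sqrt{n})$ scaling mentioned in the Figure 2 caption) are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel matrix between the rows of x and the rows of y.

    Rows are bit strings stored as 0/1 integer arrays, so the squared
    Euclidean distance below equals the Hamming distance.
    """
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def mmd2(samples_p, samples_q, sigma):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_pp = gaussian_kernel(samples_p, samples_p, sigma).mean()
    k_qq = gaussian_kernel(samples_q, samples_q, sigma).mean()
    k_pq = gaussian_kernel(samples_p, samples_q, sigma).mean()
    return k_pp + k_qq - 2.0 * k_pq


rng = np.random.default_rng(0)
n = 8                      # number of bits (hypothetical toy size)
sigma = np.sqrt(n)         # assumed bandwidth, one choice in Theta(sqrt(n))

# Two toy sample sets: uniform bit strings and strings biased toward 1.
uniform = rng.integers(0, 2, size=(500, n))
biased = (rng.random((500, n)) < 0.9).astype(int)

same = mmd2(uniform, uniform, sigma)   # identical sample sets
diff = mmd2(uniform, biased, sigma)    # clearly different distributions
print(f"MMD^2(uniform, uniform) = {same:.4f}")
print(f"MMD^2(uniform, biased)  = {diff:.4f}")
```

The estimator vanishes when both sample sets coincide and grows as the two empirical distributions separate, which is the quantity a generative model would be trained to minimize.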

Paper Structure (45 sections, 19 theorems, 211 equations, 7 figures)

This paper contains 45 sections, 19 theorems, 211 equations, 7 figures.

Key Result

Theorem 1

Let the parameters of the IQP circuit with all-to-all connectivity in Eq. (\ref{eq:IQP-circuit}) be independently drawn from ${\rm Unif}([-\pi/2,\pi/2])$. Then the variance of the MMD loss over the parameter space exponentially concentrates with

Figures (7)

  • Figure 1: Illustration of our main results. Schematic MMD loss landscape $\mathcal{L}(\boldsymbol{\theta})$ for an IQP generative model. The insets show the corresponding computational-basis distributions at representative initialization points: the model distribution $p_{\boldsymbol{\theta}}(\boldsymbol{z})$ is shown in gray, and the target distribution $p_{\rm data}(\boldsymbol{z})$ in green. Identity initialization ($\boldsymbol{\theta}^*=\boldsymbol{0}$) yields a highly biased distribution, $p_{\boldsymbol{\theta}^*}(\boldsymbol{0})=1$, which corresponds to a local maximum of the MMD. An unbiased data-agnostic initialization ($\theta_j^{*}=\pi/4$) yields a uniform distribution over bit strings and usually corresponds to a region with non-vanishing gradients. A data-dependent initialization yields an initial distribution that is better aligned with the target. The green point schematically indicates the minimum corresponding to $p_{\rm data}(\boldsymbol{z})$.
  • Figure 2: Variance of the low-body MMD estimator versus initialization scale. Top row: Variance of the MMD estimator as a function of the initialization scale for four initialization schemes (identity, unbiased, data-dependent, and covariance) using a kernel with bandwidth that scales as $\sigma\in\Theta(\sqrt{n})$. Curves correspond to different numbers of qubits $n$ ranging from $n=6$ to $n=16$. For the first three schemes, parameters are initialized as $\theta_j\sim\theta_j^*+{\rm Unif}[-\frac{\pi}{2}s,\frac{\pi}{2}s]$ where $s$ is the initialization scale. The covariance initialization (right column) follows Eq. (\ref{eq:new_data_dependent_init_covariances}), where two-qubit gate parameters are correlated and perturbations are rescaled according to the target distribution covariances. The target distribution is given by a genomic dataset. Bottom row: Maximum variance over initialization scales (blue) and the initialization scale achieving this maximum (red dashed) as functions of the number of qubits $n$.
  • Figure 3: Training of the low-body MMD estimator for different initializations. Training loss $\mathcal{L}(\boldsymbol{\theta})$ as a function of gradient-descent iterations for an $n=150$-qubit model trained on the genomic dataset. Each curve corresponds to one of the four initialization strategies: identity (red), unbiased (blue), marginal matching (green), and the covariance-based data-dependent initialization (light green). The three panels correspond to different initialization scales $s$ (as defined in Fig. \ref{fig:mmd_var_vs_patch-size}): $s=1/m$, $s=1/\sqrt{m}$, and $s=1$, where $m$ denotes the total number of parameters. Each model is trained five times; most initializations lead to similar, overlapping loss curves, except for the covariance-based initialization with $s\in\Theta(1)$. For the linear ($s=1/m$) and square-root ($s=1/\sqrt{m}$) scalings, identity initialization shows an initial rapid decrease of the loss but quickly reaches a plateau at substantially larger loss values, leading to poor convergence compared to the other strategies. The unbiased initialization remains trainable for these scalings but converges more slowly than the data-dependent approaches. In contrast, under the unit scaling $s=1$ (corresponding to a full-angle random initialization), the marginal-matching and unbiased initializations fail to train, whereas the covariance-based initialization still trains successfully in some trials, highlighting the benefit of incorporating parameter correlations in the initialization.
  • Figure 4: Illustration of the effective Pauli light cone. The subset $A$ is shown in green, and its external neighborhood $N_E(A)$ in red. The pink gates represent the interactions (corresponding to the edges $E$ of the graph), namely the $2$-qubit gates in the IQP circuit. Gray qubits lie outside $A \cup N_E(A)$. In this $5$-qubit example, $A=\{2,3\}$ and $N_E(A)=\{1,4\}$.
  • Figure 5: Representation of a $1$-regular and a $2$-regular graph, with observables chosen such that the maximum and minimum $d_A$ are attained.
  • ...and 2 more figures
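
The initialization rule quoted in the Figure 2 caption, $\theta_j\sim\theta_j^*+{\rm Unif}[-\frac{\pi}{2}s,\frac{\pi}{2}s]$, together with the two data-agnostic centers described in the Figure 1 caption, can be checked on a few qubits with the brute-force sketch below. Several parts are assumptions made for illustration, since the circuit definition is not reproduced on this page: the circuit is taken to be $H^{\otimes n} D(\boldsymbol{\theta}) H^{\otimes n}$ applied to $|0\dots0\rangle$, with $D$ generated by single-qubit $Z$ and all-to-all $ZZ$ terms, and the unbiased center sets the single-qubit angles to $\pi/4$ while leaving the two-qubit angles at zero. The helper names `iqp_probabilities` and `sample_init` are hypothetical.

```python
import itertools
import numpy as np


def iqp_probabilities(n, theta_single, theta_pair):
    """Output distribution of H^n D(theta) H^n |0...0> by brute-force simulation.

    Assumed parametrization (not taken from the paper):
    D(theta) = exp(i * sum_j theta_j Z_j + i * sum_{j<k} theta_jk Z_j Z_k).
    Memory grows as 4^n, so this is only meant for a handful of qubits.
    """
    # All 2^n bit strings and the matching +/-1 eigenvalues of Z.
    bits = np.array(list(itertools.product([0, 1], repeat=n)))
    z = 1 - 2 * bits
    # Diagonal phase accumulated by each computational-basis state.
    phase = z @ theta_single
    for (j, k), t in zip(itertools.combinations(range(n), 2), theta_pair):
        phase = phase + t * z[:, j] * z[:, k]
    # H^n |0...0> is the uniform superposition; apply the diagonal phases.
    state = np.exp(1j * phase) / np.sqrt(2**n)
    # Final H^n layer as a dense Walsh-Hadamard matrix.
    hadamard = (-1.0) ** (bits @ bits.T) / np.sqrt(2**n)
    return np.abs(hadamard @ state) ** 2


def sample_init(center, scale, rng):
    """Draw theta_j = center_j + Unif[-(pi/2) * scale, (pi/2) * scale]."""
    half_width = 0.5 * np.pi * scale
    return center + rng.uniform(-half_width, half_width, size=center.shape)


n = 4
num_pairs = n * (n - 1) // 2
m = n + num_pairs  # total number of parameters in this assumed parametrization
rng = np.random.default_rng(0)

# Identity center: all angles zero.
p_identity = iqp_probabilities(n, np.zeros(n), np.zeros(num_pairs))
# Unbiased center (assumption: only single-qubit angles are set to pi/4).
p_unbiased = iqp_probabilities(n, np.full(n, np.pi / 4), np.zeros(num_pairs))
print("identity: p(0...0) =", round(float(p_identity[0]), 6))
print("unbiased: min/max p =", p_unbiased.min(), p_unbiased.max())

# Perturb the identity center at the three scales used in the training figure.
center = np.zeros(m)
for s in (1 / m, 1 / np.sqrt(m), 1.0):
    theta = sample_init(center, s, rng)
    probs = iqp_probabilities(n, theta[:n], theta[n:])
    print(f"s = {s:.3f}: p(0...0) = {probs[0]:.3f}")
```

Under these assumptions the identity center puts all probability on the all-zeros string and the unbiased center gives the uniform distribution, matching the qualitative picture in the Figure 1 caption. The dense simulation is chosen for transparency rather than efficiency.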

Theorems & Definitions (33)

  • Theorem 1: Exponential concentration of the MMD loss
  • Theorem 2: Lower bound guarantee of an arbitrary non-linear loss with sufficient curvature, informal
  • Theorem 3: Variance guarantee for agnostic initialization strategies, informal
  • Theorem 4: Variance guarantee for data-dependent initialization strategy, informal
  • Lemma 1
  • proof
  • Proposition 1: Exact correlator variance under full-angle random initialization
  • Corollary 1: Architecture-dependent bounds for $K$-regular graphs
  • Corollary 2: All-to-all full-angle correlator concentration
  • Proposition 2: Vanishing correlator cross terms
  • ...and 23 more
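
Several of the graph-dependent statements above rely on the external neighborhood $N_E(A)$ shown in Figure 4. A minimal sketch of how it can be computed from the gate graph is given below. The function name `external_neighborhood` and the nearest-neighbor chain connectivity are assumptions made for illustration; the figure caption only specifies $A=\{2,3\}$ and $N_E(A)=\{1,4\}$ for a $5$-qubit example.

```python
def external_neighborhood(edges, subset):
    """Vertices outside `subset` that share an edge with a vertex in `subset`."""
    subset = set(subset)
    neighbors = set()
    for u, v in edges:
        if u in subset and v not in subset:
            neighbors.add(v)
        elif v in subset and u not in subset:
            neighbors.add(u)
    return neighbors


# Assumed 5-qubit nearest-neighbor chain, qubits labeled 1..5.
chain_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
A = {2, 3}
N_E = external_neighborhood(chain_edges, A)

# Qubits outside A and its external neighborhood (gray in the figure).
outside = set(range(1, 6)) - A - N_E
print("N_E(A) =", sorted(N_E))
print("outside A and N_E(A):", sorted(outside))
```

For the assumed chain this reproduces the sets given in the Figure 4 caption, with qubit 5 left outside $A \cup N_E(A)$.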