IQP Born Machines under Data-dependent and Agnostic Initialization Strategies

Sacha Lerch; Joseph Bowles; Ricard Puig; Erik Armengol; Zoë Holmes; Supanut Thanasilp

Abstract

Quantum circuit Born machines (QCBMs) based on instantaneous quantum polynomial-time (IQP) circuits are natural candidates for quantum generative modeling, both because of their probabilistic structure and because IQP sampling is provably classically hard in certain regimes. Recent proposals focus on training IQP-QCBMs using Maximum Mean Discrepancy (MMD) losses built from low-body Pauli-$Z$ correlators, but the effect of initialization on the resulting optimization landscape remains poorly understood. In this work, we address this gap by first proving that the MMD loss landscape suffers from barren plateaus for random full-angle-range initializations of IQP circuits. We then establish lower bounds on the loss variance for the identity initialization and for an unbiased data-agnostic initialization. We further consider a data-dependent initialization that is better aligned with the target distribution and, under suitable assumptions, yields provable gradients and generally converges more quickly to a good minimum (as indicated by our training of circuits with 150 qubits on genomic data). Finally, as a by-product, the variance lower-bound framework we develop applies to a general class of non-linear losses, offering a broader toolset for analyzing warm starts in quantum machine learning.

Paper Structure (45 sections, 19 theorems, 211 equations, 7 figures)

This paper contains 45 sections, 19 theorems, 211 equations, 7 figures.

Introduction
Framework
Quantum generative modeling
Maximum Mean Discrepancy loss
Exponential concentration
Initialization strategies
Results
Overview and practical guidelines
Barren plateaus under the full-angle random initialization
Evading barren plateaus with small-angle initializations
Data-agnostic initialization strategies
Data-dependent initialization strategy: variance guarantee and superior empirical performance
Numerical studies
Discussion
Acknowledgments
...and 30 more sections

Key Result

Theorem 1

Let the parameters of the IQP circuit with all-to-all connectivity in Eq. \ref{eq:IQP-circuit} be independently drawn from ${\rm Unif}([-\pi/2,\pi/2])$. Then the variance of the MMD loss over the parameter space exponentially concentrates with

Figures (7)

Figure 1: Illustration of our main results. Schematic MMD loss landscape $\mathcal{L}(\boldsymbol{\theta})$ for an IQP generative model. The insets show the corresponding computational-basis distributions at representative initialization points: the model distribution $p_{\boldsymbol{\theta}}(\boldsymbol{z})$ is shown in gray, and the target distribution $p_{\rm data}(\boldsymbol{z})$ in green. Identity initialization ($\boldsymbol{\theta}^*=\boldsymbol{0}$) yields a highly biased distribution, $p_{\boldsymbol{\theta}^*}(\boldsymbol{0})=1$, which corresponds to a local maximum of the MMD. An unbiased data-agnostic initialization ($\theta_j^{*}=\pi/4$) yields a uniform distribution over bit strings and usually corresponds to a region with non-vanishing gradients. A data-dependent initialization yields an initial distribution that is better aligned with the target. The green point schematically indicates the minimum corresponding to $p_{\rm data}(\boldsymbol{z})$.
Figure 2: Variance of the low-body MMD estimator versus initialization scale. Top row: Variance of the MMD estimator as a function of the initialization scale for four initialization schemes (identity, unbiased, data-dependent, and covariance) using a kernel with bandwidth that scales as $\sigma\in\Theta(\sqrt{n})$. Curves correspond to different numbers of qubits $n$ ranging from $n=6$ to $n=16$. For the first three schemes, parameters are initialized as $\theta_j\sim\theta_j^*+{\rm Unif}[-\frac{\pi}{2}s,\frac{\pi}{2}s]$, where $s$ is the initialization scale. The covariance initialization (right column) follows Eq. \ref{eq:new_data_dependent_init_covariances}, where two-qubit gate parameters are correlated and perturbations are rescaled according to the target distribution covariances. The target distribution is given by a genomic dataset. Bottom row: Maximum variance over initialization scales (blue) and the initialization scale achieving this maximum (red dashed) as functions of the number of qubits $n$.
Figure 3: Training of the low-body MMD estimator for different initializations. Training loss $\mathcal{L}(\boldsymbol{\theta})$ as a function of gradient-descent iterations for an $n=150$-qubit model trained on the genomic dataset. Each curve corresponds to one of the four initialization strategies: identity (red), unbiased (blue), marginal matching (green), and the covariance-based data-dependent initialization (light green). The three panels correspond to different initialization scales $s$ (as defined in Fig. \ref{fig:mmd_var_vs_patch-size}): $s=1/m$, $s=1/\sqrt{m}$, and $s=1$, where $m$ denotes the total number of parameters. Each model is trained five times; most initializations lead to similar, overlapping loss curves, except for the covariance-based initialization with $s\in\Theta(1)$. For the $s=1/m$ and $s=1/\sqrt{m}$ scalings, the identity initialization shows a rapid initial decrease of the loss but quickly reaches a plateau at substantially larger loss values, leading to poor convergence compared to the other strategies. The unbiased initialization remains trainable for these scalings but converges more slowly than the data-dependent approaches. In contrast, under the $s=1$ scaling (corresponding to a full-angle random initialization), the marginal-matching and unbiased initializations fail to train, whereas the covariance-based initialization still trains successfully in some trials, highlighting the benefit of incorporating parameter correlations in the initialization.
Figure 4: Illustration of the effective Pauli light cone. The subset $A$ is shown in green, and its external neighborhood $N_E(A)$ in red. The pink gates represent the interactions (equivalent to the edges of graph $E$), namely the two-qubit gates in the IQP circuit. Gray qubits lie outside $A \cup N_E(A)$. In this $5$-qubit example, $A=\{2,3\}$ and $N_E(A)=\{1,4\}$.
Figure 5: Representation of a $1$-regular and a $2$-regular graph with different observables such that the maximum and minimum $d_A$ are obtained.
...and 2 more figures
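
The perturbed initialization rule quoted in the Figure 2 caption, $\theta_j\sim\theta_j^*+{\rm Unif}[-\frac{\pi}{2}s,\frac{\pi}{2}s]$, can be illustrated with a minimal NumPy sketch. The function names `center` and `init_params`, the parameter count `m = 20`, and the choice to apply $\theta_j^*=\pi/4$ to every parameter in the unbiased case are assumptions made for illustration only; the paper may treat single- and two-qubit gate angles differently, and the data-dependent and covariance schemes are not covered here.

```python
import numpy as np


def center(strategy, m):
    """Return the center theta* for a data-agnostic strategy (illustrative).

    Assumption: the unbiased strategy uses pi/4 for all m parameters.
    """
    if strategy == "identity":
        return np.zeros(m)
    if strategy == "unbiased":
        return np.full(m, np.pi / 4)
    raise ValueError(f"unknown strategy: {strategy}")


def init_params(theta_star, scale, rng):
    """Sample theta_j = theta*_j + Unif[-(pi/2)*scale, (pi/2)*scale]."""
    theta_star = np.asarray(theta_star, dtype=float)
    half_width = 0.5 * np.pi * scale
    noise = rng.uniform(-half_width, half_width, size=theta_star.shape)
    return theta_star + noise


rng = np.random.default_rng(0)
m = 20  # hypothetical number of circuit parameters

# The three initialization scales named in the Figure 3 caption.
scales = {"1/m": 1.0 / m, "1/sqrt(m)": 1.0 / np.sqrt(m), "1": 1.0}

samples = {
    name: init_params(center("unbiased", m), s, rng)
    for name, s in scales.items()
}
```

With `scale = 1` and the identity center, the samples cover the full range $[-\pi/2,\pi/2]$ used in Theorem 1, while smaller scales keep the parameters close to $\boldsymbol{\theta}^*$.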

Theorems & Definitions (33)

Theorem 1: Exponential concentration of the MMD loss
Theorem 2: Lower bound guarantee of an arbitrary non-linear loss with sufficient curvature, informal
Theorem 3: Variance guarantee for agnostic initialization strategies, informal
Theorem 4: Variance guarantee for data-dependent initialization strategy, informal
Lemma 1
proof
Proposition 1: Exact correlator variance under full-angle random initialization
Corollary 1: Architecture-dependent bounds for $K$-regular graphs
Corollary 2: All-to-all full-angle correlator concentration
Proposition 2: Vanishing correlator cross terms
...and 23 more