Avoiding barren plateaus via Gaussian Mixture Model

Xiao Shi, Yun Shang

TL;DR

This work tackles the barren plateau (BP) problem in variational quantum algorithms (VQAs) by introducing a Gaussian Mixture Model (GMM) initialization for the parameter vectors of hardware-efficient parameterized quantum circuits (PQCs). The authors prove, for single-term, multi-term, and general cost functions, that a GMM-based initialization yields a gradient norm lower bound that is independent of the number of qubits $N$ and increases with circuit depth $L$, with concrete bounds such as $\mathbb{E}\|\nabla f\|^2 \ge \frac{1}{4}-\frac{1}{8L}$ and extensions that include cross-terms for multi-term observables. They provide numerical evidence on local (e.g., 1D TFIM) and global cost functions, as well as quantum-chemistry simulations (LiH with JW mapping), demonstrating robust training performance, improved gradient magnitudes, and faster convergence under noise. The results suggest that GMM initialization can enable training of larger and deeper PQCs on NISQ devices, with practical guidance for choosing distributions and variances. Overall, the paper offers both rigorous theoretical guarantees and practical validation that GMM initialization mitigates barren plateaus across a broad class of VQAs.
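
As a quick illustration of how the quoted bound behaves, the following worked arithmetic evaluates $\frac{1}{4}-\frac{1}{8L}$ at a few depths. The chosen values of $L$ are arbitrary examples, not settings taken from the paper.

```latex
\begin{align*}
L = 1:  &\quad \tfrac{1}{4} - \tfrac{1}{8}  = \tfrac{1}{8}   = 0.125 \\
L = 2:  &\quad \tfrac{1}{4} - \tfrac{1}{16} = \tfrac{3}{16}  = 0.1875 \\
L = 10: &\quad \tfrac{1}{4} - \tfrac{1}{80} = \tfrac{19}{80} = 0.2375 \\
L \to \infty: &\quad \tfrac{1}{4} - \tfrac{1}{8L} \to \tfrac{1}{4}
\end{align*}
```

The bound contains no $N$, grows monotonically with $L$, and stays below $\frac{1}{4}$ for every finite depth.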

Abstract

Variational quantum algorithms are among the most representative algorithms in quantum computing, with a wide range of applications in quantum machine learning, quantum simulation, and other related fields. However, they face challenges associated with the barren plateau phenomenon, especially when dealing with large numbers of qubits, deep circuit layers, or global cost functions, which often makes them untrainable. In this paper, we propose a novel parameter initialization strategy based on Gaussian Mixture Models. We rigorously prove that the proposed initialization method consistently avoids the barren plateau problem for hardware-efficient ansatzes with arbitrary length and number of qubits, and for any given cost function. Specifically, we find that the gradient norm lower bound provided by the proposed method is independent of the number of qubits $N$ and increases with the circuit depth $L$. Our results highlight the significance of Gaussian Mixture Model initialization strategies in determining the trainability of quantum circuits, providing valuable guidance for future theoretical investigations and practical applications.

Paper Structure (9 sections, 9 theorems, 78 equations, 13 figures, 6 tables)

Key Result

Theorem 1

Consider the VQA problem defined above. Assume that the parameters $\theta$ in the last block are as defined in Table tab:tab1, and that the parameters $\theta$ of the remaining blocks obey the distribution $\mathcal{G}_1(\sigma^2)$, where $\sigma^2=\frac{1}{2LS}$. Then $\forall q\in\{1,\dots,2L\},\ n\in\{1,\dots,N\}$
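
The initialization described in the theorem can be sketched in code. The following is a minimal sketch under stated assumptions, not the authors' implementation: this excerpt does not define $\mathcal{G}_1$, $S$, or the last-block distributions, so the sketch treats $\mathcal{G}_1(\sigma^2)$ as a one-component zero-mean Gaussian, takes $S$ as a plain input, and uses a hypothetical two-component mixture at $\pm\pi/2$ for the last block. The function names `gmm_variance` and `sample_gmm` are illustrative. The parameter shape $(2L, N)$ follows the index ranges in the theorem.

```python
import numpy as np


def gmm_variance(num_layers, num_terms):
    """Variance sigma^2 = 1 / (2 * L * S) quoted in Theorem 1."""
    return 1.0 / (2.0 * num_layers * num_terms)


def sample_gmm(shape, means, weights, variance, rng):
    """Draw samples from a 1D Gaussian mixture with a shared variance."""
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if means.shape != weights.shape or not np.isclose(weights.sum(), 1.0):
        raise ValueError("need one weight per mean, and weights must sum to 1")
    # Pick a mixture component for every parameter, then add Gaussian noise.
    component = rng.choice(len(means), size=shape, p=weights)
    return means[component] + np.sqrt(variance) * rng.standard_normal(shape)


rng = np.random.default_rng(seed=0)
L, N, S = 4, 3, 2                      # example sizes, chosen arbitrarily
sigma2 = gmm_variance(L, S)            # 1 / (2 * 4 * 2) = 0.0625

# Rows index q in {1, ..., 2L}; columns index n in {1, ..., N}.
theta = np.empty((2 * L, N))
# ASSUMPTION: G_1(sigma^2) is modeled as a single zero-mean Gaussian.
theta[:-2] = sample_gmm((2 * L - 2, N), [0.0], [1.0], sigma2, rng)
# ASSUMPTION: hypothetical mixture for the two rows of the last block.
theta[-2:] = sample_gmm((2, N), [-np.pi / 2, np.pi / 2], [0.5, 0.5], sigma2, rng)
```

Sharing one variance across components keeps the sampler to a single `sigma2` argument; a per-component variance would only require passing an array and indexing it with `component` as well.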

Figures (13)

  • Figure 1: (a) The fundamental framework of the variational quantum circuit, comprising $L$ blocks. Each block begins by introducing entanglement through $CZ_l$ gates, followed by $R_x$ and $R_y$ gates applied to each qubit. The structure of $CZ_l$ is depicted in (b).
  • Figure 2: Evolution of the cost function and gradient norm during training of the 1D Transverse Field Ising Model. Because the cost function is local, most initialization methods converge to its minimum value.
  • Figure 3: Evolution of the cost function and gradient norm during training when the observable consists entirely of $X$ operators. The gradients for Gaussian, uniform, and reduced-domain distributions remain near zero, so the cost functions for these distributions barely decrease. In contrast, our method maintains relatively large gradients throughout training and descends to the final result.
  • Figure S1: When a term in the observable is $Y$, the parameters in the last block's $R_y(\theta)$ in the ansatz do not contribute to the training. Moreover, when the entire observable consists of $Y$, the $\theta$ parameters in the $R_y$ gates of the last block have no impact on the cost function.
  • Figure S2: In the scenario where the density matrix $\rho$ remains invariant, the Pauli matrix $XZ$ undergoes a transformation resulting in two components. One component corresponds to $\alpha XZ$, while the other corresponds to $-\beta ZX$.
  • ...and 8 more figures

Theorems & Definitions (11)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 1 more