Hardware-efficient ansatz without barren plateaus in any depth

Chae-Yeun Park; Minhyeok Kang; Joonsuk Huh

TL;DR

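This paper proposes two novel parameter conditions under which the hardware-efficient ansatz (HEA) is free from barren plateaus at arbitrary circuit depth. In the first, the HEA approximates a time-evolution operator generated by a local Hamiltonian. In the second, the HEA is in the many-body localized (MBL) phase, where a phenomenological model for the MBL system is used to argue that the gradient has a large component for a local observable.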

Abstract

Variational quantum circuits have recently gained much interest due to their relevance in real-world applications, such as combinatorial optimization, quantum simulation, and modeling a probability distribution. Despite their huge potential, the practical usefulness of those circuits beyond tens of qubits is largely questioned. One of the major problems is the so-called barren plateau phenomenon: quantum circuits with a random structure often have a flat cost-function landscape and thus cannot be trained efficiently. In this paper, we propose two novel parameter conditions under which the hardware-efficient ansatz (HEA) is free from barren plateaus for arbitrary circuit depths. In the first condition, the HEA approximates a time-evolution operator generated by a local Hamiltonian. Utilizing a recent result by [Park and Killoran, Quantum 8, 1239 (2024)], we prove a constant lower bound on gradient magnitudes at any depth for both local and global observables. In the second parameter condition, on the other hand, the HEA is within the many-body localized (MBL) phase. We argue that the HEA in this phase has a large gradient component for a local observable using a phenomenological model for the MBL system. By initializing the parameters of the HEA using these conditions, we show that our findings offer better overall performance in solving many-body Hamiltonians. Our results indicate that barren plateaus are not an issue when initial parameters are smartly chosen, and that other factors, such as local minima or the expressivity of the circuit, are more crucial.
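To make the quantity discussed in the abstract concrete, the following is a minimal NumPy sketch of a 1D HEA and of the gradient of $\braket{Y_1}$ with respect to the RX angles on the first qubit, computed with the parameter-shift rule. The gate layout (a layer of RX, a layer of RZ, then a chain of CZ gates per block), the function names, and the Gaussian width `0.05` are illustrative assumptions and are not taken from the paper; the sketch does not reproduce the paper's exact parameter conditions.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])

def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

def apply_1q(state, gate, q, n):
    # Contract the gate with axis q of the n-qubit state tensor.
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cz(state, a, b, n):
    # CZ flips the sign of amplitudes where both qubits are |1>.
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[a], idx[b] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def cost(theta, n):
    # theta has shape (p, 2, n): per block, RX angles then RZ angles (assumed layout).
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for block in theta:
        for q in range(n):
            state = apply_1q(state, rx(block[0, q]), q, n)
        for q in range(n):
            state = apply_1q(state, rz(block[1, q]), q, n)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    # Expectation value of Y on the first qubit.
    return np.real(np.vdot(state, apply_1q(state, Y, 0, n)))

def grad_first_qubit_rx(theta, n):
    # Parameter-shift rule for gates of the form exp(-i * theta * P / 2).
    grads = []
    for i in range(theta.shape[0]):
        plus, minus = theta.copy(), theta.copy()
        plus[i, 0, 0] += np.pi / 2
        minus[i, 0, 0] -= np.pi / 2
        grads.append(0.5 * (cost(plus, n) - cost(minus, n)))
    return np.array(grads)

if __name__ == "__main__":
    n, p = 4, 8
    rng = np.random.default_rng(0)
    small = rng.normal(0.0, 0.05, size=(p, 2, n))       # hypothetical narrow Gaussian
    uniform = rng.uniform(-np.pi, np.pi, size=(p, 2, n))  # fully random angles
    for name, theta in [("small", small), ("uniform", uniform)]:
        g = grad_first_qubit_rx(theta, n)
        print(name, np.mean(g ** 2))  # averaged squared gradient over blocks
```

In this toy layout every RX gradient on the first qubit equals $-1$ at $\pmb{\theta}=0$, which is an instance of the $\Omega(1)$ hypothesis at $\pmb{\theta}=0$ that appears in Theorem 1. The statevector simulation scales exponentially in the number of qubits, so it is only meant for small $N$.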

Paper Structure (17 sections, 6 theorems, 71 equations, 6 figures)

Parameter constraint for lower bounding the gradient magnitudes by a constant
Constant gradient magnitudes for the Hamiltonian variational ansatz
Converting the hardware efficient ansatz to a circuit with parameterized entangling gates
Floquet many-body localization in the hardware-efficient ansatz
Brief introduction to many-body localization
Many-body localized hardware-efficient ansatz
Product of Floquet-MBL systems is an MBL system
The MBL phase of the hardware efficient ansatz with mutually commuting entangling gates
Derivation of the gradient scaling in the MBL system
Deriving the expressions of gradients for a single Pauli-Y and a multi-body observable
Long-time limit of gradients
Numerical results for the 2D hardware efficient ansatz and Gaussian initialization
Machine learning application
Dataset
Encoding
...and 2 more sections

Key Result

Theorem 1

Let $C(\pmb{\theta}) = \braket{\psi(\pmb{\theta})|O|\psi(\pmb{\theta})}$ be the cost function where $O$ is either a Pauli string or $k$-local Hamiltonian. Suppose that there exist $n,m$ such that $|\partial_{n,m} C |_{\pmb{\theta}=0} = \Omega(1)$. Then, there exists a constant $\gamma > 0$ such that

Figures (6)

Figure 1: Circuit identity used for removing CZ gates from the HEA. Using the property that the CZ gate is a Clifford gate, we can move CZ gates in each block to the beginning of the block.
Figure 2: Averaged squared gradients as functions of $N$ for $p \in [32, 64, 128]$. Observables (a) $O=Y_1$ and (b) $O=Y_1 \prod_{j =2}^N Z_j$ are used. Each data point shows the squared gradient components for the RX gate acting on the first qubit, averaged over blocks, $\sum_{i=1}^p (\partial_{i,0}C)^2/p$. For each parameter initialization scheme, results are averaged over $2^{10}$ randomly sampled parameters. For the Small initialization, the gradient magnitudes do not decay with $N$ regardless of the observable. On the other hand, the MBL initialization shows $\Theta(1)$ gradient magnitudes when a local observable is used, whereas they decay exponentially for a global observable.
Figure 3: Normalized energies $\widetilde{E} = (\braket{H_{1,2}} - E_{\rm GS})/|E_{\rm GS}|$ as functions of optimization steps for (a) the Heisenberg model ($H_1$) and (b) the cluster model ($H_2$) with external fields. The HEA with $N=20$ and $p=256$ is used. We optimize the parameters using Adam (Kingma and Ba, 2014) with learning rates (a) $\eta = 0.005$ and (b) $\eta = 0.001$, which are chosen from hyperparameter optimizations. For each initialization scheme, we run $16$ independent VQE instances. Solid curves show the averaged values for each step, while the shaded regions indicate the range between the worst and best-performing instances.
Figure B.1: Many-body localization of a unitary operator $\tilde{V}(\theta)$. (a) Half-chain entanglement entropy for eigenstates of $\tilde{V}(\theta)$ as a function of $\theta/\pi$. Results are averaged over all eigenstates and disorder realizations. Dashed horizontal lines indicate the Page entropy, which is expected for Haar random states. (b) Variance of the eigenstate entanglement entropy averaged over disorder realizations. For each random instance of $\tilde{V}(\theta)$, we compute $\overline{S_E^2} - \overline{S_E}^2$, and the results are averaged over all instances. (c) The averaged adjacent gap ratios. For the ordered quasi-energy levels $\{E_i\}$ of each random instance of $\tilde{V}(\theta)$, gaps $\Delta_i = E_{i+1}-E_i$ are obtained. Then, the ratios $r_i=\min\{\Delta_{i+1}/\Delta_i,\Delta_{i}/\Delta_{i+1}\}$ are averaged over $i$ and all random instances. Horizontal lines indicate the expected averaged values of $r$ for Poisson statistics (dashed) and the Gaussian orthogonal ensemble (dotted). All presented results are obtained from $2^{12}$ random instances for $N \in [8, 10]$, $2^{10}$ for $N=12$, and $2^7$ for $N=14$.
Figure C.2: Scaling of gradients for the 1D (left column) and 2D HEAs (right column) with observables $O=Y_1$ (first row) and $O=Y_1\prod_{j=2}^N Z_j$ (second row). The numbers of blocks $p\in [32,64,128]$ and $p \in [16, 32,64]$ are used for the 1D and 2D HEAs, respectively. The weight of the observable is given by $S=1$ for $O=Y_1$ and $S=N$ for $O=Y_1\prod_{j=2}^N Z_j$.
...and 1 more figures
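
The adjacent gap ratio described in the caption of Figure B.1(c) can be sketched in a few lines of NumPy. The function names below are hypothetical, and taking the quasi-energies as the eigenvalue phases in $(-\pi, \pi]$ without treating the wrap-around gap is a simplifying assumption, not necessarily the paper's procedure. For reference, uncorrelated (Poisson) levels give $\langle r \rangle = 2\ln 2 - 1 \approx 0.386$, while the Gaussian orthogonal ensemble gives $\langle r \rangle \approx 0.53$.

```python
import numpy as np

def adjacent_gap_ratios(levels):
    # levels: 1D array of (quasi-)energies; sorted here before taking gaps.
    levels = np.sort(np.asarray(levels, dtype=float))
    gaps = np.diff(levels)                      # Delta_i = E_{i+1} - E_i
    small = np.minimum(gaps[:-1], gaps[1:])
    large = np.maximum(gaps[:-1], gaps[1:])
    keep = large > 0                            # skip exactly degenerate triples
    return small[keep] / large[keep]            # r_i = min(D_{i+1}/D_i, D_i/D_{i+1})

def mean_gap_ratio_of_unitary(unitary):
    # Quasi-energies are taken as the phases of the eigenvalues of the unitary.
    quasi_energies = np.angle(np.linalg.eigvals(unitary))
    return adjacent_gap_ratios(quasi_energies).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Uncorrelated levels as a stand-in for a localized spectrum.
    print(adjacent_gap_ratios(rng.uniform(0, 1, 20000)).mean())
```

Averaging `mean_gap_ratio_of_unitary` over many disorder realizations and comparing it against the two reference values is one way to distinguish a localized spectrum from a thermalizing one.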

Theorems & Definitions (11)

Theorem 1
Lemma A.1
proof
Lemma A.2: Proposition 3 in Ref. [Park and Killoran, Quantum 8, 1239 (2024)]
Theorem A.1
proof
Lemma A.3
proof
Theorem A.2: Restatement of Theorem 1 in the main text
Remark
...and 1 more
