Can Error Mitigation Improve Trainability of Noisy Variational Quantum Algorithms?

Samson Wang, Piotr Czarnik, Andrew Arrasmith, M. Cerezo, Lukasz Cincio, Patrick J. Coles

TL;DR

This work shows that, for a broad class of EM strategies, exponential cost concentration cannot be resolved without committing exponential resources elsewhere, and finds numerical evidence that Clifford Data Regression (CDR) can aid the training process in certain settings where cost concentration is not too severe.

Abstract

Variational Quantum Algorithms (VQAs) are often viewed as the best hope for near-term quantum advantage. However, recent studies have shown that noise can severely limit the trainability of VQAs, e.g., by exponentially flattening the cost landscape and suppressing the magnitudes of cost gradients. Error Mitigation (EM) shows promise in reducing the impact of noise on near-term devices. Thus, it is natural to ask whether EM can improve the trainability of VQAs. In this work, we first show that, for a broad class of EM strategies, exponential cost concentration cannot be resolved without committing exponential resources elsewhere. This class of strategies includes as special cases Zero Noise Extrapolation, Virtual Distillation, Probabilistic Error Cancellation, and Clifford Data Regression. Second, we perform analytical and numerical analysis of these EM protocols, and we find that some of them (e.g., Virtual Distillation) can make it harder to resolve cost function values compared to running no EM at all. As a positive result, we do find numerical evidence that Clifford Data Regression (CDR) can aid the training process in certain settings where cost concentration is not too severe. Our results show that care should be taken in applying EM protocols as they can either worsen or not improve trainability. On the other hand, our positive results for CDR highlight the possibility of engineering error mitigation methods to improve trainability.
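As a rough illustration of why cost concentration translates into a sampling cost, the following minimal sketch uses a simplified global depolarizing model rather than the local Pauli noise analyzed in the paper. The function names, the per-layer parameter `q`, the fixed point value, and the two-standard-deviation criterion are all assumptions made for illustration only.

```python
import math


def noisy_cost(ideal_cost, fixed_point, q, depth):
    """Cost after `depth` layers of global depolarizing noise.

    Assumption: each layer keeps the state with probability q and replaces
    it with the maximally mixed state otherwise, so the cost is damped
    toward `fixed_point` (the maximally mixed value) by q ** depth.
    """
    damping = q ** depth
    return damping * ideal_cost + (1.0 - damping) * fixed_point


def shots_to_resolve(gap, sigma=1.0, z=2.0):
    """Rough shots per point needed to tell two cost values apart.

    The difference of two independent sample means has standard deviation
    sqrt(2 * sigma**2 / N); we ask for the gap to be z such deviations.
    """
    return math.ceil(2.0 * (z * sigma / abs(gap)) ** 2)


if __name__ == "__main__":
    c1, c2 = 0.8, 0.3  # hypothetical noise-free cost values
    for depth in (0, 10, 20, 40):
        gap = noisy_cost(c1, 0.0, 0.9, depth) - noisy_cost(c2, 0.0, 0.9, depth)
        print(depth, round(gap, 5), shots_to_resolve(gap))
```

In this toy model the gap between two cost values shrinks as `q ** depth`, so the shot estimate grows roughly as `q ** (-2 * depth)`, which is the kind of exponential resource cost the abstract refers to.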

Paper Structure

This paper contains 54 sections, 23 theorems, 191 equations, 10 figures.

Key Result

Theorem 1

Consider an error mitigation strategy that, as a step in its protocol, estimates $E_{\sigma,X,M,k}$ as defined in Eq. eq:thm1_estimator. Suppose that $\sigma$ is prepared with a depth $L_{\sigma}$ circuit and experiences local Pauli noise according to Eq. eq:noisystate. Under these conditions, $E_{\sigma,X,M,k}$ [...] where $\mathbb{1} \in B(\mathcal{H})$ is the $n$-qubit identity operator and with noise parameter [...]

Figures (10)

  • Figure 1: Error mitigation can impair the resolvability of cost function landscapes. (a): A central primitive in training VQAs is the task of comparing two cost function values ($C(\boldsymbol{\theta}_1 )$ and $C(\boldsymbol{\theta}_2 )$) on the cost landscape in parameter space. Ideally (with infinite sampling), these cost values correspond to the mean values of some probability distributions (left panel). However, in an experimental setup, one only has a finite shot budget and by collecting measurement statistics one obtains an estimate of the mean values by sampling from these distributions (right panel). (b): The effect of certain types of noise models is to concentrate cost function values. This impedes trainability as any two cost function values ($\widetilde{C}(\boldsymbol{\theta}_1 )$ and $\widetilde{C}(\boldsymbol{\theta}_2 )$) have small separation and require many shots to accurately distinguish. (c): Error mitigation can mitigate many effects of noise and potentially recover key features of the noise-free cost function. In an ideal scenario, the separation of the mitigated cost values ($C_m(\boldsymbol{\theta}_1 )$ and $C_m(\boldsymbol{\theta}_2 )$) closely resembles that of the noise-free landscape. However, the caveat is that the variance of statistical outcomes can increase greatly. The effect of this is that the two cost function points can often require even more shots to resolve accurately, compared to the unmitigated case.
  • Figure 2: Schematic of different effects due to noise on cost landscapes. We present a 1-dimensional slice of a simplified cost landscape corresponding to a single parameter $\theta$. a) Depending on the parameterization strategy, some ansatzes can have degenerate minima. b) Certain types of local Pauli noise can cause the cost landscape to exponentially concentrate on a fixed value. Some problems can display optimal parameter resilience (OPR), where the location of the optimal parameters is invariant under the action of certain noise models. c) Aside from cost concentration, noise can also corrupt the cost landscape by breaking the degeneracy of optimal parameters and shifting the location of minima.
  • Figure 3: Schematic of resource use in error mitigation. Noise is indicated by the shaded orange region. (a) Cost function values are obtained by taking input state $\rho_{in}$, applying parameterized gates which we denote as a unitary channel $\mathcal{U}(\boldsymbol{\theta})$, and measuring the resulting state $\mathcal{U}(\boldsymbol{\theta})(\rho_{in})$ with observable $O$. (b) Noise can corrupt the gates in the circuit, as well as the state preparation and measurement processes. (c) Error mitigation aims to obtain a good approximation to the noise-free cost $C(\boldsymbol{\theta})$ by employing a number of strategies such as: modifying the gates implemented $\mathcal{U}(\boldsymbol{\theta}) \rightarrow \mathcal{V}(\boldsymbol{\theta})$ or the input state $\rho_{in} \rightarrow \sigma_{in}$, utilizing multiple copies of the quantum circuit, modifying the measurement operator $O \rightarrow X$, and utilizing clean ancillary qubits at the end of the circuit. Many such circuits with different hyperparameters can be run, with their expectation values combined in a post-processing step, in order to construct the final error mitigated cost value $C_m(\boldsymbol{\theta})$. Note that here we have only indicated noise occurring in the initial part of the circuit; this reflects the assumptions of analyses in prior works [koczor2020exponential, huggins2020virtual]. As we investigate the limitations of such error mitigation schemes, we keep these assumptions as a “best case” analysis. One feature that distinguishes the approaches to error mitigation studied here from error correction is that error correction allows global access to the larger Hilbert space from the start of the circuit, whereas the error mitigation techniques studied here only allow the possibility for global operations at the end of the circuit.
  • Figure 4: Schematic for basis-averaged relative resolvability. In Definition 4 we consider a broader averaged resolvability measure where the average is taken over a class of noisy states, rather than a cost landscape generated by a particular ansatz. This is constructed from the following game: (1) Prepare a reference noisy state $\rho_{\boldsymbol{\lambda}}$ with spectrum $\boldsymbol{\lambda}$ and conjugate by unitary $U_i$ drawn from a 2-design. (2) Pass the resulting state through the considered error mitigation protocol, and evaluate the resolvability from the fixed point of the noise. (3) Do the same, without error mitigation. (4) Average over the 2-design and compare the averaged resolvabilities.
  • Figure 5: Comparing CDR-mitigated and noisy optimization for $\boldsymbol{5}$-qubit Max-Cut QAOA. We plot the approximation ratio of solutions for noisy (red circles) and CDR-mitigated (blue diamonds) optimization of Max-Cut QAOA for $5$ qubits. Different panels show results for different numbers of QAOA rounds $p$, plotted versus the total number of shots $N_{\rm tot}$ spent on the optimization of a Max-Cut problem. Here, we compute the approximation ratios using exact $H_{\rm MaxCut}$ energies to benchmark the quality of the noisy and CDR-mitigated optimization. The approximation ratio is defined as the ratio of a given solution's energy to the true ground state energy. A higher approximation ratio indicates better solution quality. For each $p$ we average the approximation ratio over $36$ Max-Cut graphs chosen randomly from the Erdős-Rényi ensemble. The error bars show the standard deviation of the mean, computed as the standard deviation of the ratio over the graph sample divided by the square root of the number of graphs. For all $p$ and $N_{\rm tot}$ values we see an advantage of the CDR-mitigated optimization over noisy optimization.
  • ...and 5 more figures
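
The approximation ratio and error bars described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the energy of a bitstring is (minus) its cut value, so that the energy ratio equals the cut-value ratio, it finds the optimum by brute force (feasible only for a few qubits), and it assumes the graph has at least one edge. All function names are hypothetical.

```python
import itertools
import math
import statistics


def cut_value(edges, bits):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def max_cut(n, edges):
    """Exact optimum by enumerating all 2**n bitstrings (small n only)."""
    return max(cut_value(edges, bits)
               for bits in itertools.product((0, 1), repeat=n))


def approximation_ratio(n, edges, bits):
    """Ratio of a candidate solution's cut value to the optimal cut value."""
    return cut_value(edges, bits) / max_cut(n, edges)


def mean_and_sem(ratios):
    """Mean over graphs and standard deviation of the mean (std / sqrt(N))."""
    return statistics.mean(ratios), statistics.stdev(ratios) / math.sqrt(len(ratios))


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # hypothetical 5-node graph
    print(approximation_ratio(5, ring, (0, 1, 0, 1, 0)))
    print(approximation_ratio(5, ring, (0, 0, 0, 1, 1)))
```

In a benchmark like the one in the caption, `mean_and_sem` would be applied to the list of ratios obtained for the sampled graphs at a fixed $p$ and $N_{\rm tot}$.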

Theorems & Definitions (47)

  • Definition 1: Error mitigation cost
  • Theorem 1
  • Corollary 1: Exponential estimator concentration
  • Definition 2: Relative resolvability for two points
  • Definition 3: Average relative resolvability across cost landscape
  • Definition 4: Basis-averaged relative resolvability
  • Proposition 1: Relative resolvability of Zero-Noise Extrapolation with global depolarizing noise, 2 noise levels
  • Proposition 2: Average relative resolvability of Zero-Noise Extrapolation, 2 noise levels
  • Proposition 3: Relative resolvability of Virtual Distillation with global depolarizing noise
  • Proposition 4: Average relative resolvability of Virtual Distillation
  • ...and 37 more
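
Propositions 1 and 2 concern Zero-Noise Extrapolation with 2 noise levels. As a hedged illustration of why such a protocol can trade bias for variance, the sketch below uses the standard two-point linear (Richardson) extrapolation. It is not taken from the paper; the noise scale factors `a1`, `a2` and the assumption of equal, independent shot noise at both noise levels are illustrative choices.

```python
def zne_two_point(c1, c2, a1=1.0, a2=2.0):
    """Linear extrapolation to zero noise from costs at noise scales a1 < a2.

    Fits a straight line through (a1, c1) and (a2, c2) and returns its
    value at noise scale 0.
    """
    if a1 == a2:
        raise ValueError("noise scale factors must differ")
    return (a2 * c1 - a1 * c2) / (a2 - a1)


def zne_variance_factor(a1=1.0, a2=2.0):
    """Variance amplification of the extrapolated estimate.

    Assumes both noisy estimates are independent with the same variance;
    the output variance is then this factor times that variance.
    """
    return (a1 ** 2 + a2 ** 2) / (a2 - a1) ** 2


if __name__ == "__main__":
    # Hypothetical noisy costs that happen to lie on a line c(a) = 1 - 0.1 * a.
    print(zne_two_point(0.9, 0.8))
    print(zne_variance_factor())
```

With `a1 = 1` and `a2 = 2` the estimate is `2 * c1 - c2` and its variance is five times that of a single noisy estimate under these assumptions, so any gain in separation between mitigated cost values has to be weighed against the extra shots needed.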