
Gradients, parallelism, and variance of quantum estimates

Francesco Preti, Michael Schilling, József Zsolt Bernád, Tommaso Calarco, Francisco Cárdenas-López, Felix Motzoi

TL;DR

This work analyzes the sampling complexity of estimating quantum observables and their gradients on near-term hardware, contrasting Standard Estimators (SE) with Linear Combination of Unitaries (LCU) and their amplification variants. It develops a comprehensive LCU gradient framework extending to general n-qubit gates and time-dependent control, including SU($d$) gradients and quantum-control gradients, and provides both numerical and analytical convergence assessments. The study reveals that LCU, especially when combined with amplitude estimation, can offer substantial speedups over SE in certain regimes, while characterizing the conditions under which such gains materialize for gradient-based tasks. Through circuit constructions, gradient formulas, and random-matrix analyses, the authors outline practical pathways for implementing efficient gradient estimation on both near-term and fault-tolerant quantum hardware, with applications to quantum machine learning and quantum control, complemented by open-source code and data.
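As a rough illustration of why amplitude estimation (AE) can help, the sketch below compares the number of shots a plain sample mean needs to reach a target precision with the number of Grover-operator applications implied by the textbook AE error bound of Brassard et al. (cited in the paper as Brassard2002). This is not the authors' code; the function names are hypothetical, and shots and Grover queries are different cost units, so the comparison only indicates the $1/\epsilon^2$ versus $1/\epsilon$ scaling.

```python
import math


def se_shots(p: float, eps: float) -> int:
    """Shots for a plain sample mean to reach standard error eps.

    One shot of a projective measurement is Bernoulli(p), so the mean of N
    shots has variance p(1 - p) / N; solve p(1 - p) / N <= eps**2 for N.
    """
    return math.ceil(p * (1.0 - p) / eps**2)


def ae_queries(a: float, eps: float) -> int:
    """Smallest M = 2**t with 2*pi*sqrt(a(1-a))/M + pi**2/M**2 <= eps.

    This is the textbook amplitude-estimation error bound (it holds with
    probability >= 8/pi**2); M counts applications of the Grover operator.
    """
    m = 1
    while 2 * math.pi * math.sqrt(a * (1 - a)) / m + math.pi**2 / m**2 > eps:
        m *= 2  # canonical AE uses M = 2**t evaluation steps
    return m


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(f"eps={eps:g}: shots={se_shots(0.5, eps)}, AE queries={ae_queries(0.5, eps)}")
```

Shrinking `eps` tenfold multiplies the shot count by about 100 but the AE query count by only about 10.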

Abstract

Computation of observables and their gradients on near-term quantum hardware is a central aspect of any quantum algorithm. In this work, we first review standard approaches to the estimation of observables with and without quantum amplitude estimation for both cost functions and gradients, discuss sampling problems, and analyze variance propagation on quantum circuits with and without Linear Combination of Unitaries (LCU). Afterwards, we systematically analyze the standard approaches to gradient computation with LCU circuits. Finally, we develop an LCU gradient framework for the most general gradients based on n-qubit gates and for time-dependent quantum control gradients, analyze the convergence behaviour of the circuit estimators, and provide detailed circuit representations of both, for near-term and fault-tolerant hardware.
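One standard approach to gradient estimation with ordinary (SE-style) circuits is the two-term parameter-shift rule, which evaluates the same cost at two shifted parameter values. The minimal sketch below applies it to a hypothetical one-qubit toy cost $C(\theta) = \langle 0 | R_Y(\theta)^\dagger Z R_Y(\theta) | 0 \rangle = \cos\theta$, both exactly and from simulated shots. It is an illustration under these assumptions, not the paper's LCU gradient construction.

```python
import numpy as np

Z = np.diag([1.0, -1.0])


def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation RY(theta) = exp(-i theta Y / 2) (a real matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def cost_exact(theta: float) -> float:
    """Toy cost C(theta) = <0| RY(theta)^T Z RY(theta) |0> = cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)


def cost_sampled(theta: float, shots: int, rng: np.random.Generator) -> float:
    """Shot-based estimate of <Z>: outcome 0 (eigenvalue +1) has probability cos^2(theta/2)."""
    n0 = rng.binomial(shots, np.cos(theta / 2) ** 2)
    return (2 * n0 - shots) / shots


def parameter_shift(f, theta: float) -> float:
    """Two-term parameter-shift rule for a rotation generated by a Pauli operator."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = 0.3
    exact = parameter_shift(cost_exact, theta)
    noisy = parameter_shift(lambda t: cost_sampled(t, 10_000, rng), theta)
    print(exact, -np.sin(theta), noisy)  # exact gradient equals -sin(theta)
```

The shot-based gradient inherits the variance of two independent cost estimates, which is the kind of variance propagation the abstract refers to.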

Paper Structure

This paper contains 39 sections, 3 theorems, 172 equations, 11 figures, 1 table.

Key Result

Theorem 1

The expected value and variance of the observable $\Pi_{\text{LCU}} = \ket{0_c} \bra{0_c} \otimes \mathbb{I}_{L} \otimes \Pi$, where $\Pi$ is an orthogonal projector that describes the measurement operation, are given by with $p_i = \frac{1}{2}\tr{U_i \rho U_i^{\dagger}\Pi}$. If instead the observable $Z_{\text{LCU}} = \ket{0_c} \bra{0_c} \otimes \mathbb{I}_{L} \otimes Z_{\text{prod}}$ is measure
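The equations for the mean and variance did not survive extraction and are not reconstructed here. One general fact is consistent with the setting: $\Pi_{\text{LCU}}$ is a tensor product of orthogonal projectors, hence itself a projector, so each shot is a Bernoulli outcome with variance $E(1-E)$. The sketch below checks this numerically. The register sizes, the example system projector $\Pi = \ket{0}\bra{0}$, and the random state are illustrative assumptions, not values from the paper.

```python
import numpy as np


def pi_lcu(dim_lcu: int, pi_sys: np.ndarray) -> np.ndarray:
    """Build |0_c><0_c| (x) I_L (x) Pi with ordering (control, LCU register, system)."""
    proj0 = np.diag([1.0, 0.0])  # |0_c><0_c| on the control qubit
    return np.kron(np.kron(proj0, np.eye(dim_lcu)), pi_sys)


def bernoulli_mean_var(state: np.ndarray, proj: np.ndarray) -> tuple[float, float]:
    """Mean and single-shot variance of a projector observable: Var = E - E^2."""
    mean = float(np.real(np.vdot(state, proj @ state)))
    return mean, mean - mean**2


rng = np.random.default_rng(7)
pi_sys = np.diag([1.0, 0.0])            # example system projector |0><0|
P = pi_lcu(dim_lcu=2, pi_sys=pi_sys)    # 2 (control) x 2 (LCU) x 2 (system) = 8

psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)              # random normalized pure state

mean, var = bernoulli_mean_var(psi, P)

# P is diagonal, so measuring in the computational basis and reading off
# diag(P) at the observed index simulates one shot of the observable.
probs = np.abs(psi) ** 2
idx = rng.choice(len(psi), size=200_000, p=probs)
outcomes = np.diag(P)[idx]
print(mean, var, outcomes.mean(), outcomes.var())
```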

Figures (11)

  • Figure 1: A representation of the two approaches to observable sampling that are typical of variational quantum circuits: (a) summarizes the Standard Estimator (SE), which prepares $L$ circuits with the same input density matrix and an arbitrary unitary operator $V(\boldsymbol{\theta})$. The unitaries $U_1, \dots, U_L$ (which are controlled by a parameter vector $\boldsymbol{\lambda}$) prepare, e.g., the different elements of an observable basis or a collection of $L$ non-commuting operators. (b) summarizes the LCU sampler/estimator, which performs the same kind of estimation, but renormalized between, e.g., $I=(1,-1)$. The coefficients of the linear combination of $L$ estimates are either computed using classical methods (a) or loaded into the LCU register with $r=\lceil \log_2 L \rceil$ qubits (b) using the operator $W_a$, which prepares the state $\ket{a}$ (see Eqs. \ref{eq:W_a} and \ref{eq:ket_a}) with a suitable state-preparation algorithm [Iten_2016, da_Silva_2022]. The unitary operations $R_1$ and $R_2$ control the type of cost function to estimate: $R_1 = H$, $R_2 = X$ estimates a cost function as in Eq. \ref{eq:h_v_sigma_z}, whereas $R_1 = H$, $R_2 = H$ estimates a cost function as in Eq. \ref{eq:potq_cost} (see also Ref. Somma2002).
  • Figure 2: (a) Circuit implementing the sum of two unitaries $V_1$ and $V_2$ on a quantum computer using one control qubit, and (b) circuit implementing the sum of $L$ unitaries using up to $r = \lceil \log_2 L \rceil$ qubits (both are based on the circuits given in Ref. Childs2012). Upon measuring the control qubit in either $0$ or $1$, the whole state collapses into a state proportional to either $(V_1 + V_2)\ket{\psi}$ or $(V_1 - V_2)\ket{\psi}$. The LCU can therefore be used to probabilistically implement arbitrary operators acting on a state $\ket{\psi}$, such as those found in Hamiltonian simulation. In its generalized implementation (b), the LCU generates all possible combinations of sums and differences of the $L$ unitaries, with coefficients $\boldsymbol{k}=(k_1,\dots,k_L)^{\text{T}}$. The linear combination with only positive terms is mapped to the zero state; however, the probability of measuring it decreases as $1/L$ [Childs2012].
  • Figure 3: Representation of the circuits used to estimate real linear combinations of estimates. (a) Circuit that uses a single control qubit and one LCU register with $r = \lceil \log_2 L \rceil$ qubits and, as a result, gives a biased estimator (see Eq. \ref{eq:pnlc_sampling}). (b) Circuit that uses one control qubit and two LCU registers (one for the positive and one for the negative terms). Circuit (b) provides an unbiased estimator of the real linear combination of estimates (see Eqs. \ref{eq:e0unbiased} and \ref{eq:e1unbiased}). Here, the unitaries $U^{+}_1, U^{+}_2, \dots, U^{+}_{L^{+}}$ and $U^{-}_{L^+ + 1}, U^{-}_{L^+ + 2}, \dots, U^{-}_{L}$ correspond to the coefficients with positive and negative signs, respectively. The control values $1, 2, \dots, L^{+}$ and $L^{+}+1, L^{+}+2, \dots, L$ are implemented using the gates $W_{\boldsymbol{a}}^\pm$ and binary encoding with a total of $r = r^{+} + r^{-}$ qubits, where $r^{\pm} = \lceil \log_2 L^{\pm} \rceil$, but other types of qubit encoding for the multi-controlled gates are also possible.
  • Figure 4: A representation of data encoding and sampling for a QNN [qiskit2024] (explicit model). Classical data needs to be loaded into the quantum sampler/estimator. This procedure can be quite expensive due to the input data size and may require some classical pre-processing [Jerbi2023qmlbkm], but it can be realized with both the LCU and SE approaches. In addition to the two estimators, (near-term) AE routines can be considered [Oshio2024, Brassard2002, Suzuki2020, Grinko2021]. The cost function is controlled by variational parameters $\boldsymbol{\theta}$ that are optimized classically using gradient-based [li2024efficientquantumgradienthigherorder, Motzoi2011, dalgaard2020hessian] or gradient-free [Caneva2011, NelderMead1965] algorithms. Different types of cost functions, such as the one given in Eq. \ref{eq:qml_cost}, can be estimated using the LCU method given in Eq. \ref{eq:pos_neg_C} or variants thereof.
  • Figure 5: An example of estimation performed with (simulated) quantum circuits in qiskit [qiskit2024] for the regression cost function $C_1(\boldsymbol{\theta})$ in Eq. \ref{eq:qml_lcu_cost}, using 10000 shots per circuit and from 2 to 100 estimates. The values for datasets I, II and IV are also averaged over 50 different sampling runs. (Top row) Mean values and sampling complexity of the estimator $\tilde{C}_1(\boldsymbol{\theta})$ of $C_1(\boldsymbol{\theta})$ for a random $\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{0}_N,\mathbb{I}_N)$ (see Eq. \ref{eq:qml_lcu_cost}). The sampling complexity is defined as the asymptotic total number of queries needed to estimate a quantity up to a fixed precision $\epsilon$. (Bottom row) Sampling complexities of the estimators according to Eqs. \ref{eq:var_te} and \ref{eq:var_sz_lcu} for different types of datasets used as inputs to the QNN, for both the SE (blue line) and the LCU (orange line) estimators. In addition, the maximum possible SE variance is shown (green line); as expected from Eq. \ref{eq:cauchy_schwarz}, it always lies below the LCU variance. Shaded regions show the uncertainty of both the mean (standard deviation) and the variance (here we use an approximate estimate of the fourth moment for SE and LCU, while for the maximum SE variance we use the fourth moment of LCU as an upper bound). Column (I) shows the results of sampling auto-correlated quantities as defined in Eq. \ref{eq:autocorr}; (II) shows the results for sampling i.i.d. Gaussian variables; (III) shows the results of sampling from the IRIS dataset; and (IV) from the MNIST dataset (whose dimensionality is first reduced with a PCA transformation; see also the procedure used in Ref. Jerbi2023qmlbkm). We observe that, in all cases, the SE variance grows linearly, compared with the LCU variance, which always grows quadratically.
In the auto-correlated case, the particular structure of the data seems to induce a superlinear behaviour, most likely because the estimates are functions of the same (or highly correlated) unitaries; see also the discussion in Appendix \ref{sec:sampling_PoissonBinomial}.
  • ...and 6 more figures
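The two-unitary LCU circuit described for Figure 2(a) can be checked with a small statevector simulation. The sketch assumes the common construction in which a Hadamard on the control qubit is followed by a select operation (apply $V_1$ if the control is $0$, $V_2$ if it is $1$) and a second Hadamard. The random unitaries, the two-qubit system size and the helper names are illustrative assumptions. The control-$0$ branch should equal $(V_1 + V_2)\ket{\psi}/2$ and the control-$1$ branch $(V_1 - V_2)\ket{\psi}/2$.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(a)
    return q


def lcu_two(v1: np.ndarray, v2: np.ndarray, psi: np.ndarray):
    """Control qubit: H, then V1 if control=0 / V2 if control=1, then H.

    Returns the unnormalized system branches for control outcomes 0 and 1;
    the full state uses the ordering kron(control, system).
    """
    d = len(psi)
    h_c = np.kron(H, np.eye(d))                    # H on the control qubit
    select = np.block([[v1, np.zeros((d, d))],     # |0><0| (x) V1
                       [np.zeros((d, d)), v2]])    # + |1><1| (x) V2
    state = h_c @ select @ h_c @ np.kron([1.0, 0.0], psi)
    return state[:d], state[d:]


rng = np.random.default_rng(3)
d = 4                                              # two system qubits
v1, v2 = random_unitary(d, rng), random_unitary(d, rng)
psi = np.zeros(d, dtype=complex)
psi[0] = 1.0                                       # |00>

b0, b1 = lcu_two(v1, v2, psi)
p0 = np.linalg.norm(b0) ** 2                       # probability of control outcome 0
post0 = b0 / np.linalg.norm(b0)                    # proportional to (V1 + V2)|psi>
print(p0, np.linalg.norm(b1) ** 2)
```

Post-selecting on outcome $0$ succeeds with probability $\|(V_1+V_2)\ket{\psi}\|^2/4$, which is why success probabilities shrink as more unitaries are combined.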

Theorems & Definitions (7)

  • Theorem 1: Mean and variance of the LCU estimator
  • Proof
  • Theorem 2: LCU vs. Classically Correlated Bernoulli Samples
  • Proof
  • Theorem 3
  • Proof
  • Proof
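A simplified model helps illustrate the variance comparison between SE and LCU estimates (the subject of Theorem 2 and the bottom row of Figure 5). It assumes that SE takes one shot on each of $L$ independent circuits with success probabilities $p_i$, and that a uniformly weighted LCU takes one shot with success probability $\mu = \frac{1}{L}\sum_i p_i$, rescaled by $L$. Both then estimate $\sum_i p_i$ without bias. This model is an assumption for illustration and ignores the control-qubit normalization and weighting in the paper's circuits. In it, the SE variance $\sum_i p_i(1-p_i)$ grows linearly in $L$, the LCU variance $L^2\mu(1-\mu)$ grows quadratically, and a Cauchy–Schwarz argument ($(\sum_i p_i)^2 \le L\sum_i p_i^2$) shows the LCU variance is never smaller.

```python
import numpy as np


def se_variance(p) -> float:
    """Variance of sum_i X_i with independent X_i ~ Bernoulli(p_i).

    Models one shot on each of the L separate SE circuits.
    """
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))


def lcu_variance(p) -> float:
    """Variance of L * Y with Y ~ Bernoulli(mean(p)).

    Simplified model (assumption): one shot of a uniformly weighted LCU whose
    success probability is the average of the p_i, rescaled by L so that its
    mean equals sum_i p_i, like the SE estimate above.
    """
    p = np.asarray(p, dtype=float)
    n, mu = len(p), float(np.mean(p))
    return n**2 * mu * (1.0 - mu)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for L in (2, 10, 100):
        p = rng.uniform(0.2, 0.8, size=L)
        print(L, round(se_variance(p), 2), round(lcu_variance(p), 2))
```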