Stochastic Shadow Descent: Training Parametrized Quantum Circuits with Shadows of Gradients

Sayantan Pramanik, M Girish Chandra

TL;DR

This work tackles the training of Parametrized Quantum Circuits (PQCs) by addressing the bias and scaling issues of standard gradient methods. It introduces Stochastic Shadow Descent (SSD), which updates the parameters along random projection directions using unbiased directional derivatives computed by specialized quantum circuits (Inner Product Circuits), removing the reliance on finite-difference gradients. The authors prove that SSD converges to an $\varepsilon$-stationary point within an $O(Ld/\varepsilon^4)$ circuit-execution budget, and validate the approach on an MNIST-based quantum classifier, where it reaches SGD-level performance with roughly 100× fewer circuit executions. Overall, the paper presents a practical, theoretically grounded route to scalable, quantum-aware optimization for variational quantum algorithms.
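The update described above can be sketched classically. The following is a minimal illustration, not the authors' implementation: the toy cost, the Gaussian sampling of directions, the step size, and the function names are all assumptions made for this example, and the directional derivative is computed in closed form where the paper would use a quantum circuit.

```python
import math
import random


def cost(theta):
    # Toy stand-in for a PQC cost (an assumption for this sketch): each term
    # mimics (1 - <Z>)/2 for a single-qubit rotation, so the minimum value is 0.
    return sum((1.0 - math.cos(t)) / 2.0 for t in theta) / len(theta)


def directional_derivative(theta, v):
    # Exact inner product <grad f(theta), v>, computed classically here.
    # In the paper this quantity is estimated by an Inner Product Circuit.
    d = len(theta)
    return sum(math.sin(t) / 2.0 * vi for t, vi in zip(theta, v)) / d


def shadow_descent_sketch(theta, step_size=0.5, iterations=1000, seed=0):
    # Hypothetical helper: move along a random direction v, scaled by the
    # directional derivative along v (a random projection of the gradient).
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        v = [rng.gauss(0.0, 1.0) for _ in theta]  # Gaussian directions are an assumption
        dd = directional_derivative(theta, v)
        theta = [t - step_size * dd * vi for t, vi in zip(theta, v)]
    return theta


theta_start = [1.0, -2.0, 0.5, 2.5]
theta_end = shadow_descent_sketch(theta_start)
print("initial cost:", cost(theta_start))
print("final cost:  ", cost(theta_end))
```

Each iteration needs only one scalar (the directional derivative) regardless of the number of parameters, which is the property that keeps the per-iteration circuit count independent of $d$ in methods of this kind.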

Abstract

In this paper, we focus on the task of optimizing the parameters in Parametrized Quantum Circuits (PQCs). While popular algorithms, such as Simultaneous Perturbation Stochastic Approximation (SPSA), limit the number of circuit executions to two per iteration, irrespective of the number of parameters in the circuit, they have their own challenges. These methods use central differences to calculate biased estimates of directional derivatives. We show, both theoretically and numerically, that this may lead to instabilities in training the PQCs. To remedy this, we propose Stochastic Shadow Descent (SSD), which uses random projections (or shadows) of the gradient to update the parameters iteratively. We eliminate the bias in directional derivatives by employing the Parameter-Shift Rule, along with techniques from Quantum Signal Processing, to construct a quantum circuit that parsimoniously computes unbiased estimates of directional derivatives. Finally, we prove the convergence of the SSD algorithm, provide worst-case bounds on the number of iterations, and numerically demonstrate its efficacy.
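The bias of central differences can be seen on the simplest possible case. The sketch below assumes a single-parameter cost $f(\theta) = \cos\theta$ (the expectation of Pauli-Z after an RY rotation on $|0\rangle$) and compares a central difference with step $\mu$ against the standard single-parameter Parameter-Shift Rule. It only illustrates the bias; it does not reproduce the paper's circuit construction, and the function names are hypothetical.

```python
import math


def expectation(theta):
    # <Z> after RY(theta) applied to |0>; used here as an assumed toy cost.
    return math.cos(theta)


def central_difference(f, theta, mu):
    # Biased estimate: for this cost it equals f'(theta) * sin(mu) / mu.
    return (f(theta + mu) - f(theta - mu)) / (2.0 * mu)


def parameter_shift(f, theta):
    # Exact derivative for costs of the form a*cos(theta) + b*sin(theta) + c.
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2.0


theta = 0.7
exact = -math.sin(theta)

for mu in (1.0, 0.5, 0.1):
    bias = central_difference(expectation, theta, mu) - exact
    print(f"mu = {mu}: central-difference bias = {bias:.6f}")

print("parameter-shift error:", parameter_shift(expectation, theta) - exact)
```

For this cost the central difference underestimates the derivative by the factor $\sin(\mu)/\mu$, so the bias vanishes only as $\mu \to 0$, whereas the shifted evaluation is exact up to floating-point error.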

Paper Structure

This paper contains 9 sections, 2 theorems, 3 equations, 4 figures, 2 algorithms.

Key Result

Proposition 1

If Assumption \ref{as:smooth} holds, then for all $\boldsymbol{\theta}, \mathbf{v} \in \mathbb{R}^d$: $\lim_{\mu \rightarrow 0} \frac{1}{\mu}\left(f(\boldsymbol{\theta}+\mu\mathbf{v})-f(\boldsymbol{\theta})\right) = \langle \nabla f(\boldsymbol{\theta}), \mathbf{v} \rangle$. Further, if …
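The limit in the first part of Proposition 1 can be checked numerically. The sketch below uses an arbitrary smooth three-parameter function chosen for illustration (it is not taken from the paper) and compares the difference quotient with the inner product of the analytic gradient and $\mathbf{v}$ as $\mu$ shrinks.

```python
import math


def f(theta):
    # Arbitrary smooth test function (an assumption for this check).
    return math.cos(theta[0]) * math.sin(theta[1]) + 0.5 * math.cos(theta[2])


def grad_f(theta):
    # Analytic gradient of f.
    return [
        -math.sin(theta[0]) * math.sin(theta[1]),
        math.cos(theta[0]) * math.cos(theta[1]),
        -0.5 * math.sin(theta[2]),
    ]


def difference_quotient(theta, v, mu):
    # (f(theta + mu * v) - f(theta)) / mu, the quantity inside the limit.
    shifted = [t + mu * vi for t, vi in zip(theta, v)]
    return (f(shifted) - f(theta)) / mu


theta = [0.3, -1.1, 2.0]
v = [1.0, -0.5, 2.0]
inner_product = sum(g * vi for g, vi in zip(grad_f(theta), v))

mus = (1e-1, 1e-2, 1e-3, 1e-4)
errors = [abs(difference_quotient(theta, v, mu) - inner_product) for mu in mus]
for mu, err in zip(mus, errors):
    print(f"mu = {mu:g}: |quotient - <grad f, v>| = {err:.2e}")
```

For a smooth $f$ the gap shrinks roughly in proportion to $\mu$, which is consistent with the limit but also shows that any fixed, nonzero $\mu$ leaves a residual error.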

Figures (4)

  • Figure 1: Plots depicting the training loss incurred while training a simple quantum model on the Iris dataset with RSGF and SPSA methods, respectively, for various values of $\mu$.
  • Figure 2: The IPC (constructed using Algorithm \ref{alg:ipc}) corresponding to a single variational layer of the BasicEntanglerLayers ansatz with 4 qubits and 4 parameters. The circuit returns $D^s_\mathbf{v} (\boldsymbol{\theta})$ as an output.
  • Figure 3: An enhanced version of the IPC depicted in Fig. \ref{fig:dd2}, which can be executed just once to compute $D_\mathbf{v}(\boldsymbol{\theta})$. The output of this circuit is $\left(D^{+}_\mathbf{v} (\boldsymbol{\theta})-D^{-}_\mathbf{v} (\boldsymbol{\theta})\right)/2$. Here, $\mathbf{Z}$ is the Pauli-Z matrix.
  • Figure 4: Plot of training loss against iterations, and the number of circuit-executions for SGD, RSGF, SPSA, and SSD.

Theorems & Definitions (3)

  • Definition 1
  • Proposition 1
  • Theorem 1