Stochastic Shadow Descent: Training Parametrized Quantum Circuits with Shadows of Gradients
Sayantan Pramanik, M Girish Chandra
TL;DR
This work tackles the training of Parametrized Quantum Circuits (PQCs) by addressing the bias and scaling issues of standard gradient methods. It introduces Stochastic Shadow Descent (SSD), which updates the parameters along random projection directions using unbiased directional derivatives computed by specialized quantum circuits (Inner Product Circuits), removing the reliance on finite-difference gradient estimates. The authors prove that SSD converges to an $\varepsilon$-stationary point within an $O(Ld/\varepsilon^4)$ circuit budget and validate the approach on an MNIST-based quantum classifier, matching SGD-level performance with roughly 100× fewer circuit executions. Overall, the paper presents a practical, theoretically grounded route to scalable, quantum-aware optimization for variational quantum algorithms.
Abstract
In this paper, we focus on the task of optimizing the parameters of Parametrized Quantum Circuits (PQCs). Popular algorithms such as Simultaneous Perturbation Stochastic Approximation (SPSA) limit the number of circuit executions to two per iteration, irrespective of the number of parameters in the circuit, but they come with their own challenges. These methods use central differences to calculate biased estimates of directional derivatives. We show, both theoretically and numerically, that this bias may lead to instabilities in \emph{training} the PQCs. To remedy this, we propose Stochastic Shadow Descent (\texttt{SSD}), which uses random projections (or \emph{shadows}) of the gradient to update the parameters iteratively. We eliminate the bias in the directional derivatives by employing the Parameter-Shift Rule, along with techniques from Quantum Signal Processing, to construct a quantum circuit that parsimoniously computes \emph{unbiased estimates} of directional derivatives. Finally, we prove the convergence of the \texttt{SSD} algorithm, provide worst-case bounds on the number of iterations, and numerically demonstrate its efficacy.
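The contrast between a central-difference directional derivative and a shift-rule-based one can be illustrated with a small classical sketch. Everything here is an assumption made for illustration: the toy cost `cost` (a product of cosines, standing in for a PQC expectation value), the helper names, the Gaussian choice of direction, and the update form $\theta \leftarrow \theta - \eta\,(v^\top \nabla f)\,v$. In particular, the directional derivative is assembled from one parameter-shift pair per parameter, which is not the single-circuit construction proposed in the paper.

```python
import numpy as np

def cost(theta):
    # Hypothetical toy cost: product of cosines, standing in for a PQC expectation value.
    return float(np.prod(np.cos(theta)))

def parameter_shift_grad(f, theta):
    # Parameter-shift rule with shifts of +/- pi/2; exact for this toy cost,
    # because each parameter enters through a single cosine.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = np.pi / 2
        grad[i] = 0.5 * (f(theta + shift) - f(theta - shift))
    return grad

def central_difference_dd(f, theta, v, h):
    # SPSA-style two-evaluation estimate of the directional derivative along v.
    # Its error relative to the true value shrinks like O(h^2) but is nonzero for h > 0.
    return (f(theta + h * v) - f(theta - h * v)) / (2 * h)

def shadow_descent_step(f, theta, lr, rng):
    # Assumed update form: move along a random direction v, scaled by the
    # directional derivative v . grad f (the "shadow" of the gradient on v).
    v = rng.standard_normal(theta.size)  # Gaussian direction is an assumption
    dd = float(v @ parameter_shift_grad(f, theta))
    return theta - lr * dd * v

rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=4)
v = rng.choice([-1.0, 1.0], size=4)  # Rademacher direction, as commonly used in SPSA

exact_dd = float(v @ parameter_shift_grad(cost, theta))
for h in (0.5, 0.1, 0.01):
    bias = abs(central_difference_dd(cost, theta, v, h) - exact_dd)
    print(f"h = {h}: |central difference - exact| = {bias:.3e}")

new_theta = shadow_descent_step(cost, theta, lr=0.01, rng=rng)
print("cost before:", cost(theta), "cost after:", cost(new_theta))
```

On hardware, shrinking the step `h` to reduce the central-difference bias amplifies shot noise, since the difference of two noisy expectation values is divided by `2 * h`; this sketch uses exact function values and therefore shows only the bias term.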
