An Empirical Comparison of Optimizers for Quantum Machine Learning with SPSA-based Gradients
Marco Wiedmann, Marc Hölle, Maniraman Periyasamy, Nico Meyer, Christian Ufrecht, Daniel D. Scherer, Axel Plinge, Christopher Mutschler
TL;DR
This paper addresses the challenge of efficiently training variational quantum circuits (VQCs) on noisy intermediate-scale quantum (NISQ) hardware by comparing optimizers driven by SPSA-based gradient estimates with optimizers driven by parameter-shift gradients. The authors propose a hybrid approach that feeds an SPSA-derived gradient estimate into standard gradient-based optimizers (SGD, Adam, AMSGrad, RMSProp). They evaluate it on multiple regression datasets under ideal simulation, shot noise, and hardware noise, the last both with and without zero-noise extrapolation as error mitigation. Key findings show that SPSA-based gradients, particularly when coupled with AMSGrad, converge faster and yield smaller final errors than parameter-shift gradients. This advantage persists under realistic noise conditions, though error mitigation can sometimes increase variance for certain observables. The work provides practical guidance for optimizing VQCs on NISQ devices: SPSA-based gradient estimates integrate well with modern optimizers, reducing training time and improving robustness to noise.
Abstract
Variational quantum algorithms (VQAs) have attracted a lot of attention from the quantum computing community over the last few years. Their hybrid quantum-classical nature with relatively shallow quantum circuits makes them a promising platform for demonstrating the capabilities of noisy intermediate-scale quantum (NISQ) devices. Although the classical machine learning community focuses on gradient-based parameter optimization, finding near-exact gradients for variational quantum circuits (VQCs) with the parameter-shift rule introduces a large sampling overhead. Therefore, gradient-free optimizers have gained popularity in quantum machine learning circles. Among the most promising candidates is the simultaneous perturbation stochastic approximation (SPSA) algorithm, due to its low computational cost and inherent noise resilience. We introduce a novel approach that uses the approximated gradient from SPSA in combination with state-of-the-art gradient-based classical optimizers. We demonstrate numerically that this outperforms both standard SPSA and the parameter-shift rule in terms of convergence rate and absolute error in simple regression tasks. The improvement of our novel approach over SPSA with stochastic gradient descent is even amplified when shot and hardware noise are taken into account. We also demonstrate that error mitigation does not significantly affect our results.
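The core idea of the abstract, passing an SPSA gradient estimate to a gradient-based optimizer, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the loss is a classical quadratic standing in for a VQC regression loss, the function names (spsa_gradient, amsgrad_spsa, toy_loss) and all hyperparameter values are hypothetical, and the update is one common AMSGrad variant (Adam with bias correction and a running maximum of the second-moment estimate).

```python
import numpy as np

def spsa_gradient(loss, theta, c, rng):
    """SPSA gradient estimate from two loss evaluations, independent of len(theta)."""
    # Rademacher perturbation: every entry is +1 or -1 with equal probability.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = loss(theta + c * delta) - loss(theta - c * delta)
    # Dividing by delta_i equals multiplying by delta_i because delta_i is +-1.
    return diff / (2.0 * c) * delta

def amsgrad_spsa(loss, theta0, steps=500, lr=0.05, c=0.1,
                 beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """AMSGrad-style updates driven by the SPSA estimate instead of an exact gradient."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)      # first-moment (momentum) estimate
    v = np.zeros_like(theta)      # second-moment estimate
    v_max = np.zeros_like(theta)  # running maximum of the corrected second moment
    for t in range(1, steps + 1):
        g = spsa_gradient(loss, theta, c, rng)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)  # Adam-style bias correction
        v_hat = v / (1.0 - beta2 ** t)
        v_max = np.maximum(v_max, v_hat)  # the step that distinguishes AMSGrad from Adam
        theta = theta - lr * m_hat / (np.sqrt(v_max) + eps)
    return theta

def toy_loss(theta):
    """Assumed stand-in for a VQC loss: quadratic bowl with its minimum at theta = 1."""
    return float(np.sum((theta - 1.0) ** 2))

theta_init = np.zeros(4)
theta_final = amsgrad_spsa(toy_loss, theta_init)
print("initial loss:", toy_loss(theta_init))
print("final loss:  ", toy_loss(theta_final))
```

Each iteration costs two loss evaluations regardless of the number of parameters, whereas the parameter-shift rule typically needs two circuit evaluations per parameter. With a real VQC, toy_loss would be replaced by a circuit-based loss estimated from a finite number of shots.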
