WSBD: Freezing-Based Optimizer for Quantum Neural Networks

Christopher Kverne; Mayur Akewar; Yuqian Huo; Tirthak Patel; Janki Bhimani

TL;DR

This work tackles the high cost of gradient estimation and barren plateaus in quantum neural network (QNN) training by introducing Weighted Stochastic Block Descent (WSBD), a dynamic, parameter-wise freezing optimizer. WSBD computes a gradient-based importance score, then stochastically freezes less influential parameters in training windows to reduce forward passes while preserving full expressivity; scores are reset when parameters re-enter the active set. The authors provide a formal convergence proof and demonstrate substantial, scalable efficiency gains across MNIST, parity, and VQE tasks, with robustness to hardware noise. Ablation studies highlight the importance of stochastic freezing, granular parameter-wise decisions, and adaptive score resets. The approach yields practical speedups and identifies a principled direction for hardware-aware optimization in QML.
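The freeze-and-reset loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not give the exact importance score, sampling rule, or window schedule, so the sketch assumes the following: the score is the mean absolute gradient accumulated while a parameter is active, the next active set is drawn with probability proportional to that score, and gradients come from the parameter-shift rule at two loss evaluations per active parameter. The names `wsbd_sketch`, `toy_loss`, `WEIGHTS`, and `freeze_frac` are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a QNN expectation value. A weighted sum of
# cosines is differentiated exactly by the parameter-shift rule (shift pi/2).
WEIGHTS = np.array([1.0, 0.5, 0.1, 0.01])

def toy_loss(theta):
    return float(np.sum(WEIGHTS * np.cos(theta)))

def wsbd_sketch(loss, theta0, lr=0.1, window=5, n_windows=20,
                freeze_frac=0.5, eps=1e-8, seed=0):
    """Illustrative freezing loop; every design choice here is an assumption."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    n = theta.size
    n_keep = n - int(freeze_frac * n)   # parameters left active after the first window
    active = np.ones(n, dtype=bool)     # the first window trains every parameter
    score = np.zeros(n)                 # accumulated |gradient| while active
    age = np.zeros(n)                   # steps accumulated since the last reset
    evals = 0                           # loss evaluations ("forward passes")

    for _ in range(n_windows):
        for _ in range(window):
            grad = np.zeros(n)
            for j in np.flatnonzero(active):
                shift = np.zeros(n)
                shift[j] = np.pi / 2
                # Parameter-shift gradient: two evaluations per active parameter.
                grad[j] = 0.5 * (loss(theta + shift) - loss(theta - shift))
                evals += 2
            theta -= lr * grad          # frozen parameters have grad == 0
            score[active] += np.abs(grad[active])
            age[active] += 1

        # Stochastic selection: a higher importance makes it more likely that
        # a parameter stays active; eps keeps every parameter selectable.
        importance = score / np.maximum(age, 1) + eps
        keep = rng.choice(n, size=n_keep, replace=False,
                          p=importance / importance.sum())
        new_active = np.zeros(n, dtype=bool)
        new_active[keep] = True

        # Reset scores of parameters that re-enter the active set, so they
        # are judged on fresh gradients rather than stale ones.
        reentering = new_active & ~active
        score[reentering] = 0.0
        age[reentering] = 0.0
        active = new_active

    return theta, evals

theta, evals = wsbd_sketch(toy_loss, np.full(4, 0.5))
```

With four parameters and half of them frozen after the first window, the sketch spends 420 loss evaluations where updating every parameter at every step would spend 800. No parameter is removed from the model, so any frozen parameter can be trained again in a later window.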

Abstract

The training of Quantum Neural Networks (QNNs) is hindered by the high computational cost of gradient estimation and the barren plateau problem, where optimization landscapes become intractably flat. To address these challenges, we introduce Weighted Stochastic Block Descent (WSBD), a novel optimizer with a dynamic, parameter-wise freezing strategy. WSBD intelligently focuses computational resources by identifying and temporarily freezing less influential parameters based on a gradient-derived importance score. This approach significantly reduces the number of forward passes required per training step and helps navigate the optimization landscape more effectively. Unlike pruning or layer-wise freezing, WSBD maintains full expressive capacity while adapting throughout training. Our extensive evaluation shows that WSBD converges on average 63.9% faster than Adam for the popular ground-state-energy problem, an advantage that grows with QNN size. We provide a formal convergence proof for WSBD and show that parameter-wise freezing outperforms traditional layer-wise approaches in QNNs. Project page: https://github.com/Damrl-lab/WSBD-Stochastic-Freezing-Optimizer.

Paper Structure

This paper contains 40 sections, 1 theorem, 55 equations, 6 figures, and 5 tables.

INTRODUCTION
BACKGROUND AND RELATED WORK
QNNs and Key Training Obstacles
Strategies for Accelerating QNN Training
WEIGHTED STOCHASTIC BLOCK DESCENT
THEORETICAL CONVERGENCE FRAMEWORK
EXPERIMENTAL SETUP
Tasks and Datasets
Experimental Design
Optimizers Considered
EVALUATION AND RESULTS
Scalable Efficiency and Practical Savings in QNN Training
Robustness Under Noise
Why Each Component Matters: Ablation Study
CONCLUSION
...and 25 more sections

Key Result

Theorem 1

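Let the objective function be denoted by $f(\boldsymbol{\theta})$, where $M$ is a bounded Hermitian observable, and each $G_j$ is a bounded Hermitian generator. Then the gradient $\nabla f(\boldsymbol{\theta})$ is globally Lipschitz continuous, i.e., there exists a constant $L<\infty$ such that $\|\nabla f(\boldsymbol{\theta}) - \nabla f(\boldsymbol{\theta}')\| \le L \, \|\boldsymbol{\theta} - \boldsymbol{\theta}'\|$ for all $\boldsymbol{\theta}, \boldsymbol{\theta}'$.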
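To see why a Lipschitz gradient matters for a freezing optimizer, the standard descent lemma can be applied to an update that moves only the active parameters. The derivation below is a textbook argument given for orientation, not a reproduction of the paper's proof. The active index set $\mathcal{A}$, the projection $P_{\mathcal{A}}$ onto its coordinates, and the step size $\eta$ are notation introduced here as assumptions.

```latex
% Update only the active coordinates:
%   \theta^{+} = \theta - \eta \, P_{\mathcal{A}} \nabla f(\theta)
\begin{aligned}
f(\boldsymbol{\theta}^{+})
  &\le f(\boldsymbol{\theta})
     + \nabla f(\boldsymbol{\theta})^{\top}(\boldsymbol{\theta}^{+} - \boldsymbol{\theta})
     + \frac{L}{2}\,\|\boldsymbol{\theta}^{+} - \boldsymbol{\theta}\|^{2} \\
  &= f(\boldsymbol{\theta})
     - \eta\,\|P_{\mathcal{A}} \nabla f(\boldsymbol{\theta})\|^{2}
     + \frac{L\eta^{2}}{2}\,\|P_{\mathcal{A}} \nabla f(\boldsymbol{\theta})\|^{2} \\
  &= f(\boldsymbol{\theta})
     - \eta\Bigl(1 - \frac{L\eta}{2}\Bigr)\|P_{\mathcal{A}} \nabla f(\boldsymbol{\theta})\|^{2} \\
  &\le f(\boldsymbol{\theta})
     - \frac{\eta}{2}\,\|P_{\mathcal{A}} \nabla f(\boldsymbol{\theta})\|^{2}
  \qquad \text{for } 0 < \eta \le \tfrac{1}{L}.
\end{aligned}
```

Under these assumptions, a step on any subset of parameters never increases the objective, and the guaranteed decrease scales with the squared gradient norm restricted to the active set.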

Figures (6)

Figure 1: VQA architecture used in this study with data encoding, variational layers, and measurement. The circuit begins with an encoding block $A(\hat{x_i})$ that maps classical data to quantum states, followed by alternating parametrized rotations and entangling CNOT gates. The final measurement observable $\bra{\psi} M \ket{\psi}$ extracts the computational result.
Figure 2: Training curves using Adam, SGD and their WSBD counterpart optimizers on the VQE problem. The black dotted line represents the ground state energy each optimizer aims to reach (closer is better). Each optimizer was trained until convergence. For the full training curves of all QNN sizes see Figure \ref{fig:vqe_all}. Table \ref{tab:vqe_percent_reduction} shows the percentage reduction in forward passes needed to converge using both WSBD SGD and WSBD Adam.
Figure 3: Hyperparameter tuning for WSBD. Each subfigure shows how the choice of importance score, freezing threshold, and training window affects the optimization process. This was done for a 10-qubit, 5-layer QNN on the MNIST problem.
Figure 4: Training curves for the ground state energy problem using Adam, SGD and their WSBD counterpart optimizers. The black dotted line represents the ground state energy each optimizer aims to reach (closer is better). The models were trained in realistic noisy environments until convergence was reached.
Figure 5: Comparison of QNN training across optimizers. Top row: MNIST classification problem. Bottom row: Parity problem. We compare all optimizers summarized in Table \ref{tab:optimizers} on these two tasks. WSBD shows clear performance improvements, with a faster decrease in loss and, for many models, a lower final loss.
...and 1 more figure

Theorems & Definitions (6)

proof
Theorem 1: Smoothness of QNN objectives
proof
proof
proof
proof
