Training Hybrid Deep Quantum Neural Network for Efficient Reinforcement Learning

Jie Luo, Jeremy Kulcsar, Xueyin Chen, Giulio Giaconi, Georgios Korpas

TL;DR

This work tackles the gradient-backpropagation bottleneck in training hybrid deep quantum neural networks for reinforcement learning by introducing qtDNN, a tangential differentiable surrogate that locally approximates a parameterised quantum circuit (PQC) within each minibatch. Embedding qtDNN into the computation graph enables scalable batched gradients while preserving quantum inference, and comes with a local gradient-fidelity guarantee that bounds the perturbation to upstream gradients. Building on this surrogate, the authors propose hDQNN-TD3, a hybrid actor-critic agent for continuous-control tasks, which achieves performance comparable to the state of the art on Humanoid-v4 and competitive results on Hopper-v4, with ablations confirming the quantum layer's contribution. The approach reduces quantum-resource requirements by avoiding repeated circuit evaluations during backpropagation, offering a practical path to applying hybrid quantum models to large-scale RL and other gradient-intensive tasks in noisy intermediate-scale quantum (NISQ) settings. Overall, qtDNN enables efficient, gradient-based training of hybrid quantum RL models, potentially accelerating the adoption of quantum-enhanced learning in complex robotics and AI systems.

Abstract

Quantum circuits embed data in a Hilbert space whose dimensionality grows exponentially with the number of qubits, allowing even shallow parameterised quantum circuits (PQCs) to represent highly-correlated probability distributions that are costly for classical networks to capture. Reinforcement-learning (RL) agents, which must reason over long-horizon, continuous-control tasks, stand to benefit from this expressive quantum feature space, but only if the quantum layers can be trained jointly with the surrounding deep-neural components. Current gradient-estimation techniques (e.g., parameter-shift rule) make such hybrid training impractical for realistic RL workloads, because every gradient step requires a prohibitive number of circuit evaluations and thus erodes the potential quantum advantage. We introduce qtDNN, a tangential surrogate that locally approximates a PQC with a small differentiable network trained on-the-fly from the same minibatch. Embedding qtDNN inside the computation graph yields scalable batch gradients while keeping the original quantum layer for inference. Building on qtDNN we design hDQNN-TD3, a hybrid deep quantum neural network for continuous-control reinforcement learning based on the TD3 architecture, which matches or exceeds state-of-the-art classical performance on popular benchmarks. The method opens a path toward applying hybrid quantum models to large-scale RL and other gradient-intensive machine-learning tasks.
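
To make the cost argument in the abstract concrete, the following is a minimal sketch of the parameter-shift rule on a toy circuit, counting circuit evaluations. It is not taken from the paper: the class `CountingCircuit`, the closed-form expectation (a product of cosines, standing in for $\langle Z\otimes\dots\otimes Z\rangle$ after one RY rotation per qubit), the parameter values and the batch size of 256 are all assumptions chosen for illustration. For gates of this standard form, the rule needs two circuit evaluations per parameter per sample.

```python
import math


class CountingCircuit:
    """Hypothetical stand-in for a PQC that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def expectation(self, thetas):
        # Closed form for <Z x ... x Z> after RY(theta_k) on each qubit in |0>.
        # A real device would estimate this from repeated shots.
        self.calls += 1
        return math.prod(math.cos(t) for t in thetas)


def parameter_shift_gradient(circuit, thetas):
    """Gradient via the parameter-shift rule: two evaluations per parameter."""
    grad = []
    for k in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[k] += math.pi / 2
        minus[k] -= math.pi / 2
        grad.append(0.5 * (circuit.expectation(plus) - circuit.expectation(minus)))
    return grad


circuit = CountingCircuit()
thetas = [0.3, 1.1, -0.7, 2.0]  # illustrative parameter values
grad = parameter_shift_gradient(circuit, thetas)

evals_per_sample = circuit.calls          # 2 * number of parameters
batch_size = 256                          # illustrative, not from the paper
evals_per_batch = evals_per_sample * batch_size
print(evals_per_sample, evals_per_batch)
```

Under these assumptions the count grows as twice the number of parameters times the minibatch size for every update step, and each evaluation is itself a multi-shot estimate on hardware. This is the overhead that a classical surrogate in the backward pass is meant to avoid.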

Paper Structure

This paper contains 80 sections, 1 theorem, 25 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Let $Q:\mathbb R^d\to[0,1]^N$ be the deterministic map that returns the per‑qubit $S$‑shot marginal vector of the PQC, and suppose $Q\in C^1$ on an open set $\mathcal{U}\subset\mathbb R^d$. Fix a mini-batch $\mathcal{B}=\{x_i\}_{i=1}^{N_b}\subset\mathcal{U}$ lying in a closed ball of radius $r$ cent Consequently, if the PQC is replaced by $Q_\theta$ inside the computation graph during back-propaga

Figures (7)

  • Figure 1: a, a typical hDQNN contains a PreDNN connected to a PostDNN via both $d_\text{c-link}$ direct connections and a quantum layer realised as a parametrised quantum circuit. The hDQNN runs inference over the training dataset and records $(\mathbf{x},\mathbf{y}_\text{pred},\mathbf{y},\mathbf{q}_\text{i},\mathbf{q}_\text{o})$ of each forward pass into a buffer $\mathcal{M}$. In an update step, a batch, $\mathcal{B}$, of $N_\text{b}$ entries is sampled from $\mathcal{M}$ for updating the hDQNN parameters. b, $\{(\mathbf{q}_\text{i},\mathbf{q}_\text{o})\}$ from $\mathcal{M}$ is stored into a qtDNN Replay Buffer. $N_\text{qt}$ tiny-batches, $\{ \mathcal{B}_\text{qt}\}$, are sampled from $\mathcal{B}$ and then used to update qtDNN towards approximating the PQC in $\mathcal{B}$. c, the updated qtDNN is then used in the surrogate model $Q_\text{qt}$ in this update step to facilitate the batched back-propagation of the loss, which is used to update the PreDNN and PostDNN with qtDNN held fixed.
  • Figure 2: a, the typical exploration flow in which the Agent interacts with the target environment of the reinforcement learning objective, $\mathcal{E}$, and obtains experience data to be stored in the replay buffer, $\mathcal{M}$. b, the modular architecture used for the Actor Model, $A$, in which the intermediate mapping $T$ from its input vector $\mathbf{q}_\text{i}$ to $\mathbf{q}_\text{o}$ can be implemented in different ways without changing the rest of the model design. This allows a fair comparison of the learning performance of different quantum and classical implementations.
  • Figure 3: hDQNN-TD3 on Humanoid-v4; 4 seeds; 10‑episode rolling average; 95% CI.
  • Figure 4: Best seed return; equal step budget; horizontal lines are public PPO/SAC/TD3 references.
  • Figure 5: hDQNN-TD3 on Hopper-v4; 4 seeds; 10‑episode rolling average; 95% CI.
  • ...and 2 more figures
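
The update step described in the Figure 1 caption (fit a surrogate to recorded $(\mathbf{q}_\text{i},\mathbf{q}_\text{o})$ pairs from the current batch using tiny-batches, then back-propagate through the fixed surrogate) can be sketched in one dimension as follows. This is a heavily simplified illustration under stated assumptions, not the authors' implementation: `quantum_layer` is a hypothetical closed-form stand-in for the PQC, the surrogate is an affine model rather than a small neural network, and the batch size, radius, learning rate and tiny-batch count are made-up values.

```python
import math
import random


def quantum_layer(x):
    # Hypothetical stand-in for the PQC: probability of measuring |0>
    # after RY(x) on |0>, which is cos^2(x / 2). No shot noise is modelled.
    return math.cos(x / 2.0) ** 2


def fit_local_surrogate(xs, ys, radius, n_tiny=300, tiny_size=8, lr=0.3, seed=0):
    """Fit s(x) = w * (x - c) / radius + b by SGD over tiny-batches of the batch."""
    rng = random.Random(seed)
    c = sum(xs) / len(xs)  # centre of the minibatch
    w, b = 0.0, 0.0
    for _ in range(n_tiny):
        idx = rng.sample(range(len(xs)), tiny_size)
        gw = gb = 0.0
        for i in idx:
            z = (xs[i] - c) / radius          # rescaled input, roughly in [-1, 1]
            err = (w * z + b) - ys[i]         # surrogate minus recorded PQC output
            gw += 2.0 * err * z / tiny_size   # gradient of the mean squared error
            gb += 2.0 * err / tiny_size
        w -= lr * gw
        b -= lr * gb
    return c, w / radius, b  # centre, slope in original units, offset


# A minibatch of inputs lying in a small ball around x0 (illustrative values).
x0, radius, n_batch = 1.0, 0.1, 64
xs = [x0 + radius * (2.0 * i / (n_batch - 1) - 1.0) for i in range(n_batch)]
ys = [quantum_layer(x) for x in xs]  # outputs recorded during forward passes

c, slope, offset = fit_local_surrogate(xs, ys, radius)
true_slope = -0.5 * math.sin(c)      # exact derivative of cos^2(x / 2) at c

# Backward pass: the surrogate's slope replaces the PQC's derivative.
upstream = 2.0                        # illustrative dL/dq_o from the PostDNN
grad_input = upstream * slope         # dL/dq_i passed on to the PreDNN
print(slope, true_slope, grad_input)
```

In this toy setting the surrogate's slope tracks the derivative of the stand-in layer on the small ball containing the batch, so the gradient passed upstream is only slightly perturbed. No additional circuit evaluations are needed in the backward pass, since the surrogate is fitted to outputs already recorded in the forward pass.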

Theorems & Definitions (2)

  • Theorem 1: Local gradient fidelity of qtDNN
  • proof