Rapid training of quantum recurrent neural networks

Michał Siemaszko, Adam Buraczewski, Bertrand Le Saux, Magdalena Stobińska

TL;DR

The paper tackles the high training cost of recurrent neural networks for time-series tasks by introducing a continuous-variable quantum recurrent neural network (CV-QRNN). It develops a CV quantum information framework and a vanilla RNN-like architecture that uses displacement, squeezing, phase, and beam-splitter gates together with measurement-induced nonlinearity to enable fast learning with a modest parameter count. Numerical experiments show CV-QRNN converges far faster than a classical LSTM while achieving comparable or better losses, and it attains competitive MNIST accuracy with a small parameter budget. The work argues for near-term photonic quantum platforms as a practical route to accelerated RNN training and outlines future steps toward hardware validation and scaling.

Abstract

Time series prediction is essential for human activities in diverse areas. A common approach to this task is to harness Recurrent Neural Networks (RNNs). However, while their predictions are quite accurate, their learning process is complex and, thus, time- and energy-consuming. Here, we propose to extend the concept of RNNs by including continuous-variable quantum resources, and to use a quantum-enhanced RNN to overcome these obstacles. The design of the Continuous-Variable Quantum RNN (CV-QRNN) is rooted in the continuous-variable quantum computing paradigm. By performing extensive numerical simulations, we demonstrate that the quantum network is capable of learning the time dependence of several types of temporal data, and that it converges to the optimal weights in fewer epochs than a classical network. Furthermore, for a small number of trainable parameters, it can achieve lower losses than its classical counterpart. CV-QRNN can be implemented using commercially available quantum-photonic hardware.

Paper Structure (15 sections, 9 equations, 12 figures)

Figures (12)

  • Figure 1: Schema of a Recurrent Neural Network. At every time step $t$, an input vector $\bm{x}_t$ is injected into the network cell (brown square) that is parametrized by a hidden state $\bm{h}_t$. After all the input data have been processed, output sequences $\widetilde{\bm{y}}_\tau$ are produced and they serve as the next input to the RNN (dashed arrows). Parameters of the network (not shown in the figure) are described in the text. Additional sets of rules $\mathcal{R}$ included in the network cells upgrade the RNN to LSTM or GRU architectures.
  • Figure 2: CV-QRNN architecture. (a) A single layer $L$ acts on $n = n_1 + n_2$ qumodes (horizontal lines), and consists of displacement gates $D$, squeezing gates $S$, and multiport interferometers $I$. A vector $\bm{x} \in \mathbb{R}^{n_2}$ encodes the input data, while $\bm{\zeta} = \{\bm{\theta_1}, \bm{\varphi_1}, \bm{r_1}, \bm{r_2},\bm{\theta_2}, \bm{\varphi_2}, \bm{\alpha_1},\bm{\alpha_2}, \gamma\}$ denotes all trainable parameters of the network. Red dashed lines split the layer into three parts, responsible for (from left to right): encoding, interaction, and measurement. (b) A data sequence is processed recurrently by iterating layer $L$ over all inputs $\bm{x}_1,\ldots,\bm{x}_{T_x}$. All the qumodes are initialized with the vacuum state $\vert 0 \rangle^{\otimes n_{1,2}}$. After each iteration, the output $\widetilde{\bm{x}}'_t$ is measured and multiplied by parameter $\gamma$, and all bottom wires are reset to the vacuum state. The first prediction of the network $\widetilde{\bm{y}}_0$ is taken only after all data points have been processed. The subsequent prediction $\widetilde{\bm{y}}_\tau$ is the output of the layer $L\left( \widetilde{\bm{y}}_{\tau-1}, \bm{\zeta} \right)$.
  • Figure 3: Cost function $C$ (\ref{eq:cost}) computed for CV-QRNN (blue line -- training data, light gray -- testing) and LSTM (orange line -- training, dark gray -- testing), as a function of the number of epochs in the task of predicting the values of the Bessel function $J_0(x)$ (Task 1). Shaded regions represent the standard deviation and solid lines are the average for 5 runs of the simulation. The CV-QRNN achieves values of $C$ below $10^{-4}$ after only 10 epochs and reaches $10^{-5}$ in fewer than 50 epochs. The corresponding LSTM reaches such values only after 150 epochs. The dashed line indicates the cost function for the simplest baseline strategy in which the last input value is repeated as the predicted value.
  • Figure 4: Progress of training on the data generated with the Bessel function $J_0(x)$, for CV-QRNN (top row) and LSTM networks (bottom row). Blue points represent the reference data, orange points are predictions based on $T=4$ previous points, and gray points are the forecasted values. The vertical dashed line marks the point where the data was split into training (left) and testing (right) sequences.
  • Figure 5: Cost function $C$ (\ref{eq:cost}) after 50 epochs of training CV-QRNN for different lengths of input sequence $T$. Mean values for 5 separate runs are represented by a line, while the shaded region depicts the range of achieved values. Training sequences were generated with the Bessel function, as described in the text. We chose $T=4$ for our numerical simulations because larger lengths bring only a modest gain, while the required computing resources and time grow exponentially.
  • ...and 7 more figures
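
The captions for Figures 1, 3, and 4 mention three ingredients that are easy to illustrate classically: sequences sampled from $J_0(x)$, predictions based on $T=4$ previous points, and the baseline that repeats the last input value. The sketch below is a minimal illustration only and does not reproduce the paper's code. The sampling grid (200 points, spacing 0.1), the use of mean squared error as a stand-in for the cost $C$, and all function names are assumptions made here; the paper's own definition of $C$ is not shown in this excerpt.

```python
import math


def bessel_j0(x, steps=400):
    """J0(x) from the integral (1/pi) * int_0^pi cos(x*sin(t)) dt (midpoint rule)."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(steps))
    return total * h / math.pi


def make_windows(series, T):
    """Pair every run of T consecutive points with the point that follows it."""
    return [(series[i:i + T], series[i + T]) for i in range(len(series) - T)]


def last_value_step(window):
    """Baseline strategy: repeat the last input value as the prediction."""
    return window[-1]


def mean_squared_error(predictions, targets):
    """Assumed stand-in for the cost C (the paper's definition is not shown here)."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def forecast(step, seed_window, horizon):
    """Autoregressive rollout: each prediction is fed back as the newest input."""
    window = list(seed_window)
    outputs = []
    for _ in range(horizon):
        y = step(window)
        outputs.append(y)
        window = window[1:] + [y]  # drop the oldest point, append the prediction
    return outputs


# Hypothetical sampling grid (an assumption, not taken from the paper).
series = [bessel_j0(0.1 * i) for i in range(200)]
T = 4  # input sequence length quoted in the figure captions
pairs = make_windows(series, T)
baseline_cost = mean_squared_error(
    [last_value_step(window) for window, _ in pairs],
    [target for _, target in pairs],
)
print(f"last-value baseline cost: {baseline_cost:.3e}")
```

Any trained model can be passed to `forecast` in place of `last_value_step`, as long as it maps a window of `T` values to the next value; this mirrors the feedback loop drawn with dashed arrows in Figure 1. With the baseline itself, the rollout simply repeats the final seed value, which gives a sense of why a learned model should beat it.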