Quantum Temporal Fusion Transformer

Krishnakanta Barik, Goutam Paul

TL;DR

The QTFT paper tackles multi-horizon time series forecasting by extending the classical TFT into a hybrid quantum-classical framework that can run on NISQ devices using variational quantum algorithms. It systematically replaces key TFT submodules with quantum counterparts (built from data encoding, variational circuits, and quantum attention) while preserving the overall architecture. The study reports that QTFT achieves lower training and testing losses than the classical TFT on weather and stock datasets, with additional gains when a quantum LSTM (QLSTM) is incorporated. This work suggests a promising path for leveraging near-term quantum hardware to enhance deep learning models for sequential forecasting, with a scalable architecture and broad applicability in domains that require probabilistic forecasts. The forecast horizon $\tau_{\max}$, horizon-aware inputs, and quantile outputs are central to evaluating performance and reliability in real-world decision contexts.
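The quantile outputs mentioned above are typically trained with the quantile (pinball) loss. The classical TFT sums this loss over the predicted quantile levels and averages it over samples and horizons $\tau = 1, \dots, \tau_{\max}$. The sketch below assumes QTFT keeps that objective, which this summary does not state; the function name and array shapes are illustrative.

```python
import numpy as np


def quantile_loss(y_true, y_pred, quantiles):
    """Pinball loss summed over quantiles and averaged over samples and horizons.

    y_true:    array of shape (M, tau_max), observed targets
    y_pred:    array of shape (M, tau_max, Q), one prediction per quantile level
    quantiles: sequence of Q quantile levels, each in (0, 1)
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)  # shape (Q,), broadcasts over the last axis

    # Residual y - y_hat for every sample, horizon, and quantile level
    err = y_true[..., None] - y_pred
    # Under-prediction (err > 0) is weighted by q, over-prediction by (1 - q)
    per_quantile = np.maximum(q * err, (q - 1.0) * err)
    # Sum over quantiles, then average over the M * tau_max (sample, horizon) pairs
    return per_quantile.sum(axis=-1).mean()


# Example: one sample, two horizons, three quantile levels
y = np.array([[1.0, 2.0]])
y_hat = np.array([[[0.5, 1.0, 1.5], [1.5, 2.0, 2.5]]])
print(quantile_loss(y, y_hat, [0.1, 0.5, 0.9]))
```

The asymmetric weighting is what makes each output track its quantile: with $q = 0.9$, under-predicting costs nine times as much as over-predicting by the same amount.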

Abstract

The Temporal Fusion Transformer (TFT), proposed by Lim et al. and published in the International Journal of Forecasting (2021), is a state-of-the-art attention-based deep neural network architecture designed specifically for multi-horizon time series forecasting. It has demonstrated significant performance improvements over existing benchmarks. In this work, we introduce the Quantum Temporal Fusion Transformer (QTFT), a hybrid quantum-classical architecture that extends the capabilities of the classical TFT framework. The core idea of this work is inspired by two foundational studies, "The Power of Quantum Neural Networks" by Amira Abbas et al. and "Quantum Vision Transformers" by El Amine Cherrat et al., published in Nature Computational Science (2021) and Quantum (2024), respectively. A key advantage of our approach is that it is built on a variational quantum algorithm, which enables implementation on current noisy intermediate-scale quantum (NISQ) devices without strict requirements on the number of qubits or circuit depth. Our results show that QTFT trains successfully on the forecasting datasets and accurately predicts future values. In particular, experiments on two different datasets show that the model outperforms its classical counterpart in terms of both training and test loss. These results indicate the promise of using quantum computing to boost deep learning architectures in complex machine learning tasks.

Paper Structure

This paper contains 29 sections, 51 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Illustration of multi-horizon forecasting. The X-axis represents the time steps (sliding window), while the Y-axis represents the target variables to be predicted. The forecast time point is denoted as $t$. The model uses historical data from $t-k$ to $t$ to predict the selected variable over the future horizon, from $t$ to $t + \tau_{\max}$.
  • Figure 2: Generic architecture for Gated Residual Networks (GRNs). The input $\bm a$ is the primary input, and $\bm c$ is an optional external context vector. $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ denote a dense layer (neural network) followed by an ELU activation function. $\mathbf{W}_{3}$ is another dense layer without an activation function. $\mathbf{W}_{4}$ and $\mathbf{W}_{5}$ represent the Gated Linear Unit (GLU) operation. The final block performs a residual connection (add) followed by layer normalization.
  • Figure 3: TFT architecture. TFT processes three types of inputs: static inputs, time-dependent past inputs, and a priori known future inputs. The gated residual network lets information flow flexibly, either through skip connections or through gated linear unit layers. The variable selection network dynamically identifies the most relevant features in the input data. LSTM layers capture local sequential dependencies, while interpretable multi-head attention combines information across all time steps.
  • Figure 4: Generic architecture of a Variational Quantum Algorithm (VQA). The block $\mathbf{U}(\bm x)$ denotes the data-encoding circuit, where $\bm x$ is the input data. It is followed by the variational block $\mathbf{V}(\bm \theta)$, a parameterized quantum circuit with trainable parameters $\bm \theta$. A measurement is then performed on all qubits, and finally the cost function $\mathbf{C}(\bm x, \bm \theta)$ is evaluated.
  • Figure 5: (a) Angle Embedding. The feature vector $\bm v = (v_1, v_2, v_3)$ is encoded into 3 qubits using $\mathbf R_z$ rotation gates; if no rotation is specified, $\mathbf R_x$ rotations are used by default. (b) ZZ Feature Map. The feature vector $\bm v = (v_1, v_2, v_3)$ is encoded into three qubits with one repetition layer. Here $\mathbf{P}_i = \mathbf{P}(2\psi(v_i))$ and $\mathbf{P}_{i,j} = \mathbf{P}(2\psi(v_i, v_j))$, where $\mathbf P$ denotes the phase gate $\mathbf P(\lambda) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\lambda} \end{pmatrix}$ and $\psi$ is a non-linear function that defaults to $\psi(x) = x$ and $\psi(x, y) = (\pi - x)(\pi - y)$.
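The sliding-window setup of Figure 1 can be sketched as follows. For each forecast time $t$, the history is the $k+1$ observations at $t-k, \dots, t$ and the target is the next $\tau_{\max}$ values. We assume the targets start at $t+1$ (the usual TFT convention) so that they do not overlap the history; `make_windows` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def make_windows(series, k, tau_max):
    """Split a 1-D series into (history, target) pairs for multi-horizon forecasting.

    history for time t = series[t-k .. t]          (k + 1 points, inclusive of t)
    target for time t  = series[t+1 .. t+tau_max]  (tau_max future points)
    """
    x = np.asarray(series, dtype=float)
    if len(x) < k + tau_max + 1:
        raise ValueError("series is too short for the requested k and tau_max")

    histories, targets = [], []
    # Each t needs k points before it and tau_max points after it
    for t in range(k, len(x) - tau_max):
        histories.append(x[t - k : t + 1])
        targets.append(x[t + 1 : t + 1 + tau_max])
    return np.stack(histories), np.stack(targets)


hist, fut = make_windows(np.arange(10), k=3, tau_max=2)
print(hist.shape, fut.shape)  # (5, 4) (5, 2)
print(hist[0], fut[0])        # [0. 1. 2. 3.] [4. 5.]
```

A series of length $n$ yields $n - k - \tau_{\max}$ windows, so longer horizons or histories trade away training examples.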
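A NumPy forward pass for the GRN of Figure 2 is sketched below, following the caption's weight numbering. Several details are assumptions made for illustration: $\mathbf W_1$ multiplies $\bm a$ and $\mathbf W_2$ multiplies $\bm c$, $\mathbf W_4$ produces the sigmoid gate and $\mathbf W_5$ the gated values, layer normalization has no learnable gain or bias, and the GLU output has the same width as $\bm a$ so the residual add needs no projection.

```python
import numpy as np


def elu(x, alpha=1.0):
    # expm1 on the clipped input avoids overflow warnings for large positive x
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learnable gain/bias)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def grn(a, p, c=None):
    """GRN forward pass; a has shape (batch, d), optional c has shape (batch, d_c)."""
    h = a @ p["W1"] + p["b1"]
    if c is not None:
        h = h + c @ p["W2"]                  # optional context enters before the ELU
    eta2 = elu(h)                             # dense layer (W1, W2) + ELU
    eta1 = eta2 @ p["W3"] + p["b3"]           # dense layer W3, no activation
    gate = sigmoid(eta1 @ p["W4"] + p["b4"])  # GLU gate (assumed to be W4)
    glu = gate * (eta1 @ p["W5"] + p["b5"])   # GLU values (assumed to be W5)
    return layer_norm(a + glu)                # residual add, then layer normalization


def init_grn_params(d, d_c, rng, scale=0.1):
    shapes = {"W1": (d, d), "W2": (d_c, d), "W3": (d, d), "W4": (d, d), "W5": (d, d)}
    p = {name: rng.normal(scale=scale, size=s) for name, s in shapes.items()}
    p.update({name: np.zeros(d) for name in ("b1", "b3", "b4", "b5")})
    return p


rng = np.random.default_rng(0)
params = init_grn_params(d=4, d_c=2, rng=rng)
a = rng.normal(size=(3, 4))
c = rng.normal(size=(3, 2))
out = grn(a, params, c)
print(out.shape)  # (3, 4)
```

If the gate saturates near zero, the GLU output vanishes and the block reduces to `layer_norm(a)`, which is how a GRN can effectively skip its nonlinear processing.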
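The VQA pipeline of Figure 4, with the default $\mathbf R_x$ angle embedding of Figure 5(a) as $\mathbf U(\bm x)$, can be illustrated with a small NumPy state-vector simulation. The variational block $\mathbf V(\bm\theta)$ used here (layers of $\mathbf R_y$ rotations followed by a chain of CNOTs), the readout of $\langle Z_0 \rangle$, the squared-error cost, and all helper names are hypothetical choices, not the circuits used in QTFT. Gradients come from the parameter-shift rule, which is exact for $\mathbf R_y$ rotations when each parameter appears in exactly one gate.

```python
import numpy as np


def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def apply_1q(state, gate, wire, n):
    # Contract the 2x2 gate with the tensor axis of `wire` (qubit 0 = most significant bit)
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [wire]))
    return np.moveaxis(psi, 0, wire).reshape(-1)


def apply_cnot(state, control, target, n):
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t_axis = target if target < control else target - 1  # axis index once `control` is fixed
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()  # X on target if control = 1
    return psi.reshape(-1)


def expval_z(state, wire, n):
    probs = np.abs(state.reshape([2] * n)) ** 2
    marginal = probs.sum(axis=tuple(ax for ax in range(n) if ax != wire))
    return marginal[0] - marginal[1]


def circuit(x, theta):
    """Return <Z_i> for every qubit after U(x) then V(theta); theta has shape (layers, n)."""
    n = len(x)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for i, xi in enumerate(x):        # U(x): R_x angle embedding, one feature per qubit
        state = apply_1q(state, rx(xi), i, n)
    for layer in theta:               # V(theta): hypothetical R_y + CNOT-chain ansatz
        for i in range(n):
            state = apply_1q(state, ry(layer[i]), i, n)
        for i in range(n - 1):
            state = apply_cnot(state, i, i + 1, n)
    return np.array([expval_z(state, i, n) for i in range(n)])


def cost(x, theta, target):
    # C(x, theta): squared error between <Z_0> and a scalar target
    return (circuit(x, theta)[0] - target) ** 2


def grad_cost(x, theta, target):
    """dC/dtheta from the parameter-shift rule for <Z_0>, combined via the chain rule."""
    f0 = circuit(x, theta)[0]
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        shift = np.zeros_like(theta)
        shift[idx] = np.pi / 2
        df = 0.5 * (circuit(x, theta + shift)[0] - circuit(x, theta - shift)[0])
        grad[idx] = 2.0 * (f0 - target) * df
    return grad


rng = np.random.default_rng(1)
x = np.array([0.1, 0.5, -0.3])              # feature vector v = (v1, v2, v3)
theta = rng.normal(scale=0.5, size=(2, 3))  # two variational layers
for step in range(30):                      # plain gradient descent on C(x, theta)
    theta = theta - 0.3 * grad_cost(x, theta, target=-0.5)
print(cost(x, theta, target=-0.5))
```

On hardware, each `circuit` call would be replaced by repeated shots, and the parameter-shift rule is attractive there because it needs only two extra circuit evaluations per parameter rather than a finite-difference step size.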
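The ZZ feature map of Figure 5(b) can be sketched in the same state-vector style: a Hadamard and the phase gate $\mathbf P_i$ on each qubit, then, for each qubit pair $(i, j)$, a CNOT, the phase gate $\mathbf P_{i,j}$ on qubit $j$, and a second CNOT. We assume every pair $i < j$ is entangled and one repetition is used; the helper names are hypothetical. Because the CNOT, $\mathbf P_{i,j}$, CNOT sequence is diagonal, each basis state $|b_1 b_2 b_3\rangle$ simply picks up the phase $e^{2i\psi(v_i, v_j)(b_i \oplus b_j)}$, which gives a closed form to check the simulation against.

```python
import itertools

import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def phase(lam):
    # Phase gate P(lambda) = diag(1, e^{i lambda})
    return np.array([[1, 0], [0, np.exp(1j * lam)]], dtype=complex)


def apply_1q(state, gate, wire, n):
    # Contract the 2x2 gate with the tensor axis of `wire` (qubit 0 = most significant bit)
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [wire]))
    return np.moveaxis(psi, 0, wire).reshape(-1)


def apply_cnot(state, control, target, n):
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t_axis = target if target < control else target - 1  # axis index once `control` is fixed
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()  # X on target if control = 1
    return psi.reshape(-1)


def psi1(x):
    return x  # default single-feature map psi(x) = x


def psi2(x, y):
    return (np.pi - x) * (np.pi - y)  # default pairwise map psi(x, y)


def zz_feature_map(v):
    """State vector after one repetition of the ZZ feature map on len(v) qubits."""
    n = len(v)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for i in range(n):
        state = apply_1q(state, HADAMARD, i, n)
        state = apply_1q(state, phase(2 * psi1(v[i])), i, n)         # P_i
    for i, j in itertools.combinations(range(n), 2):                  # all pairs i < j
        state = apply_cnot(state, i, j, n)
        state = apply_1q(state, phase(2 * psi2(v[i], v[j])), j, n)   # P_{i,j}
        state = apply_cnot(state, i, j, n)
    return state


state = zz_feature_map([0.3, -1.2, 0.7])
print(np.round(np.abs(state) ** 2, 3))  # every basis state has probability 0.125
```

All of the data dependence sits in the relative phases, not in the measurement probabilities, which is why such feature maps are usually followed by further gates (or a kernel overlap) before measuring.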