FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction

Varun Teja Chirukiri, Udaya Bhasker Cheerala, Sandeep Kanta, Abdul Karim, Praveen Damacharla

TL;DR

Problem: Predict remaining useful life (RUL) from multivariate sensor data. Approach: a hybrid FTT-GRU combining a Fast Temporal Transformer with a GRU to model global and local temporal dynamics. Key results: on CMAPSS FD001, RMSE = 30.76, MAE = 18.97, and $R^2 = 0.453$, with 1.12 ms CPU latency at batch size 1, improving RMSE by 1.16% and MAE by 4.00% over the TCN–Attention baseline. Significance: supports real-time, edge-friendly prognostics, with interpretability options and a path toward broader generalization via domain adaptation.
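The three reported metrics and the relative gains quoted against TCN–Attention can be computed as follows. This is a minimal sketch that assumes the standard definitions of RMSE, MAE, and $R^2$, and a relative-reduction formula $100 \cdot (e_{\text{base}} - e)/e_{\text{base}}$ for the percentage gains. The summary does not spell out either. The function names and toy arrays are hypothetical and are not CMAPSS data or the paper's predictions.

```python
import numpy as np


def rul_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for RUL predictions using standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return rmse, mae, 1.0 - ss_res / ss_tot


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of an error metric relative to a baseline (assumed formula)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values for illustration only (not CMAPSS data).
y_true = [112, 98, 69, 82, 91]
y_pred = [100, 105, 60, 90, 95]
print(rul_metrics(y_true, y_pred))
print(relative_improvement(20.0, 19.2))
```

Under this convention, a lower error yields a positive improvement percentage.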

Abstract

Accurate prediction of the remaining useful life (RUL) of industrial machinery is essential for reducing downtime and optimizing maintenance schedules. Existing approaches, such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs), often struggle to model both global temporal dependencies and fine-grained degradation trends in multivariate sensor data. We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT), a lightweight Transformer variant using linearized attention via the fast Fourier transform (FFT), with a gated recurrent unit (GRU) layer for sequential modeling. To the best of our knowledge, this is the first application of an FTT combined with a GRU for RUL prediction on NASA CMAPSS, enabling simultaneous capture of global and local degradation patterns in a compact architecture. On CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2=0.45$, with 1.12 ms CPU latency at batch size 1. Relative to the best published deep baseline (TCN–Attention), it improves RMSE by 1.16% and MAE by 4.00%. Training curves averaged over $k=3$ runs show smooth convergence with narrow 95% confidence bands, and ablations (GRU-only, FTT-only) support the contribution of both components. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS, making it suitable for real-time industrial prognostics.
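Confidence bands over $k=3$ runs can be built from per-run curves as a per-epoch mean plus or minus a half-width. The summary does not say how its 95% CIs were computed. This sketch assumes a Student-t interval with the hard-coded critical value $t_{0.975,2} \approx 4.303$, so SciPy is not needed. The helper name `mean_ci95` and the synthetic curves are hypothetical.

```python
import numpy as np

# Two-sided 95% Student-t critical value for k - 1 = 2 degrees of freedom.
# Assumption: the summary does not state how its 95% CIs were computed.
T_CRIT_DF2 = 4.302652729911275


def mean_ci95(runs):
    """Per-epoch mean and 95% CI half-width across exactly k = 3 runs.

    runs: array-like of shape [k, epochs], e.g. validation MSE per epoch.
    """
    runs = np.asarray(runs, dtype=float)
    k = runs.shape[0]
    if k != 3:
        raise ValueError("T_CRIT_DF2 is only valid for k = 3 runs")
    mean = runs.mean(axis=0)
    sem = runs.std(axis=0, ddof=1) / np.sqrt(k)  # standard error of the mean
    return mean, T_CRIT_DF2 * sem


# Synthetic curves for illustration only (not the paper's training logs).
curves = [[900.0, 700.0, 620.0],
          [880.0, 690.0, 615.0],
          [910.0, 710.0, 630.0]]
mean, half = mean_ci95(curves)
for epoch, (m, h) in enumerate(zip(mean, half), start=1):
    print(f"epoch {epoch}: {m:.1f} +/- {h:.1f}")
```

With only three runs the t multiplier is large (about 4.3 standard errors), so narrow bands imply that the per-run curves are very close to one another.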

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Proposed FTT-GRU for RUL prediction. Pipeline: input window $[B,30,24]$ (batch size $B$, 30 time steps, 24 input features) $\rightarrow$ positional encoding $\rightarrow$ FTT encoder $[B,30,64]$ (2 layers, 4 heads; Fourier-mixing attention) $\rightarrow$ GRU $[B,30,64]$ (1 layer, 64 units) $\rightarrow$ last hidden state $[B,64] \rightarrow$ dense regression head $[B,1]$. The figure groups the encoding, attention, recurrent decoding, and output stages and annotates tensor dimensions.
  • Figure 2: FTT-GRU: training and validation MSE over 10 epochs (mean $\pm$ 95% CI over $k{=}3$ runs).
  • Figure 3: Model comparison on FD001: RMSE and MAE (lower is better). Bars show means; thin error bars (where visible) denote 95% CI across runs.
  • Figure 4: Predicted vs. actual RUL for FTT-GRU with 95% CIs. Points concentrate near the ideal diagonal (red), with wider intervals at high RUL reflecting long-horizon uncertainty.
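The Figure 1 pipeline can be traced end to end with an untrained NumPy forward pass that checks the tensor shapes at each stage. This is a minimal sketch under stated assumptions, not the authors' implementation. The following pieces are all assumptions:

- a 24-to-64 input projection;
- sinusoidal positional encoding;
- FNet-style Fourier mixing in place of the paper's FTT attention (the real part of a 2D FFT over the time and feature axes; the 4-head split is not modelled);
- a 128-unit feed-forward width;
- the particular GRU gate formulation.

All helper names are hypothetical.

```python
import numpy as np

# Sizes from the Figure 1 caption (B = 8 is arbitrary): batch, window, features, width.
B, T, F, D = 8, 30, 24, 64
rng = np.random.default_rng(0)


def init(*shape, scale=0.1):
    """Random Gaussian weights (untrained; illustration only)."""
    return rng.normal(0.0, scale, size=shape)


def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without learned gain/bias."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def positional_encoding(t, d):
    """Sinusoidal encoding of shape [t, d] (assumed scheme; d must be even)."""
    pos = np.arange(t)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / (10000.0 ** (i / d))
    pe = np.zeros((t, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def fourier_mixing_layer(x, w1, b1, w2, b2):
    """FNet-style block: the real part of a 2D FFT over (time, feature) replaces
    attention; each sublayer uses a residual connection and LayerNorm."""
    mixed = np.fft.fft2(x, axes=(-2, -1)).real
    x = layer_norm(x + mixed)
    ff = np.maximum(0.0, x @ w1 + b1) @ w2 + b2  # ReLU feed-forward
    return layer_norm(x + ff)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_last_hidden(x, wz, uz, bz, wr, ur, br, wh, uh, bh):
    """Single-layer GRU (one common gate formulation) over the time axis.
    x: [B, T, D_in] -> final hidden state [B, H]."""
    h = np.zeros((x.shape[0], uz.shape[0]))
    for t in range(x.shape[1]):
        xt = x[:, t, :]
        z = sigmoid(xt @ wz + h @ uz + bz)  # update gate
        r = sigmoid(xt @ wr + h @ ur + br)  # reset gate
        h_cand = np.tanh(xt @ wh + (r * h) @ uh + bh)  # candidate state
        h = (1.0 - z) * h + z * h_cand
    return h


gru_params = []
for _ in range(3):  # update gate, reset gate, candidate: (W, U, b) each
    gru_params += [init(D, D), init(D, D), np.zeros(D)]

params = {
    "proj": (init(F, D), np.zeros(D)),  # assumed 24 -> 64 input projection
    "enc": [(init(D, 2 * D), np.zeros(2 * D), init(2 * D, D), np.zeros(D))
            for _ in range(2)],  # 2 encoder layers
    "gru": tuple(gru_params),  # 1 layer, 64 units
    "head": (init(D, 1), np.zeros(1)),  # dense regression head
}


def ftt_gru_forward(x, p):
    """[B, 30, 24] sensor window -> [B, 1] RUL estimate (untrained sketch)."""
    w, b = p["proj"]
    h = x @ w + b  # [B, 30, 64]
    h = h + positional_encoding(h.shape[1], h.shape[2])
    for layer in p["enc"]:
        h = fourier_mixing_layer(h, *layer)  # [B, 30, 64]
    last = gru_last_hidden(h, *p["gru"])  # [B, 64]
    w_out, b_out = p["head"]
    return last @ w_out + b_out  # [B, 1]


x = rng.normal(size=(B, T, F))
print(ftt_gru_forward(x, params).shape)  # (8, 1)
```

The FFT mixing has no learned parameters. This is one reason Fourier-based token mixing is cheaper than full attention, and it is consistent with the compact, low-latency design the summary describes.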