TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare
Ziyang Song, Qincheng Lu, Hao Xu, He Zhu, David L. Buckeridge, Yue Li
TL;DR
TimelyGPT targets long-term forecasting of healthcare time series by extending Transformer pre-training with three components: an extrapolatable position (xPos) embedding, a Retention-based recurrent attention for global dependencies, and temporal convolutions for local ones. Retention allows training in time linear in sequence length and inference at constant cost per step, and xPos lets the model extrapolate beyond the sequence lengths seen in training, addressing two limitations of conventional self-attention on long sequences. The model is pre-trained on large-scale unlabeled biosignals and irregularly-sampled EHR data, then fine-tuned for downstream tasks. It achieves accurate extrapolation of up to 6,000 timesteps from a 2,000-timestep prompt and high recall when predicting future diagnoses from irregularly-sampled records. The approach offers a scalable, transferable framework for long-term patient health state forecasting and risk trajectory modeling, with potential impact on long-range monitoring and earlier intervention.
Abstract
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in natural language processing and computer vision. However, the development of PTMs for healthcare time-series data is lagging behind. This gap underscores the limitations of existing Transformer-based architectures, particularly in scaling to large time-series datasets and in capturing long-term temporal dependencies. In this study, we present the Timely Generative Pre-trained Transformer (TimelyGPT). TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations. It also integrates recurrent attention and temporal convolution modules to capture global and local temporal dependencies, respectively. We evaluated TimelyGPT on two large-scale healthcare time-series datasets, one of continuous biosignals and one of irregularly-sampled time series. Our experiments show that during pre-training, TimelyGPT excels at learning time-series representations from both continuously monitored biosignals and the irregularly-sampled time-series data commonly observed in longitudinal electronic health records (EHRs). In forecasting continuous biosignals, TimelyGPT achieves accurate extrapolation of up to 6,000 timesteps of body temperature during the sleep stage transition, given a short look-up window (i.e., prompt) of only 2,000 timesteps. For irregularly-sampled time series, TimelyGPT with a proposed time-specific inference demonstrates high top recall scores in predicting future diagnoses from early diagnostic records, effectively handling irregular intervals between clinical records. Together, we envision TimelyGPT being useful across a broad spectrum of health domains, including long-term patient health state forecasting and patient risk trajectory prediction.
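To make the "recurrent attention" claim concrete, the following is a minimal NumPy sketch of a single-head retention layer in the style of the Retentive Network, computed two ways: a parallel form suited to training and a recurrent form that keeps a fixed-size state for step-by-step inference. It is an illustration under stated assumptions, not TimelyGPT's implementation: the function names, the scalar decay `gamma`, and the tensor shapes are hypothetical, and the xPos rotation, normalization, gating, and multi-head structure are all omitted.

```python
import numpy as np


def retention_parallel(Q, K, V, gamma):
    """Parallel form: out[n] = sum over m <= n of gamma**(n - m) * (Q[n] . K[m]) * V[m]."""
    n = Q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[n, m] = n - m
    # Causal decay mask: gamma**(n - m) on and below the diagonal, 0 above it.
    D = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((Q @ K.T) * D) @ V


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: a (d_k, d_v) state is decayed and updated once per timestep."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])  # state size is independent of t
        out[t] = Q[t] @ S
    return out


# Hypothetical shapes: 16 timesteps, key dimension 4, value dimension 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 4))
K = rng.normal(size=(16, 4))
V = rng.normal(size=(16, 3))

same = np.allclose(retention_parallel(Q, K, V, 0.9), retention_recurrent(Q, K, V, 0.9))
print("parallel and recurrent forms agree:", same)
```

The two functions compute the same outputs. The recurrent form touches only the current state at each step, so its per-step cost does not grow with the length of the history, which is the property that makes long-horizon autoregressive forecasting tractable in this family of models.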
