TimesBERT: A BERT-Style Foundation Model for Time Series Understanding
Haoran Zhang, Yong Liu, Yunzhong Qiu, Haixuan Liu, Zhongyi Pei, Jianmin Wang, Mingsheng Long
TL;DR
TimesBERT introduces a BERT-style encoder for time series understanding, treating multivariate time series as multisentence documents and repurposing functional tokens for multi-granularity reasoning. It pre-trains on a large corpus of $260$ billion time points with two objectives, Masked Patch Modeling and Functional Token Prediction, using a unified time-series embedding and an encoder with $L=12$, $H=768$, $A=12$ and a $512$-token context. The model achieves state-of-the-art across four understanding tasks—classification, imputation, anomaly detection, and short-term forecasting—across hundreds of real-world datasets, demonstrating strong transferability and cross-domain robustness. Ablation studies show the value of the FTP task and multivariate modeling, and the results highlight the importance of time-series native pre-training over cross-modal initialization. This work positions TimesBERT as a versatile foundation model for time series understanding with practical implications for cross-domain analytics.
Abstract
Time series analysis is crucial in diverse scenarios. Beyond forecasting, considerable real-world tasks are categorized into classification, imputation, and anomaly detection, underscoring different capabilities termed time series understanding in this paper. While GPT-style models have been positioned as foundation models for time series forecasting, the BERT-style architecture, which has made significant advances in natural language understanding, has not been fully unlocked for time series understanding, possibly attributed to the undesirable dropout of essential elements of BERT. In this paper, inspired by the shared multi-granularity structure between multivariate time series and multisentence documents, we design TimesBERT to learn generic representations of time series including temporal patterns and variate-centric characteristics. In addition to a natural adaptation of masked modeling, we propose a parallel task of functional token prediction to embody vital multi-granularity structures. Our model is pre-trained on 260 billion time points across diverse domains. Leveraging multi-granularity representations, TimesBERT achieves state-of-the-art performance across four typical downstream understanding tasks, outperforming task-specific models and language pre-trained backbones, positioning it as a versatile foundation model for time series understanding.
