Towards Generalisable Time Series Understanding Across Domains
Özgün Turgut, Philip Müller, Martin J. Menten, Daniel Rueckert
TL;DR
OTiS addresses the challenge of generalising time series understanding across heterogeneous domains by introducing a multi-domain pre-training framework that combines a domain-specific tokeniser, a dual masking strategy, and a normalised cross-correlation loss. Pre-trained on a large, diverse corpus spanning $8$ domains with $640{,}187$ samples and $11{,}052{,}756{,}981$ time points, OTiS achieves competitive performance across classification, regression, and forecasting tasks, and demonstrates zero-shot capabilities and domain adaptation via learnable domain signatures. Analyses of these domain signatures indicate that the model captures inter-variate relationships and temporal patterns consistent with domain-specific structure (e.g., EEG/ECG electrode layouts and climatological relationships), enabling transfer to unseen domains with limited data. Together, these results establish a foundation for general time series analysis with potential impact across medicine, engineering, natural sciences, and finance. Remaining limitations relate to data curation and the need for even larger pre-training corpora to achieve further gains.
Abstract
Recent breakthroughs in natural language processing and computer vision, driven by efficient pre-training on large datasets, have enabled foundation models to excel on a wide range of tasks. However, this potential has not yet been fully realised in time series analysis, as existing methods fail to address the heterogeneity in large time series corpora. Prevalent in domains ranging from medicine to finance, time series vary substantially in characteristics such as variate count, inter-variate relationships, temporal patterns, and sampling frequency. To address this, we introduce a novel pre-training paradigm specifically designed to handle time series heterogeneity. We propose a tokeniser with learnable domain signatures, a dual masking strategy, and a normalised cross-correlation loss, enabling our open model for general time series analysis (OTiS) to efficiently learn from large time series corpora. Extensive benchmarking on diverse tasks, such as classification, regression, and forecasting, demonstrates that OTiS outperforms state-of-the-art baselines. Our code and pre-trained weights are available at https://github.com/oetu/otis.
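The abstract names a normalised cross-correlation loss without stating its form. As a rough illustration only, the sketch below computes the standard zero-lag normalised cross-correlation (equivalent to the Pearson correlation) between a target series and its reconstruction, and turns it into a loss as one minus the mean correlation over variates. The function name `ncc_loss`, the `(variates, time)` input layout, and the `1 - NCC` reduction are assumptions made for this example and may differ from the formulation used in OTiS.

```python
import numpy as np


def ncc_loss(target, prediction, eps=1e-8):
    """Hypothetical sketch: 1 - mean zero-lag NCC over variates.

    Both inputs are assumed to have shape (variates, time).
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)

    # Centre each variate along the time axis.
    t = target - target.mean(axis=-1, keepdims=True)
    p = prediction - prediction.mean(axis=-1, keepdims=True)

    # Covariance divided by the product of standard deviations.
    covariance = (t * p).mean(axis=-1)
    scale = t.std(axis=-1) * p.std(axis=-1) + eps  # eps avoids division by zero
    ncc = covariance / scale  # one value in [-1, 1] per variate

    # Perfect correlation gives a loss near 0, perfect anti-correlation near 2.
    return float(1.0 - ncc.mean())


if __name__ == "__main__":
    x = np.sin(np.linspace(0.0, 2.0 * np.pi, 100)).reshape(1, -1)
    print(ncc_loss(x, x))            # close to 0
    print(ncc_loss(x, 3.0 * x + 5))  # close to 0: invariant to scale and offset
    print(ncc_loss(x, -x))           # close to 2
```

Because the correlation is invariant to per-variate scale and offset, a loss of this kind rewards matching the shape of a signal rather than its absolute amplitude, which is one plausible reason to pair it with data drawn from domains with very different value ranges.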
