Federated Foundation Models on Heterogeneous Time Series

Shengchao Chen, Guodong Long, Jing Jiang, Chengqi Zhang

TL;DR

The paper tackles the challenge of training generalizable Time Series Foundation Models across heterogeneous domains by introducing FFTS, a federated learning framework that preserves domain-specific patterns while sharing knowledge. It combines an encoder-only Transformer with Patch Embedding and an Adaptive Trend-awareness Module to capture cross-timescale temporal patterns, reinforced by a unified masking strategy and a heterogeneous knowledge alignment regularizer. Across federated pretraining and multiple downstream tasks (forecasting, imputation, anomaly detection), FFTS demonstrates superior cross-domain generalization, strong few-/zero-shot performance, and competitive or superior results compared with centralized training and state-of-the-art baselines. The work also analyzes efficiency and privacy implications, arguing that FFTS enables scalable, privacy-preserving TSFM pretraining in real-world, multi-institution settings. Overall, FFTS offers a practical path to robust, cross-domain TSFM generalization without centralized data fusion.

Abstract

Training general-purpose time series foundation models from scratch, with robust generalization across diverse applications, remains an open challenge. Existing efforts primarily fuse cross-domain time series datasets to extract shared subsequences as tokens for training Transformer-based models. However, because of significant statistical heterogeneity across domains, this cross-domain fusion does not work as effectively as it does for text and images. To tackle this challenge, this paper proposes FFTS, a novel federated learning approach that addresses heterogeneity in the training of time series foundation models. Specifically, each data-holding organization is treated as an independent client in a collaborative federated learning framework, and client-specific local models are trained to preserve the unique characteristics of each dataset. Moreover, a new regularization mechanism is applied on both the client side and the server side to align the shared knowledge across heterogeneous datasets from different domains. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach. The resulting time series foundation models achieve superior generalization on cross-domain time series analysis tasks, including forecasting, imputation, and anomaly detection.
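The client/server loop described in the abstract can be illustrated with a minimal sketch. This is not the FFTS implementation: the local model is a toy linear autoregressive forecaster, and a FedProx-style proximal term is swapped in as a stand-in for the paper's alignment regularizer, whose exact form is not given in this summary. The names `local_update`, `aggregate`, `federated_round`, and the parameters `mu`, `lr`, and `steps` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], p: int) -> List[Tuple[Sequence[float], float]]:
    """Split a series into (last p values, next value) training pairs."""
    return [(series[t - p:t], series[t]) for t in range(p, len(series))]


def local_update(global_w: Sequence[float], series: Sequence[float],
                 mu: float = 0.1, lr: float = 0.5, steps: int = 50) -> Tuple[List[float], int]:
    """Train one client's copy of the model on its own series only.

    Loss = mean squared forecast error + (mu / 2) * ||w - global_w||^2.
    The proximal term is an ASSUMED stand-in regularizer (FedProx-style),
    not the regularizer proposed in the paper.
    """
    p = len(global_w)
    windows = make_windows(series, p)
    w = list(global_w)
    for _ in range(steps):
        grad = [0.0] * p
        for x, y in windows:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(p):
                grad[i] += 2.0 * err * x[i] / len(windows)
        for i in range(p):
            # Pull the local weights toward the current global weights.
            grad[i] += mu * (w[i] - global_w[i])
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, len(windows)


def aggregate(updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Server step: average client weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    p = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(p)]


def federated_round(global_w: Sequence[float], client_series: Sequence[Sequence[float]],
                    mu: float = 0.1, lr: float = 0.5, steps: int = 50) -> List[float]:
    """One communication round: raw series never leave their clients."""
    updates = [local_update(global_w, s, mu=mu, lr=lr, steps=steps) for s in client_series]
    return aggregate(updates)
```

Only weight vectors and sample counts cross the client boundary in this sketch, which is the property that lets each organization keep its raw time series local.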

Paper Structure

This paper contains 35 sections, 5 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Examples of statistical heterogeneity across time series datasets. HeartRate: healthcare data, Precipitation: weather data.
  • Figure 2: Overview of Time Series Foundation Models (TSFMs) trained from scratch. (a) STS pretraining: training from scratch on a single time series (STS) obtained by fusing data from different domains (goswami2024moment; liu2024timer; das2023decoder). (b) MTS pretraining: training from scratch on multiple time series (MTS) from different domains (woo2024unifiedtraininguniversaltime). (c) Federated pretraining (ours): instead of merging time series from various domains, a separate local model is trained for each source. These models are then aggregated into a global model on the server to form a TSFM.
  • Figure 3: Architecture of the model within FFTS. (a) Structure of the local model in each client. (b) Architecture of the proposed Adaptive Trend-awareness Module (ATM), which consists of four independent experts that extract trends at different timescales from the attention representation; its structure is inspired by the Mixture of Experts (MoE) (fedus2022switch). (c) Architecture of the gating network.
  • Figure 4: Visualization of cross-domain trend similarity within historical observations. Top: Weather (1-hour resolution), Energy (5-minute resolution), Network (30-second resolution), Natural (1-day resolution). Bottom: corresponding trends.
  • Figure 5: Schematic diagram of FFTS for downstream adaptation. A unified adaptation head facilitates knowledge transfer across different downstream tasks. (a) Predicting future trends from past data. (b) Filling gaps in data using related time series and context. (c) Identifying unusual patterns in time series.
  • ...and 1 more figures
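
The Figure 3 caption describes the ATM as four independent trend experts combined through a gating network. A minimal sketch of that general pattern follows, under stated assumptions: each expert here is a plain moving average, the window sizes `(3, 7, 13, 25)` are illustrative, and `gate_logits` stands in for the output of a learned gating network. None of these choices come from the paper, which may define its experts and gate differently.

```python
import math
from typing import List, Optional, Sequence, Tuple


def moving_average(x: Sequence[float], k: int) -> List[float]:
    """Moving average of window k with edge replication; output length equals input length."""
    half = k // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * (k - 1 - half)
    return [sum(padded[i:i + k]) / k for i in range(len(x))]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_trend(x: Sequence[float],
                kernel_sizes: Sequence[int] = (3, 7, 13, 25),
                gate_logits: Optional[Sequence[float]] = None) -> Tuple[List[float], List[float]]:
    """Mix several timescale-specific trend estimates with softmax gates.

    kernel_sizes and gate_logits are HYPOTHETICAL: in a trained model the
    logits would be produced by a gating network from the input representation.
    """
    if gate_logits is None:
        gate_logits = [0.0] * len(kernel_sizes)  # uniform gates by default
    experts = [moving_average(x, k) for k in kernel_sizes]
    gates = softmax(gate_logits)
    trend = [sum(g * e[t] for g, e in zip(gates, experts)) for t in range(len(x))]
    return trend, gates
```

Raising one logit shifts the mixture toward that expert's timescale, so a single module can emphasize short-window trends for fast-changing series and long-window trends for slow ones.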