Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions
Utsav Dutta, Sina Khoshfetrat Pakazad, Henrik Ohlsson
TL;DR
The paper tackles the lack of generalizable foundation models for time series by introducing CHARM, a channel-aware foundation embedding model for multivariate time series. CHARM combines a description-conditioned contextual temporal convolutional network with a description-guided contextual attention layer, and is trained with a JEPA-style self-supervised objective that operates in embedding space. The approach yields state-of-the-art representations for classification, anomaly detection, and forecasting across diverse datasets, while providing interpretable dynamics through learned channel gates and heatmaps. This work demonstrates the practical potential of foundation models for time series and opens avenues for richer multimodal integration and cross-domain transfer.
Abstract
Traditional time series models are task-specific and often depend on dataset-specific training and extensive feature engineering. While Transformer-based architectures have improved scalability, foundation models, commonplace in text, vision, and audio, remain under-explored for time series and are largely restricted to forecasting. We introduce $\textbf{CHARM}$, a foundation embedding model for multivariate time series that learns shared, transferable, and domain-aware representations. To address the unique difficulties of time series foundation learning, $\textbf{CHARM}$ incorporates architectural innovations that integrate channel-level textual descriptions while remaining invariant to channel order. The model is trained using a Joint Embedding Predictive Architecture (JEPA), with novel augmentation schemes and a loss function designed to improve interpretability and training stability. Our $7$M-parameter model achieves state-of-the-art performance across diverse downstream tasks, setting a new benchmark for time series representation learning.
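To make the channel-order invariance claim concrete, the following is a minimal sketch and not CHARM's actual layer. It assumes each channel comes with a feature vector and an embedding of its textual description, computes attention weights between channels from description similarity alone, and mean-pools the result. The function name `description_guided_pool`, the toy dimensions, and the absence of learned projections are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def description_guided_pool(channel_feats, desc_embs):
    """Hypothetical helper: mix channels by description similarity, then pool.

    channel_feats[i] is the feature vector of channel i and desc_embs[i] is
    the embedding of that channel's text description (both assumed given).
    """
    n = len(channel_feats)
    dim = len(channel_feats[0])
    scale = math.sqrt(len(desc_embs[0]))
    mixed = []
    for i in range(n):
        # Weights depend only on description content, never on the index i or j.
        weights = softmax([dot(desc_embs[i], desc_embs[j]) / scale for j in range(n)])
        mixed.append(
            [sum(weights[j] * channel_feats[j][d] for j in range(n)) for d in range(dim)]
        )
    # Mean pooling over channels removes any remaining dependence on order.
    return [sum(row[d] for row in mixed) / n for d in range(dim)]


# Toy data: 3 channels, 4-dim features, 5-dim description embeddings.
random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
descs = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]

# Reorder channels together with their descriptions.
perm = [2, 0, 1]
out = description_guided_pool(feats, descs)
out_perm = description_guided_pool([feats[p] for p in perm], [descs[p] for p in perm])
```

Because no positional index over channels enters the computation, reordering the channels jointly with their descriptions leaves the pooled embedding unchanged up to floating-point rounding, so `out` and `out_perm` agree.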
