Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification
Vasilii Feofanov, Songkang Wen, Marius Alonso, Romain Ilbert, Hongbo Guo, Malik Tiomoko, Lujia Pan, Jianfeng Zhang, Ievgen Redko
TL;DR
Mantis is a lightweight, open-source foundation model for time series classification, built on the Vision Transformer (ViT) architecture and pre-trained with a contrastive objective on a large, diverse unlabeled dataset. Its new token generator produces 32 tokens from raw, differential, and statistical patch features. These tokens feed a 6-layer ViT backbone with a class token, followed by a projector during pre-training or a prediction head during fine-tuning. The authors also propose adapters that compress multivariate channels, making inference and fine-tuning efficient on data with many channels. Empirically, Mantis achieves higher accuracy and lower calibration error than state-of-the-art time series foundation models in both zero-shot and fine-tuning regimes, and the paper offers practical guidance on adapters and calibration techniques for real-world deployment.
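To make the three patch views concrete, the following is a minimal illustrative sketch, not the Mantis token generator itself. It assumes a univariate series whose length is divisible by 32, uses first-order differences for the differential view, and uses per-patch mean and standard deviation for the statistical view. The function name `make_patch_features` and these feature choices are hypothetical, and the sketch omits any learned layers that turn the views into tokens.

```python
import numpy as np


def make_patch_features(x, num_patches=32):
    """Split a 1-D series into patches and return raw, differential, and statistical views.

    Illustrative only: the feature choices here are assumptions, not the Mantis design.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size % num_patches != 0:
        raise ValueError("expected a 1-D series whose length is divisible by num_patches")

    # Raw view: non-overlapping patches of the original series.
    raw = x.reshape(num_patches, -1)

    # Differential view: first-order differences; prepending x[0] keeps the
    # length unchanged and makes the first difference zero.
    diff = np.diff(x, prepend=x[0]).reshape(num_patches, -1)

    # Statistical view: mean and standard deviation of each raw patch.
    stats = np.stack([raw.mean(axis=1), raw.std(axis=1)], axis=1)

    return raw, diff, stats


# Example: a sine wave of length 512 yields 32 patches of 16 points each.
series = np.sin(np.linspace(0.0, 8.0 * np.pi, 512))
raw, diff, stats = make_patch_features(series)
print(raw.shape, diff.shape, stats.shape)
```

In a real model, each of the 32 rows would be embedded by learned layers before reaching the transformer; here the arrays only show how one series gives rise to 32 aligned patch-level inputs.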
Abstract
In recent years, there has been increasing interest in developing foundation models for time series data that can generalize across diverse downstream tasks. While numerous forecasting-oriented foundation models have been introduced, there is a notable scarcity of models tailored for time series classification. To address this gap, we present Mantis, a new open-source foundation model for time series classification based on the Vision Transformer (ViT) architecture that has been pre-trained using a contrastive learning approach. Our experimental results show that Mantis outperforms existing foundation models both when the backbone is frozen and when fine-tuned, while achieving the lowest calibration error. In addition, we propose several adapters to handle the multivariate setting, reducing memory requirements and modeling channel interdependence.
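The abstract does not say how calibration error is measured. One common choice is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and confidence in each bin. The sketch below assumes that metric with equal-width bins; the function name `expected_calibration_error` and the default of 10 bins are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    probs:  array of shape (n_samples, n_classes) with predicted probabilities.
    labels: array of shape (n_samples,) with integer class labels.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)      # confidence of the predicted class
    predictions = probs.argmax(axis=1)   # predicted class index
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that fall into the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


# Two predictions at 0.9 confidence, only one of them correct:
# accuracy is 0.5, confidence is 0.9, so the gap is 0.4.
probs = np.array([[0.9, 0.1], [0.9, 0.1]])
labels = np.array([0, 1])
ece = expected_calibration_error(probs, labels)
print(round(ece, 3))
```

A perfectly calibrated classifier would score 0 under this metric, so lower values indicate that predicted confidences track actual accuracy more closely.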
