UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
Yessin Moakher, Youssef Attia El Hili, Vasilii Feofanov
TL;DR
This work adapts DINOv2-style self-distillation to pretrain a time series foundation model, using the Mantis tokenizer and transformer encoder architecture as the backbone. The results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
Abstract
Self-supervised foundation models have achieved remarkable success across domains, including time series. However, the potential of non-contrastive methods, a paradigm that has driven significant advances in computer vision, remains underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, building on the Mantis tokenizer and transformer encoder architecture as our backbone. Through a student-teacher framework, our method, Utica, learns representations that capture both temporal invariance via augmented crops and fine-grained local structure via patch masking. Our approach achieves state-of-the-art classification performance on both the UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
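
As a rough illustration of the student-teacher idea described above, the following NumPy sketch shows the generic ingredients of DINO-style self-distillation: a cross-entropy between a centered, sharpened teacher distribution and the student distribution, an exponential-moving-average (EMA) teacher update, and a random patch mask. It is not the authors' implementation. All function names, the temperatures, the momentum, and the mask ratio are assumptions chosen for illustration, and the abstract does not specify them.

```python
import numpy as np

def log_softmax(logits, temperature):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from teacher targets to student predictions.

    The teacher output is centered and sharpened with a low temperature
    (illustrative values); in a real training loop no gradient would flow
    through the teacher.
    """
    teacher_probs = np.exp(log_softmax(teacher_logits - center, teacher_temp))
    student_log_probs = log_softmax(student_logits, student_temp)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())

def ema_update(teacher_params, student_params, momentum=0.996):
    """Move each teacher parameter toward the matching student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask marking which patches are hidden from the student."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

# Toy usage with random "projection head" outputs for a batch of 4 series.
rng = np.random.default_rng(0)
student_out = rng.normal(size=(4, 16))
teacher_out = rng.normal(size=(4, 16))
center = teacher_out.mean(axis=0, keepdims=True)

loss = self_distillation_loss(student_out, teacher_out, center)
mask = random_patch_mask(num_patches=32, mask_ratio=0.25, rng=rng)
new_teacher = ema_update([np.array([1.0])], [np.array([3.0])], momentum=0.5)
```

In this sketch the crop-level objective would apply the loss to global representations of two augmented crops, while the patch-level objective would apply the same loss only at the positions selected by `mask`; how Utica weights and combines the two is not stated in the abstract.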
