
UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification

Yessin Moakher, Youssef Attia El Hili, Vasilii Feofanov

TL;DR

This work adapts DINOv2-style self-distillation to pretrain a time series foundation model, using the Mantis tokenizer and transformer encoder architecture as the backbone. The results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.

Abstract

Self-supervised foundation models have achieved remarkable success across domains, including time series. However, the potential of non-contrastive methods, a paradigm that has driven significant advances in computer vision, remains underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, building on the Mantis tokenizer and transformer encoder architecture as our backbone. Through a student-teacher framework, our method Utica learns representations that capture both temporal invariance via augmented crops and fine-grained local structure via patch masking. Our approach achieves state-of-the-art classification performance on both UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
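The student-teacher objective mentioned above can be illustrated with a minimal sketch of the standard DINO recipe that DINOv2 builds on. The teacher output is centered and sharpened with a low temperature, the student is trained with cross-entropy against it, and the teacher weights track the student through an exponential moving average (EMA). This is an assumption-laden illustration, not Utica's implementation: the temperatures (0.1 student, 0.04 teacher) and momenta (0.996 for weights, 0.9 for the center) are the usual DINO defaults rather than the paper's values, all function names are hypothetical, vectors are plain Python lists, and the encoder, projection head and DINOv2's patch-level masked loss are left out.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
              center: Sequence[float], student_temp: float = 0.1,
              teacher_temp: float = 0.04) -> float:
    """Cross-entropy H(p_teacher, p_student) for one (teacher view, student view) pair.

    The teacher output is centered and sharpened (low temperature) and acts as a
    fixed target: in a real training loop no gradient flows through it.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def multi_crop_loss(student_outs: List[Vector], teacher_outs: List[Vector],
                    center: Sequence[float]) -> float:
    """Average loss over all (teacher global view, student view) pairs.

    Assumes student_outs[i] and teacher_outs[i] come from the same global view
    for i < len(teacher_outs); those identical-view pairs are skipped.
    """
    total, n_terms = 0.0, 0
    for ti, t in enumerate(teacher_outs):
        for si, s in enumerate(student_outs):
            if si == ti:
                continue
            total += dino_loss(s, t, center)
            n_terms += 1
    return total / n_terms


def update_center(center: Sequence[float], teacher_batch: List[Vector],
                  momentum: float = 0.9) -> Vector:
    """EMA of the mean teacher output; centering plus sharpening counters collapse."""
    n = len(teacher_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]


def ema_update(teacher_params: Sequence[float], student_params: Sequence[float],
               momentum: float = 0.996) -> Vector:
    """Teacher weights are an exponential moving average of student weights."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


# Toy usage: the teacher sees 2 global views; the student sees the same
# 2 global views plus 2 local views (all outputs are made-up numbers).
teacher_outs = [[2.0, 0.0, -1.0], [1.5, 0.2, -0.8]]
student_outs = [[1.8, 0.1, -0.9], [1.6, 0.0, -1.0], [1.0, 0.5, 0.0], [0.2, 1.0, 0.3]]
center = [0.0, 0.0, 0.0]
loss = multi_crop_loss(student_outs, teacher_outs, center)
center = update_center(center, teacher_outs)
```

Because the teacher receives no gradients and is only an EMA of the student, the method needs no negative pairs. This is what separates this non-contrastive setup from contrastive pretraining.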

Paper Structure

This paper contains 32 sections, 14 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Example of a time series sample generation via causal DAG.
  • Figure 2: Architecture.
  • Figure 3: Unlabeled Time-series Crop Augmented framework. The self-supervised objective aims to match the features produced by the teacher with those produced by the student. Full details about pretraining are given in the pretraining appendix.
  • Figure 4: Average accuracy comparison across linear probing and fine-tuning on both UCR and UEA benchmarks. Number of wins shown in parentheses.
  • Figure 5: Example of synthetic time series from the DAG-based generator.
  • ...and 3 more figures
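The crop-augmented views in Figure 3 and the patch masking described in the abstract can be sketched as simple operations on a univariate series. This is a hypothetical illustration only. It assumes DINO-style multi-crop (a few long "global" windows and several short "local" windows) and iBOT/DINOv2-style masking of a random subset of patches. The crop fractions, view counts, patch length and mask ratio are made-up values, and all function names are hypothetical. In DINOv2 the masked tokens are replaced by a learnable mask token inside the encoder, so overwriting raw values with a constant is a simplification. Real pipelines may also need to resize crops to a fixed encoder input length, which this sketch skips.

```python
import random
from typing import List, Tuple


def random_crop(series: List[float], crop_len: int, rng: random.Random) -> List[float]:
    """Return a contiguous window of length crop_len starting at a random offset."""
    if not 0 < crop_len <= len(series):
        raise ValueError("crop_len must be in [1, len(series)]")
    start = rng.randint(0, len(series) - crop_len)  # randint includes both endpoints
    return series[start:start + crop_len]


def make_crops(series: List[float], n_global: int = 2, n_local: int = 4,
               global_frac: float = 0.8, local_frac: float = 0.3,
               seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Multi-crop: a few long (global) windows and several short (local) windows."""
    rng = random.Random(seed)
    g_len = max(1, int(global_frac * len(series)))
    l_len = max(1, int(local_frac * len(series)))
    global_views = [random_crop(series, g_len, rng) for _ in range(n_global)]
    local_views = [random_crop(series, l_len, rng) for _ in range(n_local)]
    return global_views, local_views


def mask_patches(series: List[float], patch_len: int, mask_ratio: float,
                 rng: random.Random, mask_value: float = 0.0) -> Tuple[List[float], List[int]]:
    """Split into non-overlapping patches and overwrite a random subset of them.

    Returns the masked copy and the sorted indices of the masked patches, i.e. the
    positions where a patch-level loss would compare student and teacher features.
    Trailing values that do not fill a whole patch are never masked.
    """
    n_patches = len(series) // patch_len
    n_masked = int(round(mask_ratio * n_patches))
    masked_idx = sorted(rng.sample(range(n_patches), n_masked))
    out = list(series)
    for p in masked_idx:
        for i in range(p * patch_len, (p + 1) * patch_len):
            out[i] = mask_value
    return out, masked_idx


# Toy usage: views of a ramp signal; the student would see the masked global view,
# while the teacher would see the same global view unmasked.
series = [float(i) for i in range(100)]
global_views, local_views = make_crops(series)
masked, masked_idx = mask_patches(global_views[0], patch_len=8, mask_ratio=0.5,
                                  rng=random.Random(1))
```

Crops encourage invariance to which part of the series is observed, while masking forces the student to infer local structure from context. These are the two complementary signals the abstract attributes to the multi-objective setup.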