TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

Xiwen Chen; Peijie Qiu; Wenhui Zhu; Huayu Li; Hao Wang; Aristeidis Sotiras; Yalin Wang; Abolfazl Razi

TL;DR

TimeMIL recasts multivariate time series classification (MTSC) as a weakly supervised multiple instance learning (MIL) problem, which allows patterns of interest to be localized in time. It combines a tokenized transformer with a learnable wavelet positional encoding to model temporal ordering and instance correlations, and uses Nyström self-attention for scalability. The method achieves state-of-the-art results across 28 datasets and is interpretable through attention-based localization of time points. The approach also offers an information-theoretic perspective on MTSC and may be useful in applications that require localized, explainable time-series analysis.

Abstract

Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., disease-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem and introduce a novel multiple-instance learning (MIL) framework that better localizes patterns of interest and models time dependencies within time series. Our approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC. The code will be available at https://github.com/xiwenc1/TimeMIL.
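
To make the MIL view concrete, the following is a minimal sketch in which each time point is an instance and the whole series is a bag. It uses generic attention-based MIL pooling in the style of Ilse et al. (2018), not TimeMIL's transformer-based time-aware pooling; the function name `attention_mil_pool`, the parameters `V` and `w`, and all shapes are assumptions made for illustration, and the weights are random rather than learned.

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: (T, d) array, one d-dimensional embedding per time point.
    V: (d, h) projection and w: (h,) scoring vector (hypothetical parameters).
    Returns the bag embedding (d,) and the attention weights (T,).
    """
    scores = np.tanh(instances @ V) @ w        # one score per time point
    scores = scores - scores.max()             # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    bag = weights @ instances                  # attention-weighted average
    return bag, weights

rng = np.random.default_rng(0)
T, d, h = 50, 8, 16                            # illustrative sizes only
series = rng.normal(size=(T, d))               # stand-in for instance embeddings
V = rng.normal(size=(d, h))
w = rng.normal(size=h)

bag, weights = attention_mil_pool(series, V, w)
top_time_points = np.argsort(weights)[::-1][:5]  # candidate time points of interest
```

In a trained model of this kind, the attention weights are what one would inspect to see which time points contributed most to the bag-level prediction. Note that this plain pooling ignores time ordering, which is the gap the time-aware pooling is described as addressing.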

Paper Structure (27 sections, 4 theorems, 24 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Method
Problem Formulation
MTSC as A MIL Problem
Time-Aware MIL Pooling for MTSC
Interpretability
Experiments
Experimental setup and Baselines
Main Experimental Results
Ablation on Model Design Variants
The Effectiveness of Weakly Supervised Learning
Conclusion
UEA Datasets Detail
More detail of Theorem 1
...and 12 more sections

Key Result

Theorem 1

(Ilse et al., 2018; Shao et al., 2021) Suppose the score function $S$ is a $(\delta_{\varepsilon},\varepsilon)$-continuous symmetric function w.r.t. the Hausdorff distance $d_H(\cdot, \cdot)$, i.e., $\forall d_H(\boldsymbol{X}_i, \boldsymbol{X}_j)<\delta_{\varepsilon}$, we have $|S(\boldsymbol{X}_i)-S(\boldsym...

Figures (8)

Figure 1: (a): The decision boundary of fully supervised methods is determined by assigning a label to each time series. (b): TimeMIL makes decisions by discriminating positive and negative instances in time series, where each time point is an instance, and its label is typically not available in reality.
Figure 2: The proposed framework of TimeMIL for time series classification with enhanced interpretability: (i) a feature extractor to obtain instance-level feature embeddings, (ii) a MIL pooling to aggregate instance embeddings into a bag-level feature embedding, and (iii) a bag-level classifier to map the bag-level feature to a label prediction. Each time point is treated as an instance and the time series as a bag. Time ordering information and instance correlation are captured by taking the mutual benefit of WPE and MHSA in our TimeMIL pooling (highlighted in green).
Figure 3: The block entropy of Shakespeare's Sonnets under varying shuffling rates; higher shuffling rates result in higher block entropy.
Figure 4: The proposed learnable wavelet positional encoding: First, a wavelet transform is performed on the input signal (excluding the class token) with each wavelet basis (Eq. \ref{eq:wpe_2}). Second, the signals are aggregated in the wavelet domain by a summation (Eq. \ref{eq:wpe_1}). In the case of $n_w=3$, we use 3 learnable wavelet bases ($\boldsymbol{\Psi}_1,\boldsymbol{\Psi}_2,\boldsymbol{\Psi}_3$) to model changing frequency and time scales.
Figure 5: Exemplary attention maps learned in TimeMIL using different datasets (rows) including synthetic dataset, StandWalkJump dataset, and AtrialFibrillation dataset, featuring distinct patterns of interest (columns). TimeMIL accurately localized patterns of interest.
...and 3 more figures
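
The relationship described in the Figure 3 caption, that shuffling raises block entropy, can be illustrated with a small sketch. It assumes block entropy means the Shannon entropy of the empirical distribution of length-`k` sliding blocks, and it treats the shuffling rate as the fraction of positions whose values are permuted among themselves; both definitions, the helper names, and the toy sequence are assumptions, since the paper's exact procedure is not given here.

```python
import math
import random
from collections import Counter

def block_entropy(seq, k):
    """Shannon entropy (in bits) of the length-k sliding blocks of seq."""
    blocks = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    n = len(blocks)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def partially_shuffle(seq, rate, seed=0):
    """Permute the values at a random fraction `rate` of positions (assumed definition)."""
    rng = random.Random(seed)
    out = list(seq)
    idx = rng.sample(range(len(out)), int(rate * len(out)))
    vals = [out[i] for i in idx]
    rng.shuffle(vals)
    for i, v in zip(idx, vals):
        out[i] = v
    return out

# Toy stand-in for ordered text: a strictly alternating sequence.
ordered = list("ab" * 500)
h_ordered = block_entropy(ordered, 2)                           # only "ab" and "ba" occur
h_shuffled = block_entropy(partially_shuffle(ordered, 1.0), 2)  # all four blocks occur

print(f"ordered: {h_ordered:.3f} bits, shuffled: {h_shuffled:.3f} bits")
```

In the ordered sequence only two distinct blocks appear, so the entropy is close to 1 bit; after shuffling, all four two-symbol blocks appear, and the entropy moves toward the 2-bit maximum. This illustrates why ordering carries information that an order-agnostic pooling would discard.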

Theorems & Definitions (8)

Theorem 1
Remark 1
Proposition 2
Theorem 3
proof
Remark 2
proof
Theorem 4
