MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification

Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen

TL;DR

Time Series Classification often hinges on capturing informative patterns across multiple time scales while preserving temporal localization. The authors introduce MSTAR, a multi-scale backbone search space and NAS framework that jointly optimizes receptive fields and time resolution using a cell-based, InceptionTime-inspired design encoded as a $4 \times 13 \times 13$ adjacency tensor. A convolutional autoencoder (CAE) and neural predictors guide Bayesian optimization to efficiently explore architectures, while a static encoder decodes candidates for evaluation, enabling scalable discovery. Across PTB-XL, EEGEyeNet, Smartphone HAR, and Satellite datasets, MSTAR achieves state-of-the-art performance and demonstrates strong compatibility with Vision Transformer backbones, highlighting the practical impact of time-resolution-aware architecture search for diverse time-series tasks.
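To make the encoding concrete, the following is a minimal sketch of what a $4 \times 13 \times 13$ binary adjacency tensor for one cell might look like. Only the shape comes from the summary above. The meaning given to the four slices, the edge probability, the function name `random_cell_encoding`, and the upper-triangular masking used to keep each slice acyclic are all assumptions made for illustration, not the authors' preprocessing.

```python
import numpy as np

# Shape taken from the summary: 4 slices over a 13-node cell.
NUM_SLICES, NUM_NODES = 4, 13


def random_cell_encoding(rng, edge_prob=0.3):
    """Sample a binary 4 x 13 x 13 tensor and mask it so each slice is a DAG.

    Hypothetical helper: the sampling scheme and the masking rule are
    illustrative assumptions, not the paper's procedure.
    """
    raw = rng.random((NUM_SLICES, NUM_NODES, NUM_NODES)) < edge_prob
    # Assumption: keep only entries with i < j (strict upper triangle).
    # This removes self-loops and back-edges, so every slice is acyclic.
    # np.triu acts on the last two axes of a 3-D array.
    return np.triu(raw.astype(np.int8), k=1)


rng = np.random.default_rng(0)
A = random_cell_encoding(rng)

# A fixed-shape tensor can be fed to a convolutional model, or flattened
# into a fixed-length vector of 4 * 13 * 13 entries.
flat = A.reshape(-1)
print(A.shape, flat.size)
```

A fixed-shape tensor of this kind is convenient because an autoencoder or a predictor can consume every candidate architecture through the same input layer.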

Abstract

Most previous approaches to Time Series Classification (TSC) emphasize receptive fields and frequencies while overlooking time resolution. As a result, they suffer from scalability issues because they integrate an extensive range of receptive fields into the classification model. Other methods adapt better to large datasets but require manual design and still cannot reach the optimal architecture, because each dataset is unique. We overcome these challenges by proposing a novel multi-scale search space and a Neural Architecture Search (NAS) framework that addresses both frequency and time resolution and discovers the suitable scale for a specific dataset. We further show that our model can serve as a backbone for a powerful Transformer module with either untrained or pre-trained weights. Our search space reaches state-of-the-art performance on four datasets from four different domains while introducing more than ten highly fine-tuned models for each dataset.

Paper Structure (33 sections, 8 equations, 6 figures, 11 tables, 1 algorithm)

Figures (6)

  • Figure 1: A large kernel, or a complex signal, can be dissected into multiple frequency components. Each component functions as a distinct filter, isolating a specific frequency from the input signals.
  • Figure 2: Visualization of a search-space cell. The default channel sizes and the preprocessed channel sizes are shown. Edges that connect two nodes are operation edges (orange) and carry an operation; the other edges serve purely as connections (green).
  • Figure 3: Visualization of the search-space encoding. The search space can be represented as a $4\times13\times13$ adjacency tensor ($A$), which is then preprocessed to meet the criteria of the search space.
  • Figure 4: The architecture of our convolutional autoencoder (CAE) and predictors.
  • Figure 5: Spectrograms of the feature maps of MSTAR's model on PTB-XL, computed with the Continuous Wavelet Transform (CWT). The larger the receptive field, the stronger the frequency intensities, but this comes at the cost of time resolution.
  • ...and 1 more figure
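
The trade-off noted in the Figure 5 caption, where a larger receptive field comes at the cost of time resolution, can be illustrated with the standard receptive-field recurrence for stacked 1D convolutions. This is a generic sketch, not code from MSTAR. The function name `receptive_field_and_stride` and both layer configurations are hypothetical, and "same" padding is assumed when counting output time steps.

```python
import math


def receptive_field_and_stride(layers):
    """Return (receptive field, cumulative stride) for stacked 1D convolutions.

    `layers` is a list of (kernel_size, stride, dilation) tuples, listed
    from the input side to the output side.
    """
    rf, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        # Each layer widens the receptive field by its effective kernel
        # extent, scaled by the spacing (jump) between its input samples.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf, jump


# Hypothetical configurations, chosen only to contrast the two regimes.
fine = [(3, 1, 1)] * 3                      # small kernels, no striding
coarse = [(9, 1, 1), (9, 2, 1), (9, 2, 1)]  # large kernels with striding

input_length = 1000
for name, layers in (("fine", fine), ("coarse", coarse)):
    rf, jump = receptive_field_and_stride(layers)
    # Assuming "same" padding, the output length is ceil(input_length / jump).
    steps = math.ceil(input_length / jump)
    print(f"{name}: receptive field={rf}, output time steps={steps}")
# fine: receptive field=7, output time steps=1000
# coarse: receptive field=33, output time steps=250
```

The coarse stack sees 33 input samples per output step but keeps only a quarter of the time steps, which is the kind of balance between receptive field and time resolution that a scale-aware search has to strike per dataset.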