MONSTER: Monash Scalable Time Series Evaluation Repository

Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb

TL;DR

MONSTER introduces a large-scale, multi-domain benchmark for time series classification to address the skew toward small datasets in existing benchmarks. It aggregates 29 datasets across audio, satellite imagery, EEG, human activity recognition (HAR), and count data, and provides standardized cross-validation splits and data formats, so that methods can be evaluated for scalability and computational efficiency as well as accuracy. The study finds that low-bias models can come out ahead on larger datasets, while methods that lead on small-data benchmarks may falter as dataset size grows. It also finds that strengths vary by category, with some methods excelling on audio and others on satellite or HAR tasks. The benchmark aims to encourage research on learning from large quantities of data and on practical deployability, potentially steering method development toward scalable, domain-aware approaches.

Abstract

We introduce MONSTER, the MONash Scalable Time Series Evaluation Repository, a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequence, they favour a narrow subspace of models that are optimised to achieve low classification error on a wide variety of smaller datasets, that is, models that minimise variance, and give little weight to computational issues such as scalability. Our hope is to diversify the field by introducing benchmarks using larger datasets. We believe that there is enormous potential for new progress in the field by engaging with the theoretical and practical challenges of learning effectively from larger quantities of data.

Paper Structure

This paper contains 52 sections, 38 figures, and 3 tables.

Figures (38)

  • Figure 1: Learning curves for a low variance model vs a low bias model on S2Agri-10pc-17.
  • Figure 2: Class distributions for the audio datasets.
  • Figure 3: Class distributions for the satellite datasets.
  • Figure 4: Map of France showing the location of the Sentinel-2 tile used in the S2Agri dataset.
  • Figure 5: Location of Sentinel-2 tiles and class counts for the TimeSen2Crop dataset.
  • ...and 33 more figures