The Meta-Learning Gap: Combining Hydra and Quant for Large-Scale Time Series Classification

Urav Maniar

TL;DR

This study tackles the scalability challenge in time series classification by testing whether targeted ensembles of two efficient, cross-paradigm methods, Hydra and Quant, can match the benefits of comprehensive ensembles like HIVE-COTE while keeping training times practical on large MONSTER datasets. It systematically analyzes feature- and prediction-level complementarity and evaluates six ensemble configurations, including feature concatenation, asymmetric and symmetric stacking, and CAWPE weighting. The strongest configuration achieves a modest gain (0.72% on average) but captures only about 11% of the theoretical oracle improvement, revealing a substantial meta-learning optimization gap. The findings suggest that future progress hinges on richer meta-features, more powerful meta-learners, and possibly multi-level ensemble strategies that fully exploit cross-paradigm complementarity in large-scale time series tasks.

Abstract

Time series classification faces a fundamental trade-off between accuracy and computational efficiency. While comprehensive ensembles like HIVE-COTE 2.0 achieve state-of-the-art accuracy, their 340-hour training time on the UCR benchmark renders them impractical for large-scale datasets. We investigate whether targeted combinations of two efficient algorithms from complementary paradigms can capture ensemble benefits while maintaining computational feasibility. Combining Hydra (competing convolutional kernels) and Quant (hierarchical interval quantiles) across six ensemble configurations, we evaluate performance on 10 large-scale MONSTER datasets (7,898 to 1,168,774 training instances). Our strongest configuration improves mean accuracy from 0.829 to 0.836, succeeding on 7 of 10 datasets. However, prediction-combination ensembles capture only 11% of theoretical oracle potential, revealing a substantial meta-learning optimization gap. Feature-concatenation approaches exceeded oracle bounds by learning novel decision boundaries, while prediction-level complementarity shows moderate correlation with ensemble gains. The central finding: the challenge has shifted from ensuring algorithms are different to learning how to combine them effectively. Current meta-learning strategies struggle to exploit the complementarity that oracle analysis confirms exists. Improved combination strategies could potentially double or triple ensemble gains across diverse time series classification applications.
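To make the oracle terminology concrete, the following is a minimal sketch of how an oracle bound for two classifiers, and the fraction of it that an ensemble captures, could be computed. It assumes the common definition in which the oracle counts an instance as correct when at least one base classifier is correct. The function names, the toy labels, and the example accuracies are illustrative assumptions rather than the paper's code or data, and the paper's exact aggregation across datasets may differ.

```python
import numpy as np

def oracle_accuracy(y_true, pred_a, pred_b):
    """Accuracy of a hypothetical oracle that picks whichever base prediction is correct."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    return float(np.mean((pred_a == y_true) | (pred_b == y_true)))

def captured_fraction(acc_ensemble, acc_best_base, acc_oracle):
    """Share of the oracle gain (oracle minus best base) realized by the ensemble."""
    oracle_gain = acc_oracle - acc_best_base
    if oracle_gain <= 0:
        return float("nan")  # no headroom above the best base classifier
    return (acc_ensemble - acc_best_base) / oracle_gain

# Toy labels and predictions (illustrative only, not from the paper).
y_true = np.array([0, 1, 1, 0, 2, 2, 1, 0])
pred_a = np.array([0, 1, 0, 0, 2, 1, 1, 1])  # stands in for one base classifier
pred_b = np.array([0, 0, 1, 0, 1, 2, 1, 1])  # stands in for the other

acc_a = float(np.mean(pred_a == y_true))
acc_b = float(np.mean(pred_b == y_true))
acc_oracle = oracle_accuracy(y_true, pred_a, pred_b)

# A hypothetical ensemble accuracy, chosen only to illustrate the ratio.
fraction = captured_fraction(0.65, max(acc_a, acc_b), acc_oracle)
print(acc_a, acc_b, acc_oracle, fraction)
```

In this toy case the two classifiers err on mostly different instances, so the oracle sits well above either base accuracy. An ensemble that only slightly improves on the best base classifier therefore captures a small share of that headroom, which is the kind of gap the abstract describes.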

Paper Structure

This paper contains 53 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Complementarity between Hydra and Quant across 10 MONSTER datasets. (A) Per-dataset accuracy comparison (univariate: blue, multivariate: orange). (B) Oracle ensemble potential (theoretical upper bound for prediction-combination strategies). Green: high potential ($>5\%$), orange: moderate ($2$--$5\%$), red: low ($<2\%$). Feature-concatenation ensembles can exceed these bounds.
  • Figure 2: Ensemble performance analysis across 10 MONSTER datasets. (A) Classifier comparison for feature concatenation strategy: Ridge (red) vs ExtraTrees (green) meta-learners across 9 datasets. ExtraTrees consistently outperforms Ridge, demonstrating the importance of non-linear meta-learning for heterogeneous feature spaces. (B) Top three ensemble performance gains relative to best base algorithm. QFeat-HLogit-ET (primary ensemble), Simplified CAWPE (weighted averaging), and DualOOF-ET (symmetric stacking) show varying effectiveness across datasets, with largest improvements on USCActivity, WISDM, and InsectSound.
  • Figure 3: Predictors of ensemble success for QFeat-HLogit-ET across 10 MONSTER datasets. (A) Oracle gain vs ensemble gain ($r=0.631$, $p=0.050$). (B) Feature complementarity vs ensemble gain ($r=0.328$, $p=0.36$).
  • Figure D: Complete algorithm comparison across 9 MONSTER datasets (Traffic excluded). Colors indicate performance relative to best base algorithm per dataset. Green cells show improvements, red shows degradation, yellow shows comparable performance. No single ensemble dominates across all datasets, validating the need for problem-specific ensemble selection.
  • Figure E: Computational cost analysis. (A) Training time distribution per algorithm showing median (orange line) and quartiles. Ensembles with OOF prediction generation (QFeat-HLogit-ET, CAWPE, DualOOF-ET) incur 5-fold cross-validation overhead. (B) Accuracy vs normalized computational cost (time per 1000 training samples). Normalization accounts for dataset size differences. Quant and Hydra-Multi offer the best efficiency, while ensembles trade computational cost for marginal accuracy gains.
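
The captions above refer to Simplified CAWPE as weighted averaging. As a rough sketch of that idea, the snippet below follows the general CAWPE scheme, in which each classifier's class probabilities are weighted by its cross-validated accuracy raised to a power (alpha = 4 in the original CAWPE formulation). The paper's simplified variant may differ in its details; the function name, the toy probabilities, and the accuracies are assumptions made for illustration.

```python
import numpy as np

def cawpe_style_predict(probas, cv_accuracies, alpha=4.0):
    """Combine per-classifier class probabilities with accuracy**alpha weights.

    probas: list of arrays, each of shape (n_samples, n_classes)
    cv_accuracies: one cross-validated accuracy estimate per classifier
    """
    weights = np.asarray(cv_accuracies, dtype=float) ** alpha
    stacked = np.stack([np.asarray(p, dtype=float) for p in probas])
    # Weighted sum over the classifier axis -> shape (n_samples, n_classes).
    combined = np.tensordot(weights, stacked, axes=1)
    combined /= combined.sum(axis=1, keepdims=True)  # renormalize each row
    return combined.argmax(axis=1), combined

# Toy probabilities for two classifiers on two samples (illustrative only).
proba_first = np.array([[0.60, 0.40], [0.45, 0.55]])
proba_second = np.array([[0.30, 0.70], [0.20, 0.80]])

labels, combined = cawpe_style_predict(
    [proba_first, proba_second], cv_accuracies=[0.9, 0.6]
)
print(labels, combined)
```

Raising the accuracies to a power amplifies the difference between classifiers: here the first classifier receives roughly five times the weight of the second, so it decides the first sample even though a plain average of the two probability rows would have favored the other class.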