An Active Learning Framework with a Class Balancing Strategy for Time Series Classification

Shemonto Das

TL;DR

This work addresses the high labeling costs and class-imbalance challenges in time series classification by introducing an Active Learning (AL) framework augmented with a class-balancing instance selection algorithm. It systematically evaluates uncertainty sampling (UNC), query-by-committee (QBC), and expected model change across tactile texture recognition and industrial fault detection, using sliding-window temporal features and the $F_1$-score as the evaluation metric. In tactile texture recognition, a 6-second window with 50 percent overlap and an Extra Trees classifier trained under UNC achieved an $F_1$-score of about $90.25$ percent with only about 70 percent of the data labeled. In synthetic fiber manufacturing, AL with class balancing reduced the labeling requirement to roughly 21 percent, and XGBoost with QBC reached an $F_1$-score of about $69.24$ percent at the full labeling budget. Overall, the framework demonstrates meaningful reductions in annotation cost while maintaining or improving performance, and its modular design suggests applicability to other time-series domains with imbalanced data.
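The TL;DR names the query strategies but not their scoring rules. As a rough illustration only, the sketch below assumes two common textbook forms: least-confidence scoring for uncertainty sampling (UNC) and vote entropy for query-by-committee (QBC). The thesis may use other variants (for example margin or entropy sampling), expected model change is not shown, and the function names and toy numbers are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """UNC score (assumed variant): 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def vote_entropy(votes: Sequence[int]) -> float:
    """QBC score (assumed variant): entropy of the committee members' hard-label votes."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def rank_pool(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring (most informative) pool instances."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy pool of three unlabeled windows (made-up numbers, not thesis data).
probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
votes = [[0, 0, 0], [0, 1, 2], [1, 1, 0]]  # labels from a hypothetical 3-member committee

print(rank_pool([least_confidence(p) for p in probs], 2))  # [1, 2]
print(rank_pool([vote_entropy(v) for v in votes], 2))      # [1, 2]
```

Both strategies reduce to "score every pool instance, then query the top k"; only the scoring function changes, which is what makes the strategies interchangeable inside one framework.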

Abstract

Training machine learning models for classification tasks often requires labeling numerous samples, which is costly and time-consuming, especially in time series analysis. This research investigates Active Learning (AL) strategies to reduce the amount of labeled data needed for effective time series classification. Traditional AL techniques cannot control how many instances are selected per class for labeling, which can bias both instance selection and classification performance, particularly in imbalanced time series datasets. To address this, we propose a novel class-balancing instance selection algorithm integrated with standard AL strategies. Our approach selects more instances from classes with fewer labeled examples, thereby addressing imbalance in time series datasets. We demonstrate the effectiveness of our AL framework in selecting informative data samples in two distinct domains: tactile texture recognition and industrial fault detection. In robotics, our method achieves high texture-classification performance while reducing the labeled training data requirement to 70% of the dataset. We also evaluate the impact of different sliding-window lengths on robotic texture classification using AL strategies. In synthetic fiber manufacturing, we adapt AL techniques to the challenge of fault classification, aiming to minimize data annotation cost and time for industry. We also address real-life class imbalance in the multiclass industrial anomaly dataset using our class-balancing instance selection algorithm integrated with AL strategies. Overall, this thesis highlights the potential of our AL framework across these two distinct domains.
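The abstract states the goal of the balancing step (query more instances from classes with fewer labeled examples) but not the procedure itself. One plausible realization, which is an assumption and not the thesis's own algorithm, is sketched below: each class receives a share of the query batch proportional to 1 / (labeled count + 1), and within each class the highest-scoring instances are taken. Because the true labels of pool instances are unknown, the sketch uses the current model's predicted labels as a proxy. `balanced_select`, the inverse-count weighting, and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_select(
    scores: Sequence[float],         # informativeness from any AL strategy (UNC, QBC, ...)
    predicted: Sequence[int],        # model-predicted class of each pool instance
    labeled_counts: Dict[int, int],  # labeled examples per class so far
    k: int,                          # batch size to send for annotation
) -> List[int]:
    """Hypothetical class-balancing selection: favour classes with few labels."""
    classes = sorted(set(predicted) | set(labeled_counts))
    # Weight each class by 1 / (labeled count + 1): rarer classes get more slots.
    weights = {c: 1.0 / (labeled_counts.get(c, 0) + 1) for c in classes}
    total = sum(weights.values())
    shares = {c: k * weights[c] / total for c in classes}

    # Largest-remainder rounding so the per-class quotas sum to exactly k.
    quota = {c: int(shares[c]) for c in classes}
    leftover = k - sum(quota.values())
    for c in sorted(classes, key=lambda c: shares[c] - quota[c], reverse=True)[:leftover]:
        quota[c] += 1

    # Rank the pool once (most informative first) and bucket it by predicted class.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i in ranked:
        by_class[predicted[i]].append(i)

    chosen = [i for c in classes for i in by_class[c][: quota[c]]]

    # If a class had fewer candidates than its quota, top up from the global ranking.
    picked = set(chosen)
    for i in ranked:
        if len(chosen) >= k:
            break
        if i not in picked:
            chosen.append(i)
            picked.add(i)
    return chosen


# Toy pool (made-up numbers): class 0 dominates the labeled set and the top scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
predicted = [0, 0, 0, 1, 1, 2]
labeled_counts = {0: 9, 1: 1, 2: 0}
print(balanced_select(scores, predicted, labeled_counts, k=3))  # [3, 5, 0]
```

Plain top-3 selection on the same scores would return `[0, 1, 2]`, all from the already well-labeled class 0; the balanced variant spends two of the three queries on the minority classes 1 and 2.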

Paper Structure (38 sections, 5 equations, 16 figures, 7 tables, 2 algorithms)

Figures (16)

  • Figure 1: The pipeline for tactile data used for texture classification. 1) Data collection from exploratory movements; 2) the time series data is partitioned into temporal windows; 3) statistical attribute extraction; 4) AL strategies rank the instances; 5) the AL strategy selects the top-ranked instances; 6) a machine-learning model is built with the instances in the labeled pool; 7) all instances in the processed tactile data pool are classified.
  • Figure 2: Multi-modal bio-inspired sensor and MARG frames of reference [lima2021classification].
  • Figure 3: XY-recorder setup with a texture under exploration [lima2021classification].
  • Figure 4: The set of textures explored in the dataset [lima2021classification].
  • Figure 5: Window-based statistical feature extraction for 6-second windows with 3-second overlap.
  • ...and 11 more figures
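Figure 5 and the TL;DR describe 6-second windows with 50 percent (3-second) overlap, from which statistical attributes are extracted. A minimal single-channel sketch follows, assuming a hypothetical sampling rate `fs` and a hypothetical feature set (mean, population standard deviation, minimum, maximum); the sensor rates and the exact statistics used in the thesis are not given in this section.

```python
import statistics
from typing import List, Sequence


def sliding_windows(
    signal: Sequence[float], fs: float, win_s: float = 6.0, overlap_s: float = 3.0
) -> List[Sequence[float]]:
    """Cut a 1-D signal into win_s-second windows that overlap by overlap_s seconds."""
    size = int(round(win_s * fs))                # samples per window
    step = int(round((win_s - overlap_s) * fs))  # hop between window starts
    if size <= 0 or step <= 0:
        raise ValueError("window must be non-empty and longer than its overlap")
    # Only full windows are kept; a trailing partial window is dropped.
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]


def window_features(window: Sequence[float]) -> List[float]:
    """Hypothetical statistics per window: mean, population std, min, max."""
    return [statistics.fmean(window), statistics.pstdev(window), min(window), max(window)]


# Toy channel: 12 s at an assumed 2 Hz, i.e. 24 samples.
signal = [float(x) for x in range(24)]
windows = sliding_windows(signal, fs=2.0)
print(len(windows))                 # 3 windows, starting at 0 s, 3 s and 6 s
print(window_features(windows[0]))  # mean 5.5, min 0.0, max 11.0
```

For a multi-modal sensor, one would apply the same windowing to every channel and concatenate the per-channel feature vectors, so that each window becomes one instance in the unlabeled pool that the AL strategy ranks.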