Active Learning of Non-semantic Speech Tasks with Pretrained Models

Harlin Lee, Aaqib Saeed, Andrea L. Bertozzi

TL;DR

This work tackles data- and label-efficiency for non-semantic speech classification by combining pretrained self-supervised representations with active learning (AL). It introduces ALOE, a system that keeps the pretrained encoder frozen and trains only a lightweight linear probe on its outputs, using smallest-margin uncertainty sampling to decide which examples to label in a pool-based AL loop. Across five datasets and multiple architectures, ALOE comes close to the performance of baselines trained on the fully labeled data (effectively an upper bound) while using substantially fewer labels, with practical implications for deployment. The approach is simple and scalable, and points to extensions such as graph-based AL and larger-scale evaluation on datasets like AudioSet.
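
The pool-based loop described above (frozen encoder, linear probe, smallest-margin selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder's outputs are assumed to be precomputed as a feature matrix (synthetic Gaussian clusters stand in for real speech embeddings), the probe is a softmax classifier trained by plain gradient descent, and the seed-set size, batch size, round count, and learning rate are hypothetical values chosen for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_linear_probe(X, y, n_classes, lr=0.1, epochs=500):
    # Multinomial logistic regression via full-batch gradient descent.
    # Only W and b are learned; the (assumed) encoder that produced X stays frozen.
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b


def smallest_margin(P):
    # Gap between the two largest class probabilities; smaller = more uncertain.
    top2 = np.sort(P, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def active_learning(X_pool, y_oracle, n_classes, n_init=20, batch=10, rounds=5, seed=0):
    # Hypothetical parameters: random seed set of n_init labels, then `rounds`
    # queries of `batch` labels each. y_oracle simulates the human annotator.
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    for _ in range(rounds):
        W, b = train_linear_probe(X_pool[labeled], y_oracle[labeled], n_classes)
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        margins = smallest_margin(softmax(X_pool[unlabeled] @ W + b))
        # Ask the oracle to label the `batch` pool points with the smallest margin.
        picked = unlabeled[np.argsort(margins)[:batch]]
        labeled.extend(picked.tolist())
    W, b = train_linear_probe(X_pool[labeled], y_oracle[labeled], n_classes)
    return W, b, labeled


# Usage on synthetic stand-in "embeddings": three Gaussian clusters in 2-D.
data_rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y = data_rng.integers(0, 3, size=300)
X = centers[y] + data_rng.normal(size=(300, 2))
W, b, labeled = active_learning(X, y, n_classes=3)
acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(f"labels used: {len(labeled)} / {len(X)}, pool accuracy: {acc:.3f}")
```

Because only the probe's weights are retrained each round, every round costs a small convex optimization over fixed features rather than a pass through the encoder, which is what makes this kind of loop cheap to run repeatedly.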

Abstract

Pretraining neural networks on massive unlabeled datasets has become popular because it equips deep models with a better prior for solving downstream tasks. However, this approach generally assumes that the downstream tasks have access to a sufficiently large annotated dataset. In this work, we propose ALOE, a novel system that improves the data- and label-efficiency of non-semantic speech tasks with active learning. ALOE uses pretrained models in conjunction with active learning to label data incrementally and learn classifiers for downstream tasks, thereby reducing the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE across a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE achieves performance similar to several baselines that use the entire labeled dataset.
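
Uncertainty-based acquisition functions score each example in the unlabeled pool by how unsure the current classifier is about it. For reference, the standard textbook forms of three common scores are given below, where $\mathcal{U}$ is the unlabeled pool, $C$ the number of classes, $p(c \mid x)$ the linear probe's predicted probability for class $c$, and $\hat{y}_1, \hat{y}_2$ the most and second-most likely classes. These are general definitions, not necessarily the exact formulation or notation used in the paper; smallest margin is the variant named in the TL;DR.

```latex
\begin{aligned}
\text{least confidence:}\quad & x^{*} = \arg\max_{x \in \mathcal{U}} \bigl[\,1 - p(\hat{y}_1 \mid x)\,\bigr] \\
\text{smallest margin:}\quad & x^{*} = \arg\min_{x \in \mathcal{U}} \bigl[\,p(\hat{y}_1 \mid x) - p(\hat{y}_2 \mid x)\,\bigr] \\
\text{entropy:}\quad & x^{*} = \arg\max_{x \in \mathcal{U}} \Bigl[-\sum_{c=1}^{C} p(c \mid x)\,\log p(c \mid x)\Bigr]
\end{aligned}
```

In a batch setting, the single $\arg\max$ or $\arg\min$ is replaced by taking the top-$k$ scoring pool examples each round.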

Paper Structure

This paper contains 5 sections, 2 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Illustration of ALOE: active learning with a pretrained model. The encoder parameters are frozen, allowing the use of the same encoder across multiple downstream tasks.
  • Figure 2: Uncertainty sampling outperforms random sampling at every AL round, as measured by validation set accuracy (%).
  • Figure 3: Different uncertainty sampling methods paired with a pretrained model perform similarly well on the validation set.
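
One intuition for the results in Figures 2 and 3 is that common uncertainty scores tend to flag the same ambiguous examples, while random sampling ignores the model altogether. The sketch below computes least confidence, smallest margin, and entropy over the same matrix of predicted probabilities, plus a random baseline. The probability matrix, the function names, and the batch size `k` are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
import numpy as np


def least_confidence(P):
    # Higher = more uncertain: 1 minus the top-class probability.
    return 1.0 - P.max(axis=1)


def margin_uncertainty(P):
    # Negated top-2 margin so that, like the other scores, higher = more uncertain.
    top2 = np.sort(P, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])


def entropy(P, eps=1e-12):
    # Shannon entropy (natural log) of each row's predictive distribution.
    return -(P * np.log(P + eps)).sum(axis=1)


def select(scores, k):
    # Indices of the k highest-scoring (most uncertain) pool examples.
    return set(np.argsort(-scores)[:k].tolist())


def random_select(n, k, seed=0):
    # Random-sampling baseline: picks k pool indices without looking at the model.
    rng = np.random.default_rng(seed)
    return set(rng.choice(n, size=k, replace=False).tolist())


# Hypothetical probe outputs for a 6-example pool and 3 classes.
P = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.90, 0.05, 0.05],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform
    [0.50, 0.45, 0.05],  # close top-2
    [0.80, 0.10, 0.10],  # fairly confident
])
k = 3
picks = {
    "least_confidence": select(least_confidence(P), k),
    "margin": select(margin_uncertainty(P), k),
    "entropy": select(entropy(P), k),
    "random": random_select(len(P), k),
}
for name, idx in picks.items():
    print(name, sorted(idx))
```

On this toy matrix all three uncertainty scores select examples {1, 3, 4}, whereas the random baseline's picks depend only on the seed. The scores can disagree on other inputs when there are three or more classes; for binary classification they always produce the same ranking, since each is a monotone function of the top-class probability.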