DDFAD: Dataset Distillation Framework for Audio Data

Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu

TL;DR

DDFAD is presented as the first dataset distillation framework for audio data, aimed at the storage and compute cost of large-scale audio datasets. It combines the Fused Differential MFCC (FD-MFCC) feature representation with matching training trajectory (MTT) distillation and Griffin-Lim-based audio reconstruction to produce a compact distilled dataset that preserves classification performance across multiple architectures. The paper reports cross-architecture generalization, ablations in which FD-MFCC outperforms traditional features, and benefits for downstream tasks such as continual learning and neural architecture search. The approach reduces data and compute requirements while maintaining competitive accuracy.

Abstract

Deep neural networks (DNNs) have achieved significant success in numerous applications. Their remarkable performance is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem: it compresses a large dataset into a much smaller distilled dataset, such that a model trained on the distilled dataset achieves performance comparable to a model trained on the whole dataset. While dataset distillation has been demonstrated on image data, no prior work has explored it for audio data. In this work, for the first time, we propose a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as the extracted features for audio data. The FD-MFCC is then distilled through the matching training trajectory distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to reconstruct the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD is promising for applications such as continual learning and neural architecture search.
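
To make the trajectory-matching step more concrete, the sketch below computes the normalized parameter-distance loss commonly used in the trajectory-matching literature: the squared distance between the student's final parameters (after training on the distilled data) and a later expert checkpoint, divided by the squared distance the expert itself travelled. This is only an illustration under stated assumptions; the excerpt does not give the exact loss DDFAD uses, and the function name, argument names, and flat-list parameter representation are hypothetical.

```python
from typing import Sequence


def trajectory_matching_loss(
    student_end: Sequence[float],
    expert_start: Sequence[float],
    expert_target: Sequence[float],
) -> float:
    """Normalized squared distance between student and expert parameters.

    Assumption: parameters are flattened into equal-length sequences.
    student_end   -- student parameters after training on distilled data
    expert_start  -- expert checkpoint the student was initialized from
    expert_target -- later expert checkpoint the student should reach
    """
    if not (len(student_end) == len(expert_start) == len(expert_target)):
        raise ValueError("parameter vectors must have equal length")

    # Numerator: how far the student ended up from the expert target.
    numerator = sum((s - t) ** 2 for s, t in zip(student_end, expert_target))
    # Denominator: how far the expert moved between the two checkpoints.
    denominator = sum((a - t) ** 2 for a, t in zip(expert_start, expert_target))
    if denominator == 0.0:
        raise ValueError("expert start and target parameters are identical")
    return numerator / denominator


# Toy two-parameter example (values are made up for illustration).
expert_start = [0.0, 0.0]
expert_target = [1.0, 2.0]

perfect = trajectory_matching_loss([1.0, 2.0], expert_start, expert_target)
no_progress = trajectory_matching_loss([0.0, 0.0], expert_start, expert_target)
halfway = trajectory_matching_loss([0.5, 1.0], expert_start, expert_target)
```

A loss of 0 means the student reproduced the expert's checkpoint exactly, while a loss of 1 means it made no progress from the starting checkpoint. In a distillation loop, this scalar would be minimized with respect to the distilled features (here, the FD-MFCC tensors) rather than the model weights.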

Paper Structure

This paper contains 25 sections, 4 equations, 8 figures, 4 tables, 2 algorithms.

Figures (8)

  • Figure 1: Dataset distillation for image data.
  • Figure 2: Dataset distillation for audio data.
  • Figure 3: The workflow of the proposed dataset distillation framework for audio data.
  • Figure 4: The feature extraction process of FD-MFCC.
  • Figure 5: Ablation Study of FD-MFCC.
  • ...and 3 more figures