Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning

Bing Han; Wen Huang; Zhengyang Chen; Anbai Jiang; Pingyi Fan; Cheng Lu; Zhiqiang Lv; Jia Liu; Wei-Qiang Zhang; Yanmin Qian

TL;DR

The paper tackles data-efficient, low-complexity acoustic scene classification (ASC). It introduces Rep-Mobile, a multi-branch CNN architecture whose branches can be reparameterized at inference, and applies ensemble knowledge distillation with predominantly transformer-based teacher models. It also uses progressive pruning, which compresses the model in several small stages and preserves accuracy better than pruning in a single step. Experiments on the TAU dataset show state-of-the-art results, and the system placed first in DCASE2024 Challenge Task 1, which highlights its practicality on resource-limited devices. Together, the architecture, the distillation strategy, and the pruning schedule offer a practical recipe for accurate ASC under real-world constraints.
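The ensemble distillation step can be illustrated with a minimal NumPy sketch. It assumes Hinton-style distillation in which the teacher ensemble is formed by averaging the teachers' logits. The function names, the temperature, and the loss weight `alpha` are hypothetical and are not taken from the paper, which may combine its teachers and weight the losses differently.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_kd_loss(student_logits, teacher_logits_list, labels,
                     temperature=2.0, alpha=0.5, eps=1e-12):
    """Hard-label cross-entropy plus KL divergence to an averaged teacher ensemble.

    student_logits:      (batch, classes) array
    teacher_logits_list: list of (batch, classes) arrays, one per teacher
    labels:              (batch,) integer class indices
    temperature, alpha:  hypothetical defaults, not values from the paper
    """
    # Ensemble the teachers by averaging their logits (one common choice).
    teacher_logits = np.mean(np.stack(teacher_logits_list), axis=0)

    # Hard-label cross-entropy for the student at temperature 1.
    p_student = softmax(student_logits)
    rows = np.arange(len(labels))
    ce = -np.mean(np.log(p_student[rows, labels] + eps))

    # KL(teacher || student) between temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1))

    # Scaling by T^2 keeps the soft-target term's gradient scale roughly
    # independent of the temperature, as in standard Hinton-style distillation.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


rng = np.random.default_rng(0)
student = rng.normal(size=(4, 10))                       # 4 clips, 10 classes (toy sizes)
teachers = [rng.normal(size=(4, 10)) for _ in range(3)]  # 3 hypothetical teachers
labels = np.array([0, 3, 5, 9])
print(ensemble_kd_loss(student, teachers, labels))
```

If the student's logits match the averaged teacher logits exactly, the KL term vanishes and only the weighted cross-entropy remains. The ensemble therefore acts as a single soft target.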

Abstract

The goal of the acoustic scene classification (ASC) task is to classify recordings into one of a set of predefined acoustic scene classes. In real-world scenarios, however, ASC systems often face challenges such as recording-device mismatch, low-complexity constraints, and limited labeled data. To alleviate these issues, this paper builds a data-efficient, low-complexity ASC system with a new model architecture and improved training strategies. Specifically, we first design a new low-complexity architecture named Rep-Mobile, which integrates multiple convolution branches that can be reparameterized at inference. Compared with other models, it achieves better performance at lower computational complexity. We then apply knowledge distillation and compare the data efficiency of teacher models with different architectures. Finally, we propose a progressive pruning strategy that prunes the model multiple times in small amounts, which yields better performance than single-step pruning. Experiments are conducted on the TAU dataset. With Rep-Mobile and these training strategies, our proposed ASC system achieves state-of-the-art (SOTA) results to date, and it also won first place in the DCASE2024 Challenge by a significant margin.
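The inference-time reparameterization of multiple convolution branches can be sketched as follows. The sketch assumes a RepVGG-style block with a 3x3 conv branch, a 1x1 conv branch, and an identity branch, each followed by batch normalization. The abstract does not specify Rep-Mobile's exact branch set, so this layout and all function names are illustrative assumptions. A single channel is used so the arithmetic stays visible, and the identity branch is only valid when input and output channels match. Each BN is first folded into its conv, which scales the kernel by gamma / sqrt(var + eps) and yields the bias beta - gamma * mean / sqrt(var + eps). Because convolution is linear, the branch kernels can then be summed once the 1x1 kernels are zero-padded to 3x3.

```python
import numpy as np


def conv2d_same(x, k, bias=0.0):
    """Single-channel 'same' cross-correlation (stride 1) with an odd square kernel."""
    r = k.shape[0] // 2
    xp = np.pad(x, r)                       # zero padding on every side
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out + bias


def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm with fixed running statistics."""
    return gamma * (y - mean) / np.sqrt(var + eps) + beta


def fold_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding conv and return (scaled kernel, bias)."""
    std = np.sqrt(var + eps)
    return k * (gamma / std), beta - gamma * mean / std


def train_time_block(x, branches):
    """Multi-branch forward pass: sum of BN(conv(x)) over all branches."""
    return sum(batchnorm_eval(conv2d_same(x, k), *bn) for k, bn in branches)


def reparameterize(branches, size=3):
    """Merge all branches into one size x size kernel plus a scalar bias."""
    fused_k = np.zeros((size, size))
    fused_b = 0.0
    for k, bn in branches:
        kf, bf = fold_bn(k, *bn)
        pad = (size - k.shape[0]) // 2
        fused_k += np.pad(kf, pad)          # zero-pad smaller kernels, centered in the grid
        fused_b += bf
    return fused_k, fused_b


rng = np.random.default_rng(0)


def random_bn():
    # (gamma, beta, running_mean, running_var): random but valid values
    return (rng.uniform(0.5, 1.5), rng.normal(), rng.normal(), rng.uniform(0.5, 2.0))


branches = [
    (rng.normal(size=(3, 3)), random_bn()),  # 3x3 conv + BN
    (rng.normal(size=(1, 1)), random_bn()),  # 1x1 conv + BN
    (np.ones((1, 1)), random_bn()),          # identity branch as a 1x1 kernel of ones, + BN
]
x = rng.normal(size=(6, 6))                  # toy single-channel feature map
k, b = reparameterize(branches)
print(np.allclose(train_time_block(x, branches), conv2d_same(x, k, b)))
```

The training-time block and the fused single conv produce the same output up to floating-point error. The extra branches therefore add capacity during training without adding inference cost.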
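Progressive pruning, as described in the abstract, reaches the target sparsity in several small steps instead of one. The sketch below assumes unstructured magnitude pruning with a linear sparsity schedule and a user-supplied `fine_tune` callback that runs between steps. The criterion, the schedule, and all names are hypothetical, because the abstract does not state how the authors select the weights or channels to prune.

```python
import numpy as np


def sparsity_schedule(target, steps):
    """Linearly increasing sparsity levels that end at `target`."""
    return [target * (s + 1) / steps for s in range(steps)]


def magnitude_mask(weights, sparsity):
    """Boolean mask that zeroes the smallest-|w| fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    mask[np.argsort(flat, kind="stable")[:n_prune]] = False
    return mask.reshape(weights.shape)


def progressive_prune(weights, target, steps, fine_tune):
    """Reach `target` sparsity in `steps` small pruning rounds.

    fine_tune(weights, mask) -> weights is a hypothetical recovery-training hook.
    """
    mask = np.ones(weights.shape, dtype=bool)
    for sparsity in sparsity_schedule(target, steps):
        # Pruned weights are exactly zero, so they rank first and remain pruned
        # (assuming no surviving weight is itself exactly zero).
        mask = magnitude_mask(weights * mask, sparsity)
        weights = weights * mask
        # Briefly retrain the survivors, then re-apply the mask.
        weights = fine_tune(weights, mask) * mask
    return weights, mask


rng = np.random.default_rng(0)
w0 = rng.normal(size=(10, 10))  # toy weight matrix


def noisy_fine_tune(w, mask):
    # Stand-in for real training: small random updates to the surviving weights.
    return w + 0.01 * rng.normal(size=w.shape) * mask


w, m = progressive_prune(w0, target=0.6, steps=3, fine_tune=noisy_fine_tune)
print(sparsity_schedule(0.6, 3), 1.0 - m.mean())
```

With an identity `fine_tune`, this procedure reduces to one-shot magnitude pruning. Any benefit of the progressive schedule therefore comes from the recovery training between the small pruning steps.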
