Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles
Miika Toikkanen, June-Woo Kim
TL;DR
This work tackles data scarcity in respiratory sound classification (RSC) and the high compute cost of ensemble methods by introducing architecture-agnostic soft-label knowledge distillation. A BTS-based teacher ensemble generates probabilistic targets, which are then used to train lightweight student models, achieving state-of-the-art performance on the ICBHI dataset. The study demonstrates that even a single teacher can substantially improve a student, with additional gains from multiple teachers and a second-generation distilled ensemble. The approach proves effective across architectures, reduces inference cost compared to full ensembles, and is accompanied by a release of reproducible code for further research.
Abstract
Respiratory sound datasets are limited in size and quality, making high performance difficult to achieve. Ensemble models help but inevitably increase compute cost at inference time. Training on soft labels distills knowledge efficiently, adding cost only at training time. In this study, we explore soft labels for respiratory sound classification as an architecture-agnostic approach to distilling an ensemble of teacher models into a student model. We examine different variations of our approach and find that even a single teacher, identical to the student, considerably improves performance beyond its own capability, with optimal gains achieved using only a few teachers. We achieve a new state-of-the-art Score of 64.39 on ICBHI, surpassing the previous best by 0.85 and improving average Scores across architectures by more than 1.16. Our results highlight the effectiveness of knowledge distillation with soft labels for respiratory sound classification, regardless of size or architecture.
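
The soft-label idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the teachers' class probabilities are combined by simple averaging, and the function names (`ensemble_soft_labels`, `soft_cross_entropy`) and the example logits are hypothetical. Because only the teachers' output probabilities are used, nothing in it depends on the teacher or student architecture.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits):
    """Average per-teacher class probabilities into one soft label.

    Assumption: plain averaging; the paper may combine teachers differently.
    """
    probs = [softmax(logits) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a soft target and the student's softmax output."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return -sum(q * (z - log_z) for q, z in zip(soft_label, student_logits))

# Hypothetical logits from two teachers for one four-class sample.
teachers = [[2.0, 0.5, 0.1, -1.0], [1.5, 1.0, 0.0, -0.5]]
target = ensemble_soft_labels(teachers)

# Loss for a hypothetical, untrained student that predicts all classes equally.
loss = soft_cross_entropy([0.0, 0.0, 0.0, 0.0], target)
```

In this sketch the teachers run only once, to produce `target`; at inference time only the student is evaluated, which is where the cost saving over a full ensemble comes from.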
