A dataset and model for recognition of audiologically relevant environments for hearing aids: AHEAD-DS and YAMNet+

Henry Zhong; Jörg M. Buchholz; Julian Maclaren; Simon Carlile; Richard Lyon

TL;DR

This work tackles two obstacles to scene recognition of audiologically relevant environments in hearing devices: the lack of public, standardized benchmarks and the difficulty of deploying models on edge hardware. It introduces two contributions. The first is AHEAD-DS, a ready-to-use dataset with 14 audiologically relevant labels derived from HEAR-DS and CHiME-6 Dev. The second is YAMNet+, a lightweight, edge-friendly sound recognition model trained with transfer learning from YAMNet, which was pretrained on AudioSet. On the AHEAD-DS test set, YAMNet+ achieves a mean average precision of 0.83 and an accuracy of 0.93. Real-time inference was demonstrated on a Google Pixel 3, taking approximately 50 ms to load the model plus roughly 30 ms per second of audio processed. Together, these provide a publicly accessible benchmark and an open-source, deployable baseline to accelerate research on, and deployment of, scene recognition for hearing devices.
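
The two reported metrics measure different things, and a small sketch helps make that concrete. The code below shows one common way to compute non-interpolated mean average precision and top-1 accuracy for single-label scene classification. It assumes that each clip has one true label and a score per class. It also assumes there are no tied scores. The exact evaluation protocol used for AHEAD-DS (for example, clip-level versus frame-level scoring, or how classes are averaged) is not stated here, so treat this as an illustrative sketch rather than the authors' evaluation code. The toy data is invented and is not from AHEAD-DS.

```python
from typing import List, Optional, Sequence


def average_precision(scores: Sequence[float], is_positive: Sequence[bool]) -> Optional[float]:
    """Non-interpolated average precision for one class.

    Ranks clips by score (highest first) and averages the precision measured
    at the rank of each positive clip. Assumes no tied scores; returns None
    when the class has no positive clips.
    """
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else None


def mean_average_precision(
    score_rows: Sequence[Sequence[float]], labels: Sequence[int], num_classes: int
) -> float:
    """Average the per-class AP over classes that have at least one positive clip."""
    per_class: List[float] = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_rows]
        class_positive = [label == c for label in labels]
        ap = average_precision(class_scores, class_positive)
        if ap is not None:
            per_class.append(ap)
    return sum(per_class) / len(per_class)


def accuracy(score_rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of clips whose highest-scoring class matches the true label."""
    correct = sum(
        1
        for row, label in zip(score_rows, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)


if __name__ == "__main__":
    # Toy example: 3 clips, 2 classes (hypothetical data, not AHEAD-DS).
    scores = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
    labels = [0, 1, 1]
    print(mean_average_precision(scores, labels, num_classes=2))  # 1.0
    print(accuracy(scores, labels))  # 0.666...
```

In this toy case, mAP is perfect while accuracy is not. This happens because mAP ranks each class's scores independently across clips, whereas accuracy depends only on the per-clip argmax. A model can therefore score differently on the two metrics, which is why reporting both is informative.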

Abstract

Scene recognition of audiologically relevant environments is important for hearing aids. However, it is challenging, in part because of the limitations of existing datasets. Datasets often lack public accessibility, completeness, or audiologically relevant labels, which hinders systematic comparison of machine learning models. Deploying these models on resource-constrained edge devices presents another challenge. Our solution is two-fold. First, we leverage several open-source datasets to create AHEAD-DS, a dataset designed for scene recognition of audiologically relevant environments. Second, we introduce YAMNet+, a sound recognition model. AHEAD-DS aims to provide a standardised, publicly available dataset with consistent labels relevant to hearing aids, facilitating model comparison. YAMNet+ is designed for deployment on edge devices, such as smartphones connected to hearing devices (for example, hearing aids and wireless earphones with hearing aid functionality). It serves as a baseline model for sound-based scene recognition. YAMNet+ achieved a mean average precision of 0.83 and an accuracy of 0.93 on the AHEAD-DS test set across fourteen categories of audiologically relevant environments. We found that applying transfer learning from the pretrained YAMNet model was essential. We demonstrated real-time sound-based scene recognition on edge devices by deploying YAMNet+ to an Android smartphone. Even on a Google Pixel 3 (a phone with modest specifications, released in 2018), loading the model takes approximately 50 ms. Processing time then increases approximately linearly, by about 30 ms per second of audio. Our website and code are available at https://github.com/Australian-Future-Hearing-Initiative.
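
The reported Pixel 3 timings amount to an approximately linear latency model: a fixed load cost of about 50 ms plus about 30 ms per second of audio. The back-of-envelope sketch below encodes that relationship. The constants are the approximate figures quoted above, not exact measurements. The `include_load` flag and the real-time-factor helper are hypothetical conveniences added for illustration. Treating model loading as a one-time cost that can be excluded is our assumption.

```python
LOAD_MS = 50.0              # approximate model-loading latency reported on a Pixel 3
MS_PER_AUDIO_SECOND = 30.0  # approximate slope: extra processing time per second of audio


def estimated_latency_ms(audio_seconds: float, include_load: bool = True) -> float:
    """Linear estimate: optional fixed load cost plus a per-second processing cost."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    load = LOAD_MS if include_load else 0.0
    return load + MS_PER_AUDIO_SECOND * audio_seconds


def real_time_factor() -> float:
    """Processing time divided by audio duration, ignoring the (assumed one-time) load cost."""
    return MS_PER_AUDIO_SECOND / 1000.0


if __name__ == "__main__":
    # Prints 80.0, 200.0 and 350.0 ms for 1, 5 and 10 seconds of audio.
    for seconds in (1, 5, 10):
        print(seconds, estimated_latency_ms(seconds))
    print(real_time_factor())  # 0.03, i.e. roughly 33x faster than real time
```

Under these approximate figures, processing consumes only about 3% of the audio's duration. This is consistent with the claim of real-time operation on modest hardware.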
