PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
Amir Nassereldine, Dancheng Liu, Chenhui Xu, Ruiyang Qin, Yiyu Shi, Jinjun Xiong
TL;DR
PI-Whisper is an adaptive edge ASR framework that incrementally adapts to speaker characteristics by learning multiple LoRA profiles and merging them at inference. It uses a lightweight speaker-characteristic classifier to identify a speaker's attributes and loads the corresponding LoRA profiles from dedicated libraries, enabling non-intrusive personalization without full model retraining. The approach achieves state-of-the-art WER on the evaluated datasets, with up to 13.7% relative WER reduction at modest overhead, and demonstrates zero-shot transfer and fairness improvements across diverse speaker groups. The work shows practical potential for personalized, privacy-preserving ASR on edge devices.
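The select-then-merge flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `LoRAProfile`, `select_profiles`, and `merge_profiles` are hypothetical, the classifier output is represented as a plain dictionary, and uniform averaging of the low-rank updates is an assumed merging rule, since this summary does not specify how the profiles are combined.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoRAProfile:
    """Hypothetical container for one low-rank update: delta W = B @ A."""
    B: Matrix  # shape (d_out, r)
    A: Matrix  # shape (r, d_in)

    def delta(self) -> Matrix:
        return matmul(self.B, self.A)


def select_profiles(
    libraries: Dict[str, Dict[str, LoRAProfile]],
    predicted: Dict[str, str],
) -> List[LoRAProfile]:
    """Look up one profile per predicted characteristic; skip unknown labels."""
    selected = []
    for characteristic, label in predicted.items():
        profile = libraries.get(characteristic, {}).get(label)
        if profile is not None:
            selected.append(profile)
    return selected


def merge_profiles(base: Matrix, profiles: List[LoRAProfile]) -> Matrix:
    """Add the average of the selected low-rank updates to the base weights.

    Uniform averaging is an assumption made for this sketch.
    """
    merged = [row[:] for row in base]
    if not profiles:
        return merged  # no matching profile: fall back to the base weights
    weight = 1.0 / len(profiles)
    for profile in profiles:
        for i, row in enumerate(profile.delta()):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


# Toy libraries keyed by characteristic, then by label (labels are illustrative).
libraries = {
    "accent": {"accent_a": LoRAProfile(B=[[1.0], [0.0]], A=[[2.0, 0.0]])},
    "gender": {"gender_a": LoRAProfile(B=[[0.0], [1.0]], A=[[0.0, 4.0]])},
}
base_weights = [[0.0, 0.0], [0.0, 0.0]]

# Stand-in for the output of the speaker-characteristic classifier.
predicted = {"accent": "accent_a", "gender": "gender_a"}
merged_weights = merge_profiles(base_weights, select_profiles(libraries, predicted))
```

Under this design, supporting a new characteristic or label only requires adding an entry to the relevant library; the base model and the existing profiles are left untouched, which is the sense in which adaptation is incremental.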
Abstract
Edge-based automatic speech recognition (ASR) technologies are increasingly prevalent in the development of intelligent and personalized assistants. However, resource-constrained ASR models face significant challenges in adaptivity, incrementality, and inclusivity when serving a diverse population. To tackle these challenges, we propose PI-Whisper, a novel ASR system that adaptively enhances recognition capabilities by identifying speakers' characteristics in real time. In this work, we show how the design of PI-Whisper allows for incremental adaptation to new characteristics without the need for repetitive retraining, enhances recognition capabilities, and improves equity and fairness across diverse speaker groups. PI-Whisper demonstrates these advantages by achieving state-of-the-art accuracy, reducing the word error rate (WER) by up to 13.7% relative to baselines while scaling linearly with computing resources.
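For readers unfamiliar with the metric, the sketch below shows how WER and a relative WER reduction are conventionally computed. The function names are hypothetical, and the transcripts and WER values in the usage lines are made-up illustrations, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only: one dropped word out of four reference words.
example_wer = word_error_rate("turn on the light", "turn on light")
# Illustrative values only: a drop from 20% to 15% WER.
example_reduction = relative_wer_reduction(0.20, 0.15)
```

A relative reduction is measured against the baseline's own error rate, so a figure such as 13.7% describes the fraction of baseline errors removed rather than an absolute drop in WER percentage points.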
