OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting

Matteo Risso; Alessio Burrello; Daniele Jahier Pagliari

TL;DR

This paper proposes, for the first time, coupling weight adaptation with architectural adaptation, in the form of online structured channel pruning, for personalized on-device KWS, starting from a state-of-the-art self-learning personalized KWS pipeline.

Abstract

Always-on keyword spotting (KWS) demands on-device adaptation to cope with user- and environment-specific distribution shifts under tight latency and energy budgets. This paper proposes, for the first time, coupling weight adaptation (i.e., on-device training) with architectural adaptation, in the form of online structured channel pruning, for personalized on-device KWS. Starting from a state-of-the-art self-learning personalized KWS pipeline, we compare data-agnostic and data-aware pruning criteria applied on in-field pseudo-labelled user data. On the HeySnips and HeySnapdragon datasets, we achieve up to 9.63x model-size compression with respect to unpruned baselines at iso-task performance, measured as the accuracy at 0.5 false alarms per hour. When deploying our adaptation pipeline on a Jetson Orin Nano embedded GPU, we achieve up to 1.52x/1.57x and 1.64x/1.77x latency and energy-consumption improvements during online training/inference compared to weights-only adaptation.
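To make the distinction between data-agnostic and data-aware pruning criteria concrete, the following is a minimal sketch, not the authors' implementation. It assumes an L1-norm weight score as the data-agnostic criterion and a mean absolute activation score as the data-aware one; both are common choices, but this page does not state which criteria the paper actually uses. All function names are hypothetical.

```python
from typing import List, Sequence


def l1_channel_scores(weights: Sequence[Sequence[float]]) -> List[float]:
    """Data-agnostic score (assumed criterion): L1 norm of each channel's weights.

    weights[c] is the flattened weight list of output channel c.
    """
    return [sum(abs(w) for w in channel) for channel in weights]


def activation_channel_scores(activations: Sequence[Sequence[float]]) -> List[float]:
    """Data-aware score (assumed criterion): mean absolute activation per channel.

    activations[n][c] is the output of channel c for the n-th user sample,
    e.g. an in-field pseudo-labelled utterance.
    """
    num_samples = len(activations)
    num_channels = len(activations[0])
    return [
        sum(abs(sample[c]) for sample in activations) / num_samples
        for c in range(num_channels)
    ]


def channels_to_keep(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the sorted indices of the highest-scoring channels."""
    num_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:num_keep])


def prune_channels(weights: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Structured pruning: drop whole output channels, keeping only `keep`."""
    return [list(weights[c]) for c in keep]


if __name__ == "__main__":
    # Toy layer with 4 output channels and 2 weights per channel.
    layer = [[0.5, -0.5], [0.01, 0.02], [1.0, -2.0], [0.1, 0.1]]
    keep = channels_to_keep(l1_channel_scores(layer), keep_ratio=0.5)
    print(keep, prune_channels(layer, keep))
```

Because whole channels are removed, the pruned layer is a smaller dense layer, which is why structured pruning can reduce latency and energy on standard hardware without sparse kernels. In a real network, the input channels of the following layer would have to be pruned consistently as well, which this sketch omits.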

Paper Structure (9 sections, 4 equations, 3 figures, 1 table)

Methods
  Baseline self-learning pipeline
  OnDA pipelines
  Pruning strategies
Experimental Results
  Setup
  OnDA Search Space Exploration
  Deployment on Jetson Orin Nano
Conclusions

Figures (3)

Figure 1: Steps of the baseline and OnDA pipelines.
Figure 2: Pareto fronts for personalized KWS under different OnDA pipelines, compared with the baseline (rusci2024self).
Figure 3: On-device latency measurements on Jetson Orin Nano for GPU and CPU deployment.
