OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting

Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari

TL;DR

This paper proposes, for the first time, coupling weight adaptation with architectural adaptation, in the form of online structured channel pruning, for personalized on-device KWS, starting from a state-of-the-art self-learning personalized KWS pipeline.

Abstract

Always-on keyword spotting (KWS) demands on-device adaptation to cope with user- and environment-specific distribution shifts under tight latency and energy budgets. This paper proposes, for the first time, coupling weight adaptation (i.e., on-device training) with architectural adaptation, in the form of online structured channel pruning, for personalized on-device KWS. Starting from a state-of-the-art self-learning personalized KWS pipeline, we compare data-agnostic and data-aware pruning criteria applied on in-field pseudo-labelled user data. On the HeySnips and HeySnapdragon datasets, we achieve up to 9.63x model-size compression with respect to unpruned baselines at iso-task performance, measured as the accuracy at 0.5 false alarms per hour. When deploying our adaptation pipeline on a Jetson Orin Nano embedded GPU, we achieve up to 1.52x/1.57x and 1.64x/1.77x latency and energy-consumption improvements during online training/inference compared to weights-only adaptation.
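As an illustration of what structured channel pruning with a data-agnostic criterion can look like, the sketch below ranks the output channels of a convolutional layer by the L1 norm of their weights and removes the weakest ones, together with the matching input channels of the next layer. The abstract does not say which criteria the paper evaluates, so the L1-norm ranking, the function names, and the toy weights are all assumptions made for this example and not the authors' method.

```python
# Hypothetical sketch: data-agnostic (L1-norm) structured channel pruning.
# Weights are nested lists shaped [out_channels][in_channels][kernel_size].
# The criterion and all names here are illustrative assumptions.

def channel_l1_norms(weight):
    """Return the L1 norm of each output channel (filter)."""
    return [sum(abs(w) for in_ch in out_ch for w in in_ch) for out_ch in weight]


def select_channels(weight, keep_ratio):
    """Indices of the output channels with the largest L1 norms, in original order."""
    norms = channel_l1_norms(weight)
    n_keep = max(1, round(keep_ratio * len(weight)))
    ranked = sorted(range(len(weight)), key=lambda c: norms[c], reverse=True)
    return sorted(ranked[:n_keep])


def prune_pair(weight, next_weight, keep_ratio):
    """Drop low-norm output channels of one layer and the matching
    input channels of the layer that consumes its output."""
    keep = select_channels(weight, keep_ratio)
    pruned = [weight[c] for c in keep]
    pruned_next = [[out_ch[c] for c in keep] for out_ch in next_weight]
    return pruned, pruned_next, keep


# Toy example: 4 output channels, 1 input channel, kernel size 2.
conv1 = [[[0.9, -0.8]], [[0.01, 0.02]], [[-0.5, 0.4]], [[0.0, 0.03]]]
# Next layer: 2 output channels consuming conv1's 4 channels, kernel size 1.
conv2 = [[[1.0], [2.0], [3.0], [4.0]], [[5.0], [6.0], [7.0], [8.0]]]

pruned1, pruned2, kept = prune_pair(conv1, conv2, keep_ratio=0.5)
print(kept)  # [0, 2]
```

Because whole channels are removed, both layers shrink to smaller dense tensors. This is why structured pruning can reduce latency and energy on general-purpose hardware without sparse-kernel support. A data-aware criterion would replace `channel_l1_norms` with a score computed from activations or gradients on the pseudo-labelled user data.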

Paper Structure

This paper contains 9 sections, 4 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Steps of the baseline and OnDA pipelines.
  • Figure 2: Pareto fronts for personalized KWS under different OnDA pipelines, compared with the baseline [rusci2024self].
  • Figure 3: On-device latency measurements on Jetson Orin Nano for GPU and CPU deployment.