Dynamic Switch Layers For Unsupervised Learning
Haiguang Li, Usama Pervaiz, Michał Matuszak, Robert Kamara, Gilles Roux, Trausti Thormundsson, Joseph Antognini
TL;DR
The Dynamic Switch Layer (DSL) addresses the power/compute bottleneck of on-device learning by enabling unsupervised, data-driven path routing through a lightweight decoder. By generalizing Gated Compression layers, the DSL induces activation sparsity and dynamic routing, maintaining accuracy while dramatically reducing model size and computation. Integrated into SoundStream, the DSL routes up to 80% of samples to a lightweight path, achieving a 20.9x reduction in parameters and a 12.3x reduction in compute, with up to 26.5% lower latency and 21.4% better power efficiency, without sacrificing downstream performance. These results demonstrate practical impact for real-world, energy-constrained applications and establish the DSL as a versatile building block for efficient on-device unsupervised learning.
Abstract
On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. However, power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency that often limits model complexity. The previously established Gated Compression (GC) layers offer a solution, enabling power efficiency without sacrificing model performance by selectively gating samples that lack signals of interest. However, their reliance on ground-truth labels limits GC layers to supervised tasks. This work introduces the Dynamic Switch Layer (DSL), which extends the benefits of GC layers to unsupervised learning scenarios and maintains power efficiency without the need for labeled data. The DSL builds upon the GC architecture, using dynamic pathway selection to adapt model complexity to the innate structure of the data. We integrate the DSL into the SoundStream architecture and demonstrate that, by routing up to 80% of samples through a lightweight path, we achieve a 12.3x reduction in the amount of computation performed and a 20.9x reduction in model size. This reduces on-device inference latency by up to 26.5% and improves power efficiency by up to 21.4% without impacting model performance.
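The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the DSL itself: the DSL learns its routing decision from data, whereas this sketch substitutes a fixed energy threshold as a stand-in gate. All names here (`energy_gate`, `route`, `expected_cost`), the threshold, the toy frames, and the per-path costs are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def energy_gate(frame: Frame) -> float:
    """Hypothetical gate score: mean squared amplitude of the frame.

    Assumption: a stand-in for a learned routing score, not the DSL's gate.
    """
    return sum(x * x for x in frame) / len(frame)


def route(
    frames: List[Frame],
    gate: Callable[[Frame], float],
    threshold: float,
    light_path: Callable[[Frame], object],
    heavy_path: Callable[[Frame], object],
) -> Tuple[List[object], float]:
    """Send each frame down one of two paths based on its gate score.

    Returns the per-frame outputs and the fraction of frames that took
    the lightweight path.
    """
    outputs = []
    n_light = 0
    for frame in frames:
        if gate(frame) < threshold:
            outputs.append(light_path(frame))
            n_light += 1
        else:
            outputs.append(heavy_path(frame))
    return outputs, n_light / len(frames)


def expected_cost(
    light_fraction: float,
    light_cost: float,
    heavy_cost: float,
    gate_cost: float = 0.0,
) -> float:
    """Average per-sample cost when a fraction of samples skip the heavy path."""
    return (
        gate_cost
        + light_fraction * light_cost
        + (1.0 - light_fraction) * heavy_cost
    )


# Toy input: four near-silent frames and one loud frame (illustrative values).
frames = [
    [0.0, 0.01, -0.01],
    [0.5, -0.6, 0.7],
    [0.02, 0.0, 0.01],
    [0.0, 0.0, 0.0],
    [0.01, -0.02, 0.0],
]

# Placeholder paths that only tag which branch handled the frame.
outputs, light_fraction = route(
    frames,
    gate=energy_gate,
    threshold=0.01,
    light_path=lambda f: ("light", len(f)),
    heavy_path=lambda f: ("heavy", len(f)),
)

# Hypothetical relative costs: the heavy path is 10x the light path.
avg_cost = expected_cost(light_fraction, light_cost=1.0, heavy_cost=10.0)
```

The cost function shows why the routed fraction matters: the average per-sample cost is a weighted mix of the two path costs plus the gate's own overhead, so savings grow as more samples take the lightweight path and shrink if the gate itself is expensive. The costs used here are placeholders and should not be read as the paper's measurements.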
