RepCNN: Micro-sized, Mighty Models for Wakeword Detection

Arnav Kundu; Prateeth Nayak; Priyanka Padmanabhan; Devang Naik

TL;DR

The paper tackles efficient, always-on wake-word detection under tight memory and compute constraints. It introduces RepCNN, a multi-branch training architecture whose parameters are reparameterized into a single-branch inference graph through kernel fusion, enabling low-latency, memory-efficient deployment. The approach yields up to 43% accuracy gains over a vanilla single-branch CNN and matches or exceeds BC-ResNet accuracy while halving peak memory and achieving an order-of-magnitude speedup in on-device inference, demonstrated on both wake-word tasks and Google Speech Commands. These results establish structural re-parameterization as a practical method for combining training-time capacity with inference-time efficiency in streaming speech models, with potential for broader mobile applications.

Abstract

Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits both the model's capacity to learn and the ability of standard training algorithms to find the best parameters. Here we show that a small convolutional model can be trained more effectively by first refactoring its computation into a larger, redundant, multi-branch architecture. Then, for inference, we algebraically re-parameterize the trained model into a single-branch form with fewer parameters, lowering its memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector, RepCNN, provides a good trade-off between latency and accuracy during inference. Re-parameterized RepCNN models are 43% more accurate than a uni-branch convolutional model with the same runtime. RepCNN also matches the accuracy of complex architectures such as BC-ResNet while using half the peak memory and running 10x faster.
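The algebraic re-parameterization rests on linearity: parallel depth-wise convolution branches whose outputs are summed before the non-linearity compute the same function as one convolution whose kernel is the sum of the branch kernels, each zero-padded (centered) to the largest kernel size, with the biases also summed. The NumPy sketch below checks this numerically. It assumes a "same"-padded depth-wise 1D convolution, a hypothetical branch layout of one size-1 kernel plus two size-7 kernels, and no batch normalization; if the branches carried batch normalization, it would first be folded into each branch's kernel and bias. The function names are illustrative and not taken from the paper.

```python
import numpy as np


def depthwise_conv1d(x, w, b):
    """Depth-wise 1D cross-correlation with 'same' zero padding.

    x: (C, T) input, w: (C, k) per-channel kernel with odd k, b: (C,) bias.
    """
    c, t = x.shape
    k = w.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty((c, t))
    for i in range(t):
        # Each channel is filtered only by its own kernel (depth-wise).
        out[:, i] = np.sum(xp[:, i:i + k] * w, axis=1)
    return out + b[:, None]


def fuse_branches(kernels, biases):
    """Fold parallel depth-wise branches into one kernel of the largest size."""
    k_max = max(w.shape[1] for w in kernels)
    fused_w = np.zeros((kernels[0].shape[0], k_max))
    for w in kernels:
        # Center the smaller kernel so its taps align with the same time steps.
        off = (k_max - w.shape[1]) // 2
        fused_w[:, off:off + w.shape[1]] += w
    fused_b = np.sum(biases, axis=0)
    return fused_w, fused_b


# Hypothetical example: 4 channels, 50 frames, branches of size 1, 7 and 7.
rng = np.random.default_rng(0)
C, T = 4, 50
x = rng.standard_normal((C, T))
sizes = [1, 7, 7]
kernels = [rng.standard_normal((C, k)) for k in sizes]
biases = [rng.standard_normal(C) for _ in sizes]

# Training-time form: sum of the branch outputs (before any ReLU).
multi = sum(depthwise_conv1d(x, w, b) for w, b in zip(kernels, biases))

# Inference-time form: a single fused depth-wise convolution.
fused_w, fused_b = fuse_branches(kernels, biases)
single = depthwise_conv1d(x, fused_w, fused_b)

print(np.allclose(multi, single))  # True
```

Because the fusion happens once after training, the deployed graph runs a single convolution per block regardless of how many branches were used during training.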

Paper Structure (12 sections, 2 equations, 5 figures, 3 tables)

Introduction
Related Works
Model Architecture
  RepConvBlock
  Reparameterization
  RepCNN
Experiments and Results
  Datasets
  Training and Evaluation
  Results
  Ablation Study
Conclusion

Figures (5)

Figure 1: RepConvBlock: Re-parameterizable Convolutional Block
Figure 2: RepCNN training architecture. The RepConvBlocks are parameterized as (input-channels, output-channels, kernel-size, num-branches). Here $p = q = 2$, i.e., block1 and block2 are each stacked twice. $block1_1$, $block1_2$, $block2_1$, and $block2_2$ use kernel sizes $k = 7, 9, 11, 13$, respectively. Non-linearity: ReLU.
Figure 3: DET curves of the compared model architectures and RepCNN (orange). We choose 3 FA/hr (false accepts per hour) as the ideal operating point for detection.
Figure 4: Training and validation loss of RepCNN compared with a no-branch architecture.
Figure 5: Validation loss of RepCNN with various degrees of over-parameterization. "Branches" is the number of parallel 1D depth-wise convolutional kernels, excluding the kernel of size 1.
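The over-parameterization varied in Figure 5 exists only during training. As a rough illustration of why fusion keeps the inference footprint small, the hypothetical helper below counts depth-wise parameters for one block before and after fusion. It assumes that all non-unit branches share one kernel size, that every branch has its own bias, and that any layer mapping input-channels to output-channels is identical in both forms and therefore left out; the 64 channels and kernel size 7 are illustrative values, not the paper's configuration.

```python
def depthwise_block_params(channels, kernel_size, num_branches, fused):
    """Count depth-wise weights and biases in one block (hypothetical layout).

    Training form: num_branches kernels of size kernel_size plus one size-1
    kernel per channel, each branch with its own bias.
    Fused form: a single kernel_size kernel and one bias per channel.
    """
    if fused:
        return channels * (kernel_size + 1)
    # num_branches * (k weights + 1 bias) + (1 weight + 1 bias) for the size-1 branch.
    return channels * (num_branches * (kernel_size + 1) + 2)


for branches in (1, 2, 3, 4):
    train = depthwise_block_params(64, 7, branches, fused=False)
    infer = depthwise_block_params(64, 7, branches, fused=True)
    print(f"branches={branches}: train={train}, inference={infer}")
```

Under these assumptions the training-time count grows linearly with the number of branches (640, 1152, 1664, and 2176 for 1 to 4 branches), while the fused block always has 512 parameters, so extra branches add training capacity without adding inference cost.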