RepCNN: Micro-sized, Mighty Models for Wakeword Detection
Arnav Kundu, Prateeth Nayak, Priyanka Padmanabhan, Devang Naik
TL;DR
The paper tackles efficient, always-on wake-word detection under tight memory and compute constraints. It introduces RepCNN, a multi-branch training architecture whose parameters are reparameterized into a single-branch inference graph through kernel fusion, enabling low-latency, memory-efficient deployment. The approach yields up to 43% accuracy gains over a vanilla single-branch CNN and matches or exceeds BC-ResNet accuracy while halving peak memory and achieving an order-of-magnitude speedup on device inference, demonstrated on both wake-word tasks and Google Speech Commands. These results establish structural re-parameterization as a practical method for combining training-time capacity with inference-time efficiency in streaming speech models, with potential for broader mobile applications.
Abstract
Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits both the model's capacity to learn and the effectiveness of the usual training algorithms in finding the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger, redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters for a lower memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. RepCNN re-parameterized models are 43% more accurate than a uni-branch convolutional model while having the same runtime. RepCNN also matches the accuracy of complex architectures like BC-ResNet, while having 2x lower peak memory usage and 10x faster runtime.
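
The algebraic re-parameterization described above can be illustrated with a minimal sketch. It assumes a single-channel 1D convolution with three parallel branches (a 3-tap kernel, a 1-tap kernel, and an identity skip) whose outputs are summed. This branch layout, the function names, and the omission of normalization layers are illustrative assumptions, not the actual RepCNN block. Because convolution is linear, the three branches collapse into one 3-tap kernel that produces the same output.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padding to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(x))
    ]


def multi_branch(x, k3, k1):
    """Training-time form (assumed layout): 3-tap branch + 1-tap branch + identity."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, [k1])
    return [a + b + c for a, b, c in zip(y3, y1, x)]


def fuse(k3, k1):
    """Fold the 1-tap and identity branches into the centre tap of the 3-tap kernel."""
    fused = list(k3)
    fused[1] += k1 + 1.0  # a 1-tap kernel and the identity both act on the centre sample
    return fused


# Hypothetical input and weights, chosen only for illustration.
x = [0.5, -1.0, 2.0, 0.25, 3.0]
k3 = [0.2, -0.4, 0.1]
k1 = 0.7

y_train = multi_branch(x, k3, k1)        # three branches, three sets of weights
y_infer = conv1d_same(x, fuse(k3, k1))   # one branch, one 3-tap kernel
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
```

In this sketch the fused model stores only the three weights of a single kernel, yet `max_diff` is zero up to floating-point rounding. That is the sense in which the inference graph is cheaper than the training graph without changing the function it computes.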
