Enabling On-device Continual Learning with Binary Neural Networks
Lorenzo Vorabbi, Davide Maltoni, Guido Borghi, Stefano Santi
TL;DR
This work tackles on-device continual learning under severe memory and compute constraints by combining Binary Neural Networks (BNNs) with Latent Replay and a quantized backpropagation framework. It introduces dual bitwidth training with forward $q_f$ and backward $q_b$ quantization, enabling efficient updates to both convolutional layers and the classifier head while storing 1-bit latent activations in replay memories. Key contributions include reduced replay memory (1-bit activations), improved accuracy over prior BNN-CWR* baselines, quantized backpropagation for non-binary layers, and optimized binary weight quantization yielding substantial memory savings and practical edge-device efficiency. The approach achieves memory reductions up to $32\times$, speedups up to $2.2\times$ on embedded hardware, and demonstrates feasibility for scalable on-device continual learning in TinyML applications, with future work targeting ARM NEON optimization.
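Where the 32× figure for replay memory comes from can be shown with a minimal sketch. Each latent activation is reduced to its sign and packed 8 per byte, compared with 4 bytes per float32 value. The helper names (`binarize`, `pack_bits`, `unpack_bits`, `BinaryLatentReplay`), the sign(0) = +1 convention, and the "fill until full" buffer policy are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def binarize(activations: List[float]) -> List[int]:
    """Map each real-valued activation to one bit: 1 encodes +1 (a >= 0), 0 encodes -1."""
    return [1 if a >= 0.0 else 0 for a in activations]


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, 8 bits per byte, least significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(packed: bytes, n: int) -> List[float]:
    """Recover n binary activations in {-1.0, +1.0} from the packed bytes."""
    return [1.0 if (packed[i // 8] >> (i % 8)) & 1 else -1.0 for i in range(n)]


class BinaryLatentReplay:
    """Hypothetical latent replay buffer storing 1-bit activations plus a class label."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: List[Tuple[bytes, int, int]] = []  # (packed bits, length, label)

    def add(self, activations: List[float], label: int) -> None:
        # Simple "fill until full" policy, used here only as a placeholder.
        if len(self._items) < self.capacity:
            self._items.append((pack_bits(binarize(activations)), len(activations), label))

    def get(self, i: int) -> Tuple[List[float], int]:
        packed, n, label = self._items[i]
        return unpack_bits(packed, n), label

    def __len__(self) -> int:
        return len(self._items)

    def nbytes(self) -> int:
        """Bytes used by the stored activations (labels excluded)."""
        return sum(len(p) for p, _, _ in self._items)


if __name__ == "__main__":
    acts = [0.3, -1.2, 0.0, -0.01, 2.5, -0.7, 0.9, -3.3] * 128  # 1024 activations
    packed = pack_bits(binarize(acts))
    float32_bytes = 4 * len(acts)  # 4096 bytes at full precision
    print(len(packed), float32_bytes // len(packed))  # 128 bytes, 32x smaller
```

Packing bits manually keeps the sketch dependency-free. On real hardware the packed words would typically be consumed directly by XNOR/popcount kernels rather than unpacked to floats.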
Abstract
On-device learning remains a formidable challenge, especially on resource-constrained devices with limited computational capabilities. This challenge is rooted primarily in two key issues. First, the memory available on embedded devices is typically insufficient for the memory-intensive back-propagation algorithm, which often relies on floating-point precision. Second, developing learning algorithms for models with extreme quantization levels, such as Binary Neural Networks (BNNs), is particularly difficult because of the drastic reduction in bit representation. In this study, we propose a solution that combines recent advances in Continual Learning (CL) and Binary Neural Networks to enable on-device training while maintaining competitive performance. Specifically, our approach leverages binary latent replay (LR) activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation. The experimental validation demonstrates a significant accuracy improvement together with a noticeable reduction in memory requirements, confirming the suitability of our approach for expanding the practical applications of deep learning in real-world scenarios.
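To illustrate what reducing the number of bits for gradient computation can mean, the sketch below quantizes both the upstream gradient and the layer input to $q_b$ bits before forming the weight gradient of a linear layer $y = Wx$ (that is, $\partial L/\partial W = g\,x^\top$). It uses plain uniform symmetric per-tensor quantization with a max-abs scale as a stand-in. The paper's actual quantization scheme is not specified here, so the function names and the choice of scheme are assumptions. The same `quantize_symmetric` helper could equally quantize forward activations with $q_f$ bits.

```python
from typing import List, Tuple


def quantize_symmetric(x: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric per-tensor quantization to signed `bits`-bit integer codes.

    Returns codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1] and the float scale,
    so that x is approximately code * scale.
    """
    if bits < 2:
        raise ValueError("a signed symmetric grid needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in x), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


def quantized_linear_weight_grad(
    grad_out: List[float], inputs: List[float], q_b: int
) -> List[List[float]]:
    """Hypothetical weight gradient dL/dW[i][j] = grad_out[i] * inputs[j].

    Both operands are quantized to q_b bits; each entry is an integer product
    of two codes, rescaled once by the product of the two scales.
    """
    g_codes, g_scale = quantize_symmetric(grad_out, q_b)
    x_codes, x_scale = quantize_symmetric(inputs, q_b)
    rescale = g_scale * x_scale
    return [[(gc * xc) * rescale for xc in x_codes] for gc in g_codes]


if __name__ == "__main__":
    grad = quantized_linear_weight_grad([1.0, -0.5], [0.2, 0.4], q_b=8)
    print(grad)  # close to the exact outer product [[0.2, 0.4], [-0.1, -0.2]]
```

Keeping the integer product separate from the single float rescale mirrors how low-bit kernels usually work: the inner arithmetic runs on narrow integers, and floating point appears only once per output.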
