Knowledge boosting during low-latency inference
Vidya Srinivas, Malek Itani, Tuochao Chen, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota
TL;DR
Knowledge boosting lets a large remote model improve the inference of a small on-device model under strict latency constraints by sending it time-delayed hints. The approach formalizes a two-model setup in which the small model processes the current input chunk while receiving delayed embeddings from the large model over a communication channel; these embeddings are merged into the small model using a context window and cross-attention. On three binaural speech tasks with a 48 ms communication delay, the reported SI-SDR gains over baselines are 0.23, 2.31, and 3.53 dB, with MACs reduced by about 1–2 M. Gains are larger when the performance gap between the large and small models is wide, which suggests the method is practical for edge devices that need real-time audio processing and points to further work on compression and architecture improvements.
Abstract
Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications. Code, dataset, and audio samples available at https://knowledgeboosting.cs.washington.edu/.
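The timing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `large_model` and `small_model` are hypothetical stand-ins (a running mean and a subtraction) for the real networks, and the queue only models the fact that a hint computed from chunk t - d reaches the small model while it processes chunk t. The only values taken from the text are the 8 ms chunk size and the six-chunk maximum delay, which give 6 × 8 = 48 ms.

```python
from collections import deque

CHUNK_MS = 8       # chunk duration stated in the abstract
DELAY_CHUNKS = 6   # largest communication delay evaluated in the abstract


def delay_ms(delay_chunks, chunk_ms=CHUNK_MS):
    """Communication delay in milliseconds for a delay given in chunks."""
    return delay_chunks * chunk_ms


def large_model(chunk):
    """Hypothetical stand-in for the remote large model: returns one 'hint'."""
    return sum(chunk) / len(chunk)


def small_model(chunk, hint):
    """Hypothetical stand-in for the on-device model.

    Uses the current chunk plus a delayed hint (None until the first one arrives).
    """
    bias = 0.0 if hint is None else hint
    return [x - bias for x in chunk]


def stream(chunks, delay_chunks=DELAY_CHUNKS):
    """Process chunks in order; the hint used at step t comes from chunk t - delay_chunks."""
    in_flight = deque()  # hints still travelling over the channel
    outputs, hint_sources = [], []
    for t, chunk in enumerate(chunks):
        # The large model sees chunk t, but its hint is only delivered later.
        in_flight.append((t, large_model(chunk)))
        hint, source = None, None
        if len(in_flight) > delay_chunks:
            source, hint = in_flight.popleft()
        # The small model never waits: it always emits output for the current chunk.
        outputs.append(small_model(chunk, hint))
        hint_sources.append(source)
    return outputs, hint_sources
```

Calling `stream(chunks)` on ten chunks returns `None` as the hint source for the first six steps and then the indices 0, 1, 2, 3, which shows that the small model keeps producing output in real time while the hints lag by the channel delay.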
