ConvFill: Model Collaboration for Responsive Conversational Voice Agents

Vidya Srinivas; Zachary Englhardt; Maximus Powers; Shwetak Patel; Vikram Iyer

TL;DR

ConvFill introduces conversational infill, a hybrid on-device/off-device architecture in which a lightweight model responds immediately while a backend LLM streams knowledge chunks that improve its responses. The approach decouples latency from capability, achieving sub-200 ms time to first token (TTFT) and notable QA gains on NaturalQuestions (46–52%), though it still falls short of backend performance (69–80%). It relies on a synthetic, multi-domain training corpus and a two-thread inference pipeline with a streaming knowledge queue and a filler mechanism that hides backend latency. The work demonstrates a practical path toward responsive, knowledgeable on-device conversational agents and points to future directions in grounding and larger on-device models.
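The two-thread pipeline with a streaming knowledge queue and a filler mechanism can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `backend_stream` and `respond`, the timing parameters, the filler sentence, and the `[infill]` string formatting are all hypothetical stand-ins, and the backend LLM and on-device model are simulated rather than called.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the backend stream


def backend_stream(chunks, knowledge_q, latency_s):
    """Hypothetical stand-in for the backend LLM: waits, then streams chunks."""
    time.sleep(latency_s)  # simulated network plus large-model latency
    for chunk in chunks:
        knowledge_q.put(chunk)
    knowledge_q.put(_DONE)


def respond(chunks, latency_s=0.3, first_wait_s=0.05,
            filler="Good question, let me check."):
    """Return the utterances the front end would speak, in order."""
    knowledge_q = queue.Queue()
    worker = threading.Thread(
        target=backend_stream,
        args=(chunks, knowledge_q, latency_s),
        daemon=True,
    )
    worker.start()

    spoken = []
    try:
        # Give the backend a short window before speaking.
        item = knowledge_q.get(timeout=first_wait_s)
    except queue.Empty:
        # Nothing arrived in time: speak a filler to hide the latency,
        # then block until the first real item shows up.
        spoken.append(filler)
        item = knowledge_q.get()

    while item is not _DONE:
        # A real system would condition the small model on this chunk;
        # here the chunk is simply wrapped (an assumption for illustration).
        spoken.append(f"[infill] {item}")
        item = knowledge_q.get()

    worker.join()
    return spoken
```

Calling `respond(["Paris is the capital of France."])` with the default simulated latency yields the filler first, followed by the infilled chunk. The point of the design is that the time to the first spoken utterance depends only on `first_wait_s`, not on how slow the backend is.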

Abstract

Deploying conversational voice agents with large language models faces a critical challenge: cloud-based foundation models provide deep reasoning and domain knowledge but introduce latency that disrupts natural conversation, while on-device models respond immediately but lack sophistication. We propose conversational infill, a task in which a lightweight on-device model generates contextually appropriate dialogue while seamlessly incorporating streaming knowledge from a powerful backend model. This approach decouples response latency from model capability, enabling systems that feel responsive while accessing the full power of large-scale models. We present ConvFill, a 360M-parameter model trained on synthetic multi-domain conversations. Evaluation across multiple backend models shows that conversational infill can be successfully learned, with ConvFill achieving accuracy improvements of 36–42% over standalone small models of the same size while consistently retaining sub-200 ms response latencies. Our results demonstrate the promise of this approach for building on-device conversational agents that are both immediately responsive and knowledgeable.
