Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems
Mansi Rana, Kadri Hacioglu, Sindhuja Gopalan, Maragathamani Boothalingam
TL;DR
This paper tackles zero-shot slot filling in dialogue by combining slot induction with black-box knowledge distillation from a large teacher LLM to a compact student model, optimizing for domain generalization on conversational data. The authors introduce a two-stage architecture that uses GLiNER as a preprocessing extractor and applies constraint-based postprocessing to achieve near real-time inference. They collect diverse anonymized transcripts, generate an instruction-finetuning dataset, and evaluate in multi-domain and unseen-domain settings, reporting a 26% absolute improvement in F1 over vanilla LLMs and a 34% relative improvement over a legacy extractive baseline. They also demonstrate generalization to new domains and explore language expansion and smaller models, highlighting practical deployment implications for call-center settings. The work contributes a scalable, real-world pipeline for robust slot filling in conversational AI.
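The two-stage idea summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `prefilter` is a regex stand-in for the GLiNER preprocessing step, `fake_student` is a hypothetical stub in place of the distilled student model, and the two constraints in `postprocess` (the slot must have been requested, and the value must occur verbatim in the utterance) are plausible examples rather than the authors' actual rules.

```python
import re
from typing import Callable, Dict, List

def prefilter(utterance: str, patterns: Dict[str, str]) -> List[str]:
    """Stage 1 stand-in: return slot names whose pattern matches the utterance."""
    return [slot for slot, pat in patterns.items()
            if re.search(pat, utterance, re.IGNORECASE)]

def postprocess(utterance: str, predictions: Dict[str, str],
                allowed_slots: List[str]) -> Dict[str, str]:
    """Assumed constraints: keep a prediction only if its slot was requested
    and its value occurs verbatim (case-insensitively) in the utterance."""
    text = utterance.lower()
    return {slot: value for slot, value in predictions.items()
            if slot in allowed_slots and value and value.lower() in text}

def fill_slots(utterance: str, patterns: Dict[str, str],
               student: Callable[[str, List[str]], Dict[str, str]]) -> Dict[str, str]:
    """Run the cheap pre-filter first; call the student model only when needed."""
    candidates = prefilter(utterance, patterns)
    if not candidates:
        return {}  # skipping the model call keeps latency low on empty turns
    raw = student(utterance, candidates)
    return postprocess(utterance, raw, candidates)

def fake_student(utterance: str, slots: List[str]) -> Dict[str, str]:
    """Hypothetical stub for the student model; it hallucinates a 'city' slot."""
    return {"date": "next Friday", "city": "Paris"}

patterns = {"date": r"\b(monday|tuesday|wednesday|thursday|friday)\b"}
result = fill_slots("Can I move my appointment to next Friday?", patterns, fake_student)
print(result)
```

In this sketch the hallucinated `city` value is discarded by the postprocessing constraints, and a turn with no candidate slots never reaches the model at all, which is one way a pre-filter can help keep inference near real time.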
Abstract
Zero-shot slot filling is a well-established subtask of Natural Language Understanding (NLU). However, most existing methods focus primarily on single-turn text data, overlooking the unique complexities of conversational dialogue. Conversational data is highly dynamic, often involving abrupt topic shifts, interruptions, and implicit references that make it difficult to apply zero-shot slot filling techniques directly, even with the remarkable capabilities of large language models (LLMs). This paper addresses these challenges by proposing strategies for automatic data annotation with slot induction and black-box knowledge distillation (KD) from a teacher LLM to a smaller model, outperforming vanilla LLMs on internal datasets by a 26% absolute increase in F1 score. Additionally, we introduce an efficient system architecture for call center product settings that surpasses off-the-shelf extractive models by 34% in relative F1 score, enabling near real-time inference on dialogue streams with higher accuracy while preserving low latency.
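Because the abstract reports one gain in absolute terms and the other in relative terms, the two figures are not directly comparable. The short sketch below shows the arithmetic behind each convention. The scores used are hypothetical and chosen only to make the difference visible; they are not taken from the paper.

```python
def absolute_gain(baseline_f1: float, new_f1: float) -> float:
    """Gain in F1 percentage points: new minus baseline."""
    return new_f1 - baseline_f1

def relative_gain(baseline_f1: float, new_f1: float) -> float:
    """Gain expressed as a percentage of the baseline F1."""
    return 100.0 * (new_f1 - baseline_f1) / baseline_f1

# Hypothetical F1 scores on a 0-100 scale (illustrative only).
baseline, improved = 50.0, 60.0
print(absolute_gain(baseline, improved))  # 10.0 points absolute
print(relative_gain(baseline, improved))  # 20.0 percent relative
```

The same pair of scores yields a 10-point absolute gain but a 20% relative gain, so a relative figure always needs the baseline score to be interpreted.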
