An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants

G P Shrivatsa Bhargav; Sumit Neelam; Udit Sharma; Shajith Ikbal; Dheeraj Sreedhar; Hima Karanam; Sachindra Joshi; Pankaj Dhoolia; Dinesh Garg; Kyle Croutwater; Haode Qi; Eric Wayne; J William Murdock

TL;DR

The paper addresses the challenge of building an industry-grade slot-filling system for dialogue state tracking (DST) with relatively small LLMs, which are needed to meet latency and deployment constraints, while still generalizing zero-shot across domains. It proposes a fine-tuning approach in which task-specific instruction data teaches a pre-trained LLM to map conversation history and slot descriptions to slot values, so that the model can be deployed on new domains through prompts alone. The authors describe a data preparation strategy that combines SGD data with curated industrial datasets covering diverse slot types and scenarios (multiple slots per turn, long values, categorical slots, and name/ID/address parsing). Three LLMs (Flan-T5-XL, Mistral, and granite.13b.v2) are fine-tuned and evaluated on held-out data and on a realistic in-house benchmark. The expanded dataset improves F1 by about 4.2% relative on average across slot types, and the fine-tuned models reduce latency by about 57% compared with the best prompting baselines; on the in-house benchmark, granite.13b.v2 reaches Macro F1 ≈ 0.93 at a latency of ≈ 0.75s. The work suggests that industry-grade DST can be realized with smaller models by combining targeted fine-tuning data with adapters, enabling scalable, low-latency, zero-shot slot filling across domains.
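The mapping from conversation history and slot descriptions to slot values can be illustrated with a minimal sketch. The prompt wording, the `name: value` answer format, and the function names `build_slot_filling_prompt` and `parse_slot_values` are assumptions made for illustration; the summary above does not specify the authors' actual prompt template or output format.

```python
from typing import Dict, List, Optional, Tuple


def build_slot_filling_prompt(
    history: List[Tuple[str, str]], slots: Dict[str, str]
) -> str:
    """Render conversation history and slot descriptions as one prompt.

    The template below is hypothetical, not the one used in the paper.
    """
    lines = [
        "Extract the value of each slot from the conversation.",
        "Answer 'none' for a slot that is not mentioned.",
        "",
        "Slots:",
    ]
    for name, description in slots.items():
        lines.append(f"- {name}: {description}")
    lines.extend(["", "Conversation:"])
    for speaker, utterance in history:
        lines.append(f"{speaker}: {utterance}")
    lines.extend(["", "Slot values:"])
    return "\n".join(lines)


def parse_slot_values(
    completion: str, slot_names: List[str]
) -> Dict[str, Optional[str]]:
    """Parse 'name: value' lines from a model completion (assumed format)."""
    values: Dict[str, Optional[str]] = {name: None for name in slot_names}
    for line in completion.splitlines():
        # Split on the first colon only, so values such as "7:30 pm" survive.
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name in values and value and value.lower() != "none":
            values[name] = value
    return values
```

Because the slots are passed in as plain names and descriptions, a new domain only requires a new `slots` dictionary, which is the sense in which such a system can be used zero-shot.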

Abstract

We present an approach to build a Large Language Model (LLM) based slot-filling system to perform Dialogue State Tracking in conversational assistants serving a wide variety of industry-grade applications. Key requirements of this system include: 1) usage of smaller-sized models to meet low latency requirements and to enable convenient and cost-effective cloud and customer premise deployments, and 2) zero-shot capabilities to serve a wide variety of domains, slot types and conversational scenarios. We adopt a fine-tuning approach where a pre-trained LLM is fine-tuned into a slot-filling model using task-specific data. The fine-tuning data is prepared carefully to cover a wide variety of slot-filling task scenarios that the model is expected to face across various domains. We give details of the data preparation and model building process. We also give a detailed analysis of the results of our experimental evaluations. Results show that our prescribed approach for slot-filling model building yields a 6.9% relative improvement in F1 over the best baseline on a realistic benchmark, while at the same time reducing latency by 57%. Moreover, the data we prepared improves F1 by 4.2% relative on average across various slot types.
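The results above are stated as relative changes, meaning differences divided by the baseline value rather than absolute point differences. A small sketch of that arithmetic follows; the function names are hypothetical and the numbers in the usage lines are made up for illustration, not taken from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline` for higher-is-better metrics such as F1."""
    return (new - baseline) / baseline


def relative_reduction(baseline: float, new: float) -> float:
    """Relative drop from `baseline` to `new` for lower-is-better metrics such as latency."""
    return (baseline - new) / baseline


# Illustrative values only (not results from the paper):
f1_gain = relative_improvement(0.80, 0.84)   # F1 rising from 0.80 to 0.84
latency_cut = relative_reduction(2.0, 1.0)   # latency falling from 2.0 s to 1.0 s
print(f"F1 gain: {f1_gain:.1%}, latency cut: {latency_cut:.1%}")
```

Under this convention, a 4-point absolute F1 gain on a baseline of 0.80 counts as a 5% relative improvement, so relative and absolute figures should not be compared directly.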
