Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production

Junhua Liu, Yong Keat Tan, Bin Fu, Kwan Hui Lim

TL;DR

This paper presents two LLM-based approaches for improving scalability and reducing latency in production dialogue systems: Symbol Tuning, which simplifies intent labels to reduce task complexity, and C-LARA (Consistency-aware, Linguistics Adaptive Retrieval Augmentation), a framework that employs LLMs for data augmentation and pseudo-labeling to generate synthetic multi-turn dialogues.

Abstract

Accurate multi-turn intent classification is essential for advancing conversational AI systems. However, challenges such as the scarcity of comprehensive datasets and the complexity of contextual dependencies across dialogue turns hinder progress. This paper presents two novel approaches leveraging Large Language Models (LLMs) to enhance scalability and reduce latency in production dialogue systems. First, we introduce Symbol Tuning, which simplifies intent labels to reduce task complexity and improve performance in multi-turn dialogues. Second, we propose C-LARA (Consistency-aware, Linguistics Adaptive Retrieval Augmentation), a framework that employs LLMs for data augmentation and pseudo-labeling to generate synthetic multi-turn dialogues. These enriched datasets are used to fine-tune a small, efficient model suitable for deployment. Experiments conducted on multilingual dialogue datasets demonstrate significant improvements in classification accuracy and resource efficiency. Our methods enhance multi-turn intent classification accuracy by 5.09%, reduce annotation costs by 40%, and enable scalable deployment in low-resource multilingual industrial systems, highlighting their practicality and impact.

Paper Structure

This paper contains 28 sections, 9 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Comparison of instruction tuning and symbol tuning. Simplifying verbose intent labels (e.g., “Request to Cancel Order” → “Cancel Order”) reduces redundancy, enhancing LLM classification performance by 5.09%, addressing key challenges in production intent classification.
  • Figure 2: Annotation pipeline of multi-turn intent classification datasets. Two major challenges in production systems are illustrated: (1) managing numerous (500+) intents across markets with redundant labels, and (2) the high cost of collecting multi-turn training data.
  • Figure 3: Illustration of C-LARA. Merging LARA with Self-Consistency combines query aggregation, knowledge base retrieval, and a self-consistency mechanism to generate high-quality pseudo-labels for multi-turn dialogues. The self-consistency process improves labeling accuracy by validating intent predictions across different prompt orderings.
  • Figure 4: Online deployment of the multi-turn intent classification model. The production architecture integrates C-LARA for automated training data generation; the system handles real-time inference while continuously improving through automated training.
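
The self-consistency idea in the Figure 3 caption (validating an intent prediction across different prompt orderings) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "prompt ordering" means the order of retrieved in-context examples, that a pseudo-label is kept only when enough orderings agree, and that the LLM call is wrapped in a caller-supplied function. The names `self_consistent_label`, `classify`, `n_orderings`, and `min_agreement` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A retrieved in-context example: (example query, intent label).
Demo = Tuple[str, str]


def self_consistent_label(
    query: str,
    demos: Sequence[Demo],
    classify: Callable[[str, Sequence[Demo]], str],
    n_orderings: int = 5,
    min_agreement: float = 0.6,
    seed: int = 0,
) -> Optional[str]:
    """Return a pseudo-label only if predictions agree across demo orderings.

    `classify` is a hypothetical stand-in for an LLM call that takes the
    query plus ordered demonstrations and returns one intent label.
    The agreement threshold is an assumption, not a value from the paper.
    """
    if n_orderings < 1:
        raise ValueError("n_orderings must be at least 1")

    rng = random.Random(seed)  # local RNG keeps the shuffles reproducible
    votes = Counter()
    for _ in range(n_orderings):
        shuffled = list(demos)
        rng.shuffle(shuffled)  # a different prompt ordering per vote
        votes[classify(query, shuffled)] += 1

    label, count = votes.most_common(1)[0]
    # Discard the sample when the majority label is not stable enough.
    return label if count / n_orderings >= min_agreement else None


if __name__ == "__main__":
    demos = [("cancel my purchase", "Cancel Order"),
             ("where is my parcel", "Track Order")]
    # Stub classifier that ignores ordering; a real one would call an LLM.
    print(self_consistent_label("please cancel it", demos,
                                lambda q, d: "Cancel Order"))
```

Returning `None` for low-agreement samples lets a pipeline drop or route uncertain dialogues to human annotators instead of training on a noisy label; whether the paper handles disagreement this way is not stated in the captions above.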