IDALC: A Semi-Supervised Framework for Intent Detection and Active Learning based Correction

Ankan Mullick; Sukannya Purkayastha; Saransh Sharma; Pawan Goyal; Niloy Ganguly

TL;DR

IDALC introduces a semi-supervised framework that combines Intent Detection with Active Learning based Correction to handle rejected utterances and discover new intents with minimal annotation. It uses a two-stage architecture (ID and ALC) with a DOC-based out-of-domain detector and a majority-voting auto-correction module, enabling iterative retraining on the expanded labeled data. Experiments on SNIPS, Facebook multilingual, and ATIS show consistent gains of 5-10% in accuracy and 4-8% in macro-F1, while annotation costs remain at 6-10% of the unlabeled pool. The approach is lightweight, language-agnostic, and suitable for real-time or edge deployment, offering a practical path to scalable, adaptive dialog systems with reduced labeling overhead.
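To make the "DOC-based out-of-domain detector" more concrete, the sketch below implements the rejection rule from the original DOC recipe (one sigmoid score per known intent, plus a per-class threshold estimated from training scores). This is an assumption about how IDALC's first stage rejects utterances, not the authors' code. The function names `doc_thresholds` and `doc_predict`, the value `alpha=3.0`, and the toy scores are all illustrative.

```python
from statistics import pstdev
from typing import List, Optional, Sequence


def doc_thresholds(scores: Sequence[Sequence[float]], labels: Sequence[int],
                   num_classes: int, alpha: float = 3.0) -> List[float]:
    """Per-class rejection thresholds in the style of DOC (assumed setup)."""
    thresholds = []
    for c in range(num_classes):
        # Sigmoid scores for class c on training examples whose gold label is c.
        own = [s[c] for s, y in zip(scores, labels) if y == c]
        if not own:
            thresholds.append(0.5)  # no examples: keep the default 0.5 cut-off
            continue
        # Mirror each score around 1, giving a symmetric set with mean 1,
        # then use its spread to set how far below 1 we still accept.
        mirrored = own + [2.0 - p for p in own]
        sigma = pstdev(mirrored)
        thresholds.append(max(0.5, 1.0 - alpha * sigma))
    return thresholds


def doc_predict(score: Sequence[float], thresholds: Sequence[float]) -> Optional[int]:
    """Return the predicted intent index, or None if every class rejects the input."""
    if all(p < t for p, t in zip(score, thresholds)):
        return None  # rejected: treated as out-of-domain / low confidence
    return max(range(len(score)), key=lambda c: score[c])


# Toy sigmoid scores for two known intents (made-up numbers, not from the paper).
train_scores = [[0.95, 0.10], [0.90, 0.20], [0.15, 0.97], [0.05, 0.92]]
train_labels = [0, 0, 1, 1]
th = doc_thresholds(train_scores, train_labels, num_classes=2)
print([round(t, 3) for t in th])      # [0.763, 0.819]
print(doc_predict([0.85, 0.30], th))  # 0
print(doc_predict([0.40, 0.45], th))  # None
```

Utterances for which `doc_predict` returns `None` are the "system-rejected" queries that a second, correction stage would then process.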

Abstract

Voice-controlled dialog systems have become immensely popular due to their ability to perform a wide range of actions in response to diverse user queries. These agents possess a predefined set of skills or intents to fulfill specific user tasks. However, every system has its limitations. Even for known intents, an utterance is rejected whenever the model's confidence is low, and each rejected utterance must then be manually annotated. Additionally, as time progresses, these agents may need to be retrained with new intents drawn from the system-rejected queries so that they can carry out additional tasks. Labeling all these emerging intents and rejected utterances over time is impractical, which calls for an efficient mechanism to reduce annotation costs. In this paper, we introduce IDALC (Intent Detection and Active Learning based Correction), a semi-supervised framework designed to detect user intents and rectify system-rejected utterances while minimizing the need for human annotation. Empirical findings on various benchmark datasets demonstrate that our system surpasses baseline methods, achieving 5-10% higher accuracy and a 4-8% improvement in macro-F1. Remarkably, we keep the overall annotation cost at just 6-10% of the unlabelled data available to the system. The overall framework of IDALC is shown in Fig. 1.
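The abstract describes two ideas: correcting rejected utterances with little human effort, and reporting annotation cost as a fraction of the unlabelled data. The sketch below illustrates both under stated assumptions. The exact voting rule and annotator interface in IDALC are not given here, so `auto_correct`, the strict-majority rule, the `annotate` callback, and the toy lookup-table "classifiers" are all hypothetical. An ensemble votes on each rejected utterance. If a strict majority agrees, the label is accepted automatically. Otherwise the utterance is sent to a human, and only those human queries count toward the annotation cost.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a classifier maps an utterance to an intent label.
Classifier = Callable[[str], str]


def auto_correct(rejected: Sequence[str], ensemble: Sequence[Classifier],
                 annotate: Callable[[str], str],
                 pool_size: int) -> Tuple[List[Tuple[str, str]], float]:
    """Label rejected utterances by strict majority vote; fall back to a human.

    Returns the newly labeled pairs and the annotation cost, i.e. the number
    of human-labeled utterances as a fraction of the unlabelled pool.
    """
    labeled: List[Tuple[str, str]] = []
    queried = 0
    for utt in rejected:
        votes = Counter(clf(utt) for clf in ensemble)
        label, count = votes.most_common(1)[0]
        if count > len(ensemble) / 2:
            labeled.append((utt, label))  # strict majority: accept automatically
        else:
            labeled.append((utt, annotate(utt)))  # no majority: ask an annotator
            queried += 1
    cost = queried / pool_size if pool_size else 0.0
    return labeled, cost


# Toy "classifiers" built from lookup tables (illustrative only).
tables = [
    {"play a song": "PlayMusic", "is it raining": "GetWeather",
     "add this to my list": "AddToPlaylist"},
    {"play a song": "PlayMusic", "is it raining": "GetWeather",
     "add this to my list": "BookRestaurant"},
    {"play a song": "PlayMusic", "is it raining": "SearchWork",
     "add this to my list": "RateBook"},
]
ensemble = [t.get for t in tables]
rejected = ["play a song", "is it raining", "add this to my list"]
labeled, cost = auto_correct(rejected, ensemble,
                             annotate=lambda u: "AddToPlaylist", pool_size=20)
print(labeled[2])  # ('add this to my list', 'AddToPlaylist')
print(cost)        # 0.05
```

In this toy run, only the third utterance lacks a majority, so one of 20 pool items is human-labeled (5%). The returned pairs would then be added to the training data for the next retraining round.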
