LANID: LLM-assisted New Intent Discovery
Lu Fan, Jiashu Pu, Rongsheng Zhang, Xiao-Ming Wu
TL;DR
New Intent Discovery (NID) in Task-Oriented Dialogue Systems is challenging because semantic representations of novel intents are limited. LANID addresses this by using KNN- and DBSCAN-based sampling to select informative utterance pairs, querying an LLM for their pairwise relations, and training a lightweight encoder with a contrastive triplet objective. The loss is $\mathcal{L} = \max\big(d(x_i, p_i) - d(x_i, n_i) + \mathrm{margin}, 0\big)$ with $d(x, y) = \lVert x - y \rVert$, where $x_i$, $p_i$, and $n_i$ are the anchor, positive, and negative utterance embeddings. The resulting in-domain representations are clustered with $k$-means on the test data, and the approach iterates between sampling, LLM labeling, and fine-tuning to progressively improve the representations. Experiments on three NID datasets (BANKING, StackOverflow, and M-CID) show that LANID surpasses strong unsupervised and semi-supervised baselines, validating the effectiveness of LLM-guided relational labeling for small encoders. The work offers a scalable, privacy-conscious path for NID that exploits zero-shot LLM guidance without fine-tuning large models; code is available online.
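As a concrete illustration of the objective, the following minimal Python sketch evaluates the triplet loss on plain-tuple embeddings, using the Euclidean distance $d(x, y) = \lVert x - y \rVert$. The function names and the default margin of 1.0 are illustrative assumptions, not values taken from LANID; in a real setup the embeddings come from the encoder and gradients flow through the loss.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """d(x, y) = ||x - y||, the Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """L = max(d(x_i, p_i) - d(x_i, n_i) + margin, 0) for one triplet.

    The margin default is an assumption for illustration only.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def batch_triplet_loss(triplets: Iterable[Tuple[Vector, Vector, Vector]],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over (anchor, positive, negative) triples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchor = (0.0, 0.0)
    # Negative is already farther than positive by more than the margin.
    print(triplet_loss(anchor, (0.0, 1.0), (3.0, 4.0)))  # -> 0.0
    # Positive is farther than negative, so the hinge is active: 5 - 1 + 1.
    print(triplet_loss(anchor, (3.0, 4.0), (0.0, 1.0)))  # -> 5.0
```

The loss is zero once every negative sits at least `margin` farther from the anchor than the positive, so only violating triplets push the encoder. For batched training, PyTorch's `torch.nn.TripletMarginLoss` (with `p=2` for the Euclidean norm) computes the same hinge-style objective; it is one possible building block, not a statement of how LANID is implemented.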
Abstract
Task-Oriented Dialogue Systems (TODS) frequently encounter new intents. New Intent Discovery (NID) is a crucial task that aims to identify these novel intents while maintaining the capability to recognize existing ones. Previous efforts to adapt TODS to new intents have struggled with inadequate semantic representations or have depended on external knowledge, which is often neither scalable nor flexible. Recently, Large Language Models (LLMs) have demonstrated strong zero-shot capabilities; however, their scale can be impractical for real-world applications that involve large volumes of queries. To address the limitations of existing NID methods while leveraging LLMs, we propose LANID, a framework that uses LLM guidance to enhance the semantic representations of lightweight NID encoders. Specifically, LANID employs the $K$-nearest neighbors and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms to selectively sample utterance pairs from the training set. It then queries an LLM to determine the relationship within each pair. The resulting labeled pairs define a contrastive fine-tuning task, which is used to train a small encoder with a contrastive triplet loss. Experiments on three distinct NID datasets demonstrate the efficacy of the proposed method, which surpasses strong baselines in both unsupervised and semi-supervised settings. Our code is available at https://github.com/floatSDSDS/LANID.
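The pair-sampling step can be sketched as follows. This is a hypothetical, dependency-free Python illustration: it builds candidate pairs from each utterance embedding's $K$ nearest neighbors, runs a minimal DBSCAN, and keeps only neighbor pairs whose endpoints both fall in dense (non-noise) regions. The selection rule, the function names (`knn_pairs`, `dbscan`, `select_pairs`), and the `k`, `eps`, and `min_samples` values are assumptions for illustration; LANID's actual sampling criteria may differ. The selected pairs would then be sent to the LLM for relation labeling.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_pairs(points: List[Vector], k: int) -> Set[Tuple[int, int]]:
    """Pair every point with its k nearest neighbours (brute force, O(n^2))."""
    pairs: Set[Tuple[int, int]] = set()
    for i, p in enumerate(points):
        # Sort by (distance, index) so ties resolve to the smaller index.
        others = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            pairs.add((min(i, j), max(i, j)))  # store each unordered pair once
    return pairs


def dbscan(points: List[Vector], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id per point, -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself (same convention as scikit-learn).
    neighbours = [[j for j in range(n) if euclidean(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster += 1
    return [-1 if label is None else label for label in labels]


def select_pairs(points: List[Vector], k: int, eps: float,
                 min_samples: int) -> List[Tuple[int, int]]:
    """Hypothetical rule: keep KNN pairs whose endpoints are both non-noise."""
    labels = dbscan(points, eps, min_samples)
    return sorted((i, j) for i, j in knn_pairs(points, k)
                  if labels[i] != -1 and labels[j] != -1)


if __name__ == "__main__":
    # Two tight groups of toy 2-D "embeddings" plus one outlier at index 7.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(select_pairs(toy, k=1, eps=1.5, min_samples=3))
    # -> [(0, 1), (0, 2), (1, 3), (4, 5), (4, 6)]
```

Brute-force distances keep the sketch self-contained; on real embedding matrices, scikit-learn's `NearestNeighbors` and `DBSCAN` (with its `eps` and `min_samples` parameters) are the usual choices.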
