RoNID: New Intent Discovery with Generated-Reliable Labels and Cluster-friendly Representations
Shun Zhang, Chaoran Yan, Jian Yang, Changyu Ren, Jiaqi Bai, Tongliang Li, Zhoujun Li
TL;DR
RoNID tackles open-world New Intent Discovery by coupling two components in an EM-style loop: reliable pseudo-label generation, formulated as an optimal transport (OT) problem, and cluster-friendly representation learning through intra- and inter-cluster contrastive objectives. The method alternately refines pseudo-labels and representations, breaking the loop in which inaccurate labels and poor representations degrade each other. On three benchmarks, RoNID achieves state-of-the-art performance, with consistent gains in accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI), and it remains robust as the known-class ratio varies. The work provides a principled framework for discovering novel intents while preserving known ones, with practical implications for open-domain dialogue systems.
Abstract
New Intent Discovery (NID) aims to recognize known intents and to infer novel intent groups in an open-world scenario. Current methods, however, suffer from inaccurate pseudo-labels and poor representation learning, which reinforce each other in a negative feedback loop that degrades overall model performance, including accuracy and the adjusted Rand index. To address these challenges, we propose a Robust New Intent Discovery (RoNID) framework optimized by an EM-style method, which focuses on constructing reliable pseudo-labels and obtaining cluster-friendly discriminative representations. RoNID comprises two main modules: a reliable pseudo-label generation module and a cluster-friendly representation learning module. Specifically, in the E-step, the pseudo-label generation module assigns reliable synthetic labels by solving an optimal transport problem, providing high-quality supervision signals to the representation learning module. In the M-step, the representation learning module combines intra-cluster and inter-cluster contrastive learning to learn representations with strong intra-cluster compactness and large inter-cluster separation, feeding more discriminative features back into the generation module. The two steps are iterated to yield a robust model with reliable pseudo-labels and cluster-friendly representations. Experimental results on multiple benchmarks show that our method outperforms previous state-of-the-art methods by 1 to 4 points.
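To make the E-step more concrete, the sketch below shows one common way to turn classifier scores into balanced soft pseudo-labels by solving an entropy-regularized optimal transport problem with Sinkhorn-Knopp iterations. This is an illustration under assumptions, not the RoNID implementation: the abstract does not specify the solver, the equal-cluster-size constraint, or any hyperparameters, and the names `sinkhorn_pseudo_labels`, `epsilon`, and `n_iters` are hypothetical.

```python
import numpy as np


def sinkhorn_pseudo_labels(logits, epsilon=0.05, n_iters=50):
    """Balanced soft pseudo-labels from (N, K) scores (hypothetical helper).

    Assumes every cluster should receive roughly equal total mass, which
    the abstract does not state.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Shift by the global max so the exponentials cannot overflow.
    Q = np.exp((logits - logits.max()) / epsilon).T  # shape (K, N)
    Q /= Q.sum()
    K, N = Q.shape
    for _ in range(n_iters):
        # Row step: give each cluster total mass 1/K.
        Q /= Q.sum(axis=1, keepdims=True)
        Q /= K
        # Column step: give each sample total mass 1/N.
        Q /= Q.sum(axis=0, keepdims=True)
        Q /= N
    # Rescale so each sample's assignment is a probability vector.
    return (Q * N).T  # shape (N, K)


# Usage with random scores standing in for model outputs.
rng = np.random.default_rng(0)
soft_labels = sinkhorn_pseudo_labels(rng.normal(size=(12, 3)))
hard_labels = soft_labels.argmax(axis=1)
```

In an EM-style loop of the kind the abstract describes, assignments like `hard_labels` would serve as targets for the contrastive M-step, and the updated encoder would then produce new scores for the next E-step.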
