Improved Out-of-Scope Intent Classification with Dual Encoding and Threshold-based Re-Classification

Hossam M. Zawbaa; Wael Rashwan; Sourav Dutta; Haytham Assem

TL;DR

DETER addresses out-of-scope (OOS) intent detection in task-oriented dialogue by integrating two complementary text encoders, USE and TSDAE, into a dual-branch architecture that yields a unified representation $h(u) \in \mathbb{R}^{1280}$. The method adds two kinds of outliers: synthetic outliers formed as convex combinations in embedding space, and open-domain outliers drawn from SQuAD 2.0. It then trains a $(K+1)$-way classifier whose predictions are refined by a threshold-based re-classification mechanism calibrated on validation data. With only $1{,}559{,}808$ trainable parameters, DETER outperforms TEXTOIR baselines on CLINC-150, Banking77, and Stackoverflow, with substantial macro-F1 gains for both known and unknown intents: up to $13\%$ and $5\%$, respectively, on CLINC-150 and Stackoverflow, and $16\%$ and $24\%$ on Banking77. The approach demonstrates the practical value of combining dual embeddings, synthetic and open-domain outlier generation, and a calibrated confidence threshold for robust, scalable OOS detection in real-world dialogue systems.
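
The 1280-dimensional representation and the convex-combination outliers can be illustrated with a minimal sketch. It assumes details not stated above: that the USE and TSDAE vectors are 512- and 768-dimensional and are simply concatenated (512 + 768 = 1280), that only pairs from different known intents are mixed, and that the mixing weight is drawn from a hypothetical range `lam_range`. Mixed vectors receive index K, the extra out-of-scope class of the (K+1)-way classifier. The function names are illustrative and are not taken from the released code.

```python
import random
from typing import List, Sequence, Tuple

USE_DIM = 512    # assumed USE embedding size
TSDAE_DIM = 768  # assumed TSDAE (BERT-base) embedding size


def dual_embedding(use_vec: Sequence[float], tsdae_vec: Sequence[float]) -> List[float]:
    """Fuse the two sentence embeddings by concatenation (assumed fusion rule)."""
    if len(use_vec) != USE_DIM or len(tsdae_vec) != TSDAE_DIM:
        raise ValueError("unexpected embedding size")
    return list(use_vec) + list(tsdae_vec)  # 512 + 768 = 1280 dimensions


def synthetic_outliers(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_known: int,
    n: int,
    lam_range: Tuple[float, float] = (0.3, 0.7),  # hypothetical mixing range
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Create n outliers as convex combinations of in-scope embeddings."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two distinct intents to mix")
    rng = random.Random(seed)
    xs: List[List[float]] = []
    ys: List[int] = []
    while len(xs) < n:
        i, j = rng.sample(range(len(embeddings)), 2)
        if labels[i] == labels[j]:
            continue  # assumed: only mix utterances from different intents
        lam = rng.uniform(*lam_range)
        # lam * e_i + (1 - lam) * e_j lies on the segment between the two points
        mixed = [lam * a + (1.0 - lam) * b for a, b in zip(embeddings[i], embeddings[j])]
        xs.append(mixed)
        ys.append(num_known)  # index K is the out-of-scope class
    return xs, ys


if __name__ == "__main__":
    emb = [dual_embedding([0.1] * 512, [0.2] * 768),
           dual_embedding([0.3] * 512, [0.4] * 768)]
    outliers, targets = synthetic_outliers(emb, labels=[0, 1], num_known=2, n=3)
    print(len(outliers[0]), targets)  # 1280 [2, 2, 2]
```

Mixing only across intents is one plausible way to push samples into the gaps between known-intent clusters. A mixing range away from 0 and 1 keeps the synthetic points from collapsing onto real in-scope examples.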

Abstract

Detecting out-of-scope user utterances is essential for task-oriented dialogue and intent classification. Current methods struggle with the unpredictable distribution of outliers and often rely on assumptions about data distributions. We present the Dual Encoder for Threshold-Based Re-Classification (DETER) to address these challenges. This end-to-end framework efficiently detects out-of-scope intents without requiring assumptions about data distributions or additional post-processing steps. At its core, DETER uses two text encoders, the Universal Sentence Encoder (USE) and the Transformer-based Denoising AutoEncoder (TSDAE), to generate user utterance embeddings, which are classified by a branched neural architecture. Further, DETER generates synthetic outliers using self-supervision and incorporates out-of-scope phrases from open-domain datasets, which broadens the training set for out-of-scope detection. Additionally, a threshold-based re-classification mechanism refines the model's initial predictions. Evaluations on the CLINC-150, Stackoverflow, and Banking77 datasets demonstrate DETER's efficacy. Our model outperforms previous benchmarks, improving F1 by up to 13% and 5% for known and unknown intents, respectively, on CLINC-150 and Stackoverflow, and by 16% for known and 24% for unknown intents on Banking77. The source code has been released at https://github.com/Hossam-Mohammed-tech/Intent_Classification_OOS.
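
The threshold-based re-classification step can be sketched as follows. The sketch assumes a rule that the abstract does not spell out. The classifier's top prediction is kept if its softmax confidence reaches a threshold `tau`, and otherwise the utterance is re-assigned to the out-of-scope class. `tau` is chosen by grid search on validation data. Accuracy is used here as a stand-in selection criterion, and the paper's actual calibration objective may differ. `reclassify` and `calibrate_threshold` are hypothetical names.

```python
from typing import List, Optional, Sequence


def reclassify(probs: Sequence[Sequence[float]], oos_index: int, tau: float) -> List[int]:
    """Keep the top prediction if its confidence reaches tau, else assign OOS."""
    preds = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])  # argmax over K+1 classes
        preds.append(best if p[best] >= tau else oos_index)
    return preds


def calibrate_threshold(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    oos_index: int,
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Grid-search tau on validation data; accuracy is a stand-in criterion."""
    if grid is None:
        grid = [k / 100 for k in range(100)]  # 0.00, 0.01, ..., 0.99
    best_tau, best_acc = grid[0], -1.0
    for tau in grid:
        preds = reclassify(probs, oos_index, tau)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the smallest tau among ties
            best_tau, best_acc = tau, acc
    return best_tau


if __name__ == "__main__":
    # Toy validation set: K = 2 known intents, index 2 is out-of-scope.
    val_probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25],
                 [0.10, 0.80, 0.10], [0.45, 0.30, 0.25]]
    val_labels = [0, 2, 1, 2]
    tau = calibrate_threshold(val_probs, val_labels, oos_index=2)
    print(tau, reclassify([[0.6, 0.3, 0.1], [0.4, 0.3, 0.3]], 2, tau))  # 0.46 [0, 2]
```

Because the threshold is fit on held-out data rather than on the training set, it reflects how confident the trained classifier actually is on unseen in-scope and out-of-scope utterances.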

Paper Structure

This paper contains 20 sections, 7 equations, 3 figures, and 3 tables.

Introduction
Related Literature
Dual Encoder for Threshold-based Re-Classification (DETER)
Universal Sentence Encoder (USE)
Transformer-based Denoising AutoEncoder (TSDAE)
Representation Learning
Construction of Outliers
Synthetic Outliers
Open-Domain Outliers
DETER Training Architecture
Re-Classification Threshold
Experimental Setup
Dataset
Model Hyper-parameters
Evaluation Results
(5 further sections not listed)

Figures (3)

Figure 1: Overview of the proposed Dual Encoder for Threshold-based Re-Classification (DETER).
Figure 2: The model's architecture
Figure 3: Performance comparison of "Model only" versus "Model with threshold (DETER)" on the CLINC-150 dataset for (a) known and (b) unknown intents across varying intent ratios. The error bars show the standard deviation across ten runs.
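
Figure 3 reports known-intent and unknown-intent scores separately. The sketch below assumes a common convention. The "known" score is macro F1 over the K known intents, and the "unknown" score is the F1 of the single out-of-scope class with index K. It computes both from plain label lists. It is illustrative only and is not the paper's evaluation script.

```python
from typing import Sequence, Tuple


def class_f1(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """One-vs-rest F1 for a single class index."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # also covers the 0/0 cases for precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def known_unknown_f1(y_true: Sequence[int], y_pred: Sequence[int],
                     num_known: int) -> Tuple[float, float]:
    """Macro F1 over known intents 0..K-1, and F1 of the OOS class K."""
    known = [class_f1(y_true, y_pred, c) for c in range(num_known)]
    return sum(known) / num_known, class_f1(y_true, y_pred, num_known)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]  # label 2 = out-of-scope
    y_pred = [0, 1, 1, 1, 2, 0]
    f1_known, f1_unknown = known_unknown_f1(y_true, y_pred, num_known=2)
    print(round(f1_known, 3), round(f1_unknown, 3))  # 0.65 0.667
```

Scoring raw argmax predictions and threshold-re-classified predictions with the same function mirrors the "Model only" versus "Model with threshold" comparison in spirit, under the metric convention assumed here.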
