Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection

Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

TL;DR

This paper conducts a comprehensive evaluation of LLMs under various experimental settings and finds that LLMs exhibit strong zero-shot and few-shot capabilities but are still at a disadvantage compared to models fine-tuned with full resources.

Abstract

Out-of-domain (OOD) intent detection aims to determine whether a user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems. Previous methods address it by fine-tuning discriminative models. Recently, some studies have explored applying large language models (LLMs), represented by ChatGPT, to various downstream tasks, but their ability on the OOD detection task remains unclear. This paper conducts a comprehensive evaluation of LLMs under various experimental settings and then outlines the strengths and weaknesses of LLMs. We find that LLMs exhibit strong zero-shot and few-shot capabilities but are still at a disadvantage compared to models fine-tuned with full resources. Going further, through a series of additional analysis experiments, we discuss and summarize the challenges faced by LLMs and provide guidance for future work, including injecting domain knowledge, strengthening knowledge transfer from in-domain (IND) to OOD, and understanding long instructions.
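The abstract contrasts LLMs with earlier methods that fine-tune a discriminative intent classifier on IND data and then score each query's confidence. The sketch below illustrates that scoring stage, assuming maximum softmax probability (MSP) as the scoring function. MSP is a common baseline, not necessarily the scoring function used in the paper, and the intent labels, logits, and threshold value are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    # Maximum softmax probability: a high value means the classifier is
    # confident the query belongs to one of the known (IND) intents.
    return max(softmax(logits))


def detect_intent(logits: Sequence[float],
                  ind_labels: Sequence[str],
                  threshold: float = 0.7) -> str:
    # Hypothetical threshold; in practice it would be tuned on IND validation data.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "OOD"  # low confidence on every known intent: flag as out-of-domain
    return ind_labels[best]


labels = ["book_flight", "check_balance", "play_music"]
print(detect_intent([4.0, 0.5, 0.2], labels))  # confident IND prediction
print(detect_intent([1.0, 0.9, 0.8], labels))  # flat distribution, flagged as OOD
```

Thresholding a single confidence score keeps the OOD decision separate from the trained classifier, which is the two-stage split that an end-to-end LLM approach replaces.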

Paper Structure

This paper contains 27 sections, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Explanation of the role of OOD intent detection in the TOD system. When the system encounters an intent beyond its supported intents, it can detect it and reply to the user with a friendly prompt.
  • Figure 2: Comparison of OOD detection between the previous method (upper part) and the LLM-based method (lower part). The previous method trains a feature extractor on IND samples in the first stage, then estimates each sample's confidence score from the extracted features with a designed scoring function. Our end-to-end, LLM-based OOD detection adds task descriptions to the prompt, and the LLM directly outputs the detection result.
  • Figure 3: The two prompts we use to help ChatGPT perform OOD intent detection. FSD-OOD incorporates examples of intents into the prompt as prior knowledge.
  • Figure 4:
  • Figure 5:
  • ...and 5 more figures
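Figures 2 and 3 describe the LLM-based alternative: a task description (and, in the FSD-OOD prompt, example utterances for each intent) goes into the prompt, and the LLM outputs the detection result directly. The sketch below shows one way to assemble such a prompt and map the model's reply back to an intent or OOD. The prompt wording and the functions `build_prompt` and `parse_answer` are hypothetical and are not the paper's actual prompts; the call to the LLM itself is omitted.

```python
from typing import Mapping, Optional, Sequence


def build_prompt(query: str,
                 ind_intents: Sequence[str],
                 examples: Optional[Mapping[str, Sequence[str]]] = None) -> str:
    """Assemble a task-description prompt for LLM-based OOD intent detection.

    Hypothetical wording; not the prompt used in the paper.
    """
    lines = [
        "You are the intent classifier of a task-oriented dialogue system.",
        "Supported intents: " + ", ".join(ind_intents) + ".",
    ]
    if examples:
        # Few-shot variant: list example utterances per intent as prior knowledge.
        lines.append("Examples:")
        for intent, utterances in examples.items():
            for utt in utterances:
                lines.append(f'- "{utt}" -> {intent}')
    lines.append(
        "If the query matches none of the supported intents, answer OOD; "
        "otherwise answer with the intent name only."
    )
    lines.append(f'Query: "{query}"')
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(answer: str, ind_intents: Sequence[str]) -> str:
    # Normalize the model's free-text reply and match it against known intents;
    # anything that names no supported intent is treated as OOD.
    cleaned = answer.strip().strip(".").lower()
    for intent in ind_intents:
        if cleaned == intent.lower():
            return intent
    return "OOD"


intents = ["book_flight", "play_music"]
print(build_prompt("play some jazz", intents, {"play_music": ["put on a song"]}))
```

Falling back to OOD for any unrecognized reply is a conservative design choice: a malformed LLM answer is never silently routed to a supported intent.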