Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector

Andi Zhang; Tim Z. Xiao; Weiyang Liu; Robert Bamler; Damon Wischik

TL;DR

This work shows that the likelihood ratio between a pretrained LLM and its finetuned version is an effective, readily computable criterion for out-of-distribution detection across text, spam, and QA contexts, without additional training. By framing pretrained LLMs as OOD proxies, the authors define a simple yet powerful detector $S(x) = \frac{p_\theta(x)}{p_{\theta'}(x)}$ and extend it to QA pairs with multiple criteria, including $S_q$, $S_a$, $S_{q,a}$, and $S_{a|q}$. Comprehensive experiments across far OOD, near OOD, spam detection, and OOD QA demonstrate that LR-based methods, particularly with larger models, consistently outperform baselines and exhibit robustness in diverse settings, including scenarios with no access to in-domain labels. The results highlight the practical impact: easy deployment, no extra training, and broad applicability to robust QA and general OOD detection tasks. The work also discusses limitations (e.g., Nalisnick paradox, domain-specific data quirks) and provides practical guidance and code for practitioners.
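
As an illustration of how the criteria named above could be evaluated, the following Python sketch works in log space, where each ratio becomes a difference of summed per-token log-probabilities. It is not the authors' implementation. The function names and inputs are hypothetical, and the reading of the subscripts (question alone, answer alone, the concatenated pair, and the answer conditioned on the question) is an assumption, since this summary does not spell out the definitions. The sketch also takes no position on which of $p_\theta$ and $p_{\theta'}$ is the pretrained model.

```python
from typing import Dict, Sequence


def log_likelihood(token_logps: Sequence[float]) -> float:
    """Sum per-token log-probabilities into a sequence log-likelihood."""
    return float(sum(token_logps))


def log_ratio(logps_theta: Sequence[float],
              logps_theta_prime: Sequence[float]) -> float:
    """log S = log p_theta - log p_theta' for one token sequence."""
    return log_likelihood(logps_theta) - log_likelihood(logps_theta_prime)


def qa_log_ratios(qa_theta: Sequence[float],
                  qa_theta_prime: Sequence[float],
                  a_theta: Sequence[float],
                  a_theta_prime: Sequence[float],
                  n_question_tokens: int) -> Dict[str, float]:
    """Four log-ratio criteria for a question-answer pair (assumed definitions).

    qa_* : per-token log-probs of the concatenated question + answer,
           one list per model, scored autoregressively.
    a_*  : per-token log-probs of the answer scored on its own,
           without the question as context.
    """
    n = n_question_tokens
    return {
        # Question tokens only (prefix of the concatenated sequence).
        "S_q": log_ratio(qa_theta[:n], qa_theta_prime[:n]),
        # Answer scored without the question (needs a separate pass).
        "S_a": log_ratio(a_theta, a_theta_prime),
        # Whole concatenated sequence.
        "S_q,a": log_ratio(qa_theta, qa_theta_prime),
        # Answer tokens conditioned on the question.
        "S_a|q": log_ratio(qa_theta[n:], qa_theta_prime[n:]),
    }
```

Under these assumptions the chain rule gives $\log S_{q,a} = \log S_q + \log S_{a|q}$, so three of the four scores come from a single forward pass per model over the concatenated pair; only $S_a$ needs the answer to be scored separately.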

Abstract

We revisit the likelihood ratio between a pretrained large language model (LLM) and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind this criterion is that the pretrained LLM has prior knowledge about OOD data because of its large amount of training data, and once finetuned on the in-distribution data, the LLM has sufficient knowledge to distinguish in-distribution data from OOD data. Leveraging the power of LLMs, we show that the likelihood ratio can serve as an effective OOD detection criterion. Moreover, we apply the proposed LLM-based likelihood ratio to detect OOD questions in question-answering (QA) systems, which can be used to improve the performance of specialized LLMs on general questions. Because the likelihood can be easily obtained from the loss functions in contemporary neural network frameworks, the approach is straightforward to implement in practice. Since both pretrained LLMs and their various finetuned models are widely available from online platforms such as Hugging Face, our proposed criterion can be effortlessly incorporated for OOD detection without further training. We conduct a comprehensive evaluation across multiple settings, including far OOD, near OOD, spam detection, and QA scenarios, to demonstrate the effectiveness of the method. Code can be found at https://github.com/andiac/LLMOODratio
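
The remark that the likelihood can be read off a loss function can be made concrete with a short sketch. It assumes the framework reports the loss as the mean token-level negative log-likelihood in nats, and that both models share a tokenizer so the number of predicted tokens is the same for each. The function and parameter names are hypothetical and are not taken from the authors' code.

```python
def sequence_log_likelihood(mean_nll: float, n_predicted_tokens: int) -> float:
    """Recover log p(x) from a mean cross-entropy loss.

    Assumes mean_nll is the average negative log-likelihood per predicted
    token, in nats, so the sequence total is -mean_nll * n_predicted_tokens.
    """
    return -mean_nll * n_predicted_tokens


def log_likelihood_ratio(mean_nll_theta: float,
                         mean_nll_theta_prime: float,
                         n_predicted_tokens: int) -> float:
    """log p_theta(x) - log p_theta'(x) from the two models' mean losses."""
    return (sequence_log_likelihood(mean_nll_theta, n_predicted_tokens)
            - sequence_log_likelihood(mean_nll_theta_prime, n_predicted_tokens))
```

For example, mean losses of 0.5 and 1.0 over 4 predicted tokens give log-likelihoods of -2.0 and -4.0, hence a log-ratio of 2.0. If a framework averages over a different token count (for instance, excluding the first token of a causal language model), that count is the one to pass as `n_predicted_tokens`.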

Paper Structure (25 sections, 3 equations, 3 figures, 8 tables)

INTRODUCTION
BACKGROUND AND PRELIMINARIES
OOD Detection
"Supervised" and "Unsupervised" OOD Detection
The Paradox in Unsupervised OOD Detection
OOD Proxy
Likelihood of Autoregressive Language Models
PRETRAINED LLM AS AN OOD PROXY
LIKELIHOOD RATIO OOD DETECTION FOR QA SYSTEMS
EXPERIMENTS
Evaluation Metrics
Far OOD Detection
Near OOD Detection
Spam Detection
OOD Question Detection in QA Systems
...and 10 more sections

Figures (3)

Figure 1: Relationship among sentences within a specific domain, the comprehensive set of human language, and all conceivable character permutations.
Figure 2: Example question-answer sets produced by MetaMath-7B. The responses to In-D questions are accurate and logical. However, for OOD questions, MetaMath-7B generates unreasonable answers, responding to a straightforward query with unnecessary mathematical calculations or producing repetitive sentences with no useful information. For the complete image, please see Appendix.
Figure 3: Example question-answer sets produced by MetaMath-7B. The responses to In-D questions are accurate and logical. However, for OOD questions, MetaMath-7B generates unreasonable answers, responding to a straightforward query with unnecessary mathematical calculations or producing repetitive sentences with no useful information.
