Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations

Yang Deng; Yong Zhao; Moxin Li; See-Kiong Ng; Tat-Seng Chua

TL;DR

Large language models often confidently answer questions with no definitive answer, risking hallucinations. The authors introduce Self-Aligned, a scalable self-alignment framework that uses class-aware self-augmentation to generate unknown-question data and disparity-driven self-curation to curate high-quality training data, followed by supervised fine-tuning and iterative refinement. The method enables LLMs to detect unknown questions, classify the reason they are unknown, and provide open-ended responses with explanations, outperforming prompt-based and fine-tuning baselines on QNotA and KUQP datasets. While effective with modest seed data and open-source models, the work acknowledges evaluation protocol limitations and restricted applicability to fine-tunable systems, proposing directions for broader validation and larger-model experiments.

Abstract

Despite the remarkable abilities of Large Language Models (LLMs) to answer questions, they often display a considerable level of overconfidence even when the question does not have a definitive answer. To avoid providing hallucinated answers to these unknown questions, existing studies typically investigate approaches to refusing to answer them. In this work, we propose a novel and scalable self-alignment method that utilizes the LLM itself to enhance its response-ability to different types of unknown questions, so that it is capable of not only refusing to answer but also explaining why the question is unanswerable. Specifically, the Self-Align method first employs a two-stage class-aware self-augmentation approach to generate a large amount of unknown question-response data. Then we conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired. Experimental results on two datasets across four types of unknown questions validate the superiority of the Self-Align method over existing baselines under three types of task formulation.
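
The data-construction flow described in the abstract (augment, then curate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Pair`, `self_augment`, `self_curate`, and the `toy_*` helpers are hypothetical, the LLM calls are replaced by simple string stand-ins, and the token-overlap disparity score is an assumed placeholder for whatever scoring the paper actually uses. Fine-tuning and iterative refinement are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    known_question: str    # original question with a definitive answer
    unknown_question: str  # rewritten question intended to be unanswerable
    response: str          # generated refusal plus explanation
    question_class: str    # type of unknown question, e.g. "incorrect"


def self_augment(known_questions: List[str], classes: List[str],
                 rewrite: Callable[[str, str], str],
                 respond: Callable[[str, str], str]) -> List[Pair]:
    """Two stages: rewrite a known question per class, then generate a response."""
    pairs = []
    for question in known_questions:
        for cls in classes:
            unknown = rewrite(question, cls)   # stage 1: guided question rewriting
            answer = respond(unknown, cls)     # stage 2: conditioned response generation
            pairs.append(Pair(question, unknown, answer, cls))
    return pairs


def self_curate(pairs: List[Pair], disparity: Callable[[Pair], float],
                threshold: float) -> List[Pair]:
    """Keep only pairs whose known/unknown disparity reaches the threshold."""
    return [p for p in pairs if disparity(p) >= threshold]


# --- Toy stand-ins for the model calls (assumptions, for illustration only) ---

def toy_rewrite(question: str, cls: str) -> str:
    # Pretend the rewrite failed for "ambiguous" and left the question unchanged.
    return question if cls == "ambiguous" else f"{question} (rewritten as {cls})"


def toy_respond(question: str, cls: str) -> str:
    return f"This question cannot be answered definitively because it is {cls}."


def toy_disparity(pair: Pair) -> float:
    # Assumed proxy: 1 - Jaccard overlap between the two questions' tokens.
    known = set(pair.known_question.split())
    unknown = set(pair.unknown_question.split())
    return 1.0 - len(known & unknown) / len(known | unknown)


if __name__ == "__main__":
    data = self_augment(["Who wrote Hamlet?"], ["ambiguous", "incorrect"],
                        toy_rewrite, toy_respond)
    for kept in self_curate(data, toy_disparity, threshold=0.1):
        print(kept.question_class, "|", kept.unknown_question)
```

In this toy run the unchanged "ambiguous" rewrite has zero disparity and is filtered out, which illustrates the general idea of discarding generated pairs that do not differ enough from their known source question.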

Paper Structure (74 sections, 4 equations, 6 figures, 9 tables)

Introduction
Related Works
Uncertainty in Large Language Models
Unknown Questions
Large Language Model Self-alignment
Method
Initialization
Seed Data
Base Model
Known QA Data
Class-aware Self-Augmentation
Guided Question Rewriting
Conditioned Response Generation
Disparity-driven Self-Curation
Supervised Fine-tuning
...and 59 more sections

Figures (6)

Figure 1: Comparison of different types of responses to an unknown question that contains an incorrect assumption. Red words denote hallucinated content, while underlined words denote the explanation.
Figure 2: The workflow of the Self-Aligned method.
Figure 3: Effect of self-curation approaches.
Figure 4: Effect of iterative self-alignment.
Figure 5: Case study. The left one is an ambiguous question, while the right one is an incorrect question. Red words denote the hallucinated content, while green words denote helpful explanations.
...and 1 more figures
