Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination

Cem Uluoglakci; Tugba Taskaya Temizel

Abstract

Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce HypoTermInstruct, an SFT dataset (31,487 responses for 11,151 questions) designed to teach models epistemological humility: the ability to recognize the limits of their own knowledge and admit uncertainty. The dataset achieves this through questions about non-existent "hypothetical" terms. We also release HypoTermQA-Enhanced, a benchmark for measuring hallucination tendency, strengthened through multiple validations. We conducted 800 controlled LoRA SFT runs across Llama3.1-8B and Gemma3-4B (base and instruct variants), testing 100 fine-tuning configurations with paired controls. Our results show that replacing generic instruction data with HypoTermInstruct significantly improves the HypoTerm Score (median increases of 0.19% to 25.91%) and FactScore (+0.39% to +0.86%), while keeping MMLU performance stable (minimal decreases of 0.26% to 0.35%). Our work demonstrates that targeted, high-quality SFT data teaching meta-cognitive skills can effectively reduce hallucination without preference-optimization or RL pipelines, providing mechanistic insights and a practical path toward more reliable AI systems.
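
The paired-control design described above can be illustrated with a minimal Python sketch. It assumes that each training example is a prompt/response record, that a treatment run swaps a fixed fraction of generic instruction examples for HypoTermInstruct examples while keeping the total example count unchanged, and that the reported figure is the median of per-configuration (treatment minus control) score differences. The names `build_paired_mixes`, `median_paired_delta`, and `replace_ratio` are hypothetical. The paper's actual mixing ratios, sampling procedure, and LoRA settings are not given in this section.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: {"prompt": str, "response": str}
Example = Dict[str, str]


def build_paired_mixes(
    generic: Sequence[Example],
    targeted: Sequence[Example],
    n_total: int,
    replace_ratio: float,
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Return (control, treatment) SFT mixes of identical size.

    The control uses only generic instruction data. The treatment keeps a
    shared subset of those generic examples and fills the remaining
    `replace_ratio` share of slots with targeted (e.g. hypothetical-term)
    examples, so the two arms differ only in the replaced examples.
    """
    rng = random.Random(seed)
    n_targeted = round(n_total * replace_ratio)
    if n_total > len(generic) or n_targeted > len(targeted):
        raise ValueError("not enough examples for the requested mix")
    control = rng.sample(list(generic), n_total)
    # Generic examples shared by both arms (control is already a random sample).
    shared = control[: n_total - n_targeted]
    treatment = shared + rng.sample(list(targeted), n_targeted)
    rng.shuffle(treatment)
    return control, treatment


def median_paired_delta(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Median of per-configuration score differences (treatment - control)."""
    if len(treatment) != len(control) or not treatment:
        raise ValueError("need equal-length, non-empty paired score lists")
    return statistics.median([t - c for t, c in zip(treatment, control)])
```

Pairing each treatment run with a control of the same size and the same shared generic examples means that a score difference can be attributed to the swapped-in data rather than to dataset size or sampling noise. Taking the median across configurations keeps the summary robust to a few unusually good or bad runs.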

Paper Structure

This paper contains 35 sections, 8 figures, and 14 tables.

Introduction
Benchmarking Hallucination Tendency
Reducing Hallucination Tendency
HypoTermInstruct Dataset Creation
Architectural Scope
Training the Models
Evaluating the Models
Validation of the Evaluator
Experiments
Quantitative Results
Mechanistic Analysis of Internal Behaviors
Visualizing Internal Belief States
Emergence of Distinct Epistemic Uncertainty
Sharpened Boundaries
Orthogonality and Disentanglement

Figures (8)

Figure 1: Performance on HypoTermQA-Enhanced
Figure 2: HypoTerm Score: Off-the-Shelf LLMs vs SFT with HypoTermInstruct
Figure 3: Internal Representation of Uncertainty
Figure 4: Mean Cosine Similarity by Module.
Figure 5: Hypothetical Question Generation Sample (HypoTermQA-Enhanced)