Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering

Shuzheng Si, Haozhe Zhao, Gang Chen, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Kaikai An, Kangyang Luo, Chen Qian, Fanchao Qi, Baobao Chang, Maosong Sun

TL;DR

NOVA tackles the persistent hallucination problem in instruction-tuned LLMs by filtering training data to avoid unfamiliar knowledge. It introduces Internal Consistency Probing (ICP) to gauge instruction familiarity via internal-state embeddings and differential entropy, and Semantic Equivalence Identification (SEI) to assess target-response familiarity through NLI-based semantic clustering. A quality reward model is used alongside a familiarity score to rank data, and top samples are used for supervised fine-tuning. Across multiple benchmarks, NOVA reduces hallucinations while preserving instruction-following ability and scales to larger models, offering a practical data-filtering alternative to RL-based approaches.
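
As a rough illustration of the ICP idea, the sketch below scores how tightly the embeddings of several self-generated responses cluster, using the differential entropy of a Gaussian fitted to them. The Gaussian fit, the `ridge` regulariser, and the function name are assumptions made for this example; the summary does not give NOVA's exact formula. Lower entropy is read here as higher consistency, and hence higher familiarity with the instruction.

```python
import numpy as np


def gaussian_differential_entropy(embeddings: np.ndarray, ridge: float = 1e-3) -> float:
    """Differential entropy of a Gaussian fitted to the rows of `embeddings`.

    embeddings: array of shape (K, d), one row per self-generated response.
    ridge: hypothetical regulariser; keeps the covariance positive definite
           when K is smaller than d (an assumption of this sketch).
    """
    k, d = embeddings.shape
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(k - 1, 1) + ridge * np.eye(d)
    # slogdet is numerically safer than log(det(cov)) for small eigenvalues.
    _, logdet = np.linalg.slogdet(cov)
    # Closed form for a multivariate Gaussian: 0.5 * log det(2 * pi * e * cov).
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


# Toy data standing in for internal-state embeddings of K = 8 responses.
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 16))
consistent = base + 0.01 * rng.normal(size=(8, 16))  # responses that agree
scattered = rng.normal(size=(8, 16))                 # responses that disagree

h_consistent = gaussian_differential_entropy(consistent)
h_scattered = gaussian_differential_entropy(scattered)
```

In this toy setup `h_consistent` comes out lower than `h_scattered`, which is the ordering a familiarity score of this kind would rely on.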

Abstract

Training LLMs on data containing unfamiliar knowledge during the instruction tuning stage can encourage hallucinations. To address this challenge, we introduce NOVA, a novel framework designed to identify high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of the given instruction by calculating the tailored consistency among multiple self-generated responses. SEI further assesses the familiarity of the LLM with the target response by comparing it to the generated responses, using the proposed semantic clustering and well-designed voting strategy. Finally, to ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity. By considering data quality and avoiding unfamiliar data, we can utilize the selected data to effectively align LLMs to follow instructions and hallucinate less.
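
A minimal sketch of the SEI idea follows: group the target response and the generated responses into semantic clusters, then let the generated responses vote on the target by counting how many share its cluster. The greedy clustering, the voting rule, and the `toy_equivalent` check are assumptions for illustration only. In practice the equivalence check would be an NLI model, and NOVA's actual clustering and voting strategy may differ.

```python
from typing import Callable, List, Sequence


def cluster_by_equivalence(
    texts: Sequence[str], equivalent: Callable[[str, str], bool]
) -> List[List[int]]:
    """Greedily group text indices; each text joins the first cluster whose
    representative (its first member) it is judged equivalent to."""
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        for cluster in clusters:
            if equivalent(texts[cluster[0]], text):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def target_familiarity(
    target: str, generated: Sequence[str], equivalent: Callable[[str, str], bool]
) -> float:
    """Fraction of generated responses that land in the target's cluster."""
    if not generated:
        return 0.0
    # The target is placed at index 0, so it always seeds the first cluster.
    clusters = cluster_by_equivalence([target, *generated], equivalent)
    target_cluster = next(c for c in clusters if 0 in c)
    return (len(target_cluster) - 1) / len(generated)


def toy_equivalent(a: str, b: str) -> bool:
    """Hypothetical stand-in for an NLI-based equivalence check:
    compares normalised strings only."""
    def normalise(s: str) -> str:
        return s.strip().lower().rstrip(".")
    return normalise(a) == normalise(b)


generated = ["Paris", "paris.", "Lyon", "Paris "]
familiar_score = target_familiarity("Paris", generated, toy_equivalent)
unfamiliar_score = target_familiarity("Marseille", generated, toy_equivalent)
```

Here three of the four generated responses match the target "Paris", while none match "Marseille", so the first target would be treated as more familiar to the model.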

Paper Structure

This paper contains 26 sections, 10 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Instruction following ability on MT-Bench vs hallucination on LongFact. NOVA simultaneously aligns LLMs to follow instructions and hallucinate less.
  • Figure 2: The process of NOVA. NOVA identifies and selects high-quality instruction data that aligns well with the LLM’s learned knowledge to reduce hallucination. Then it uses selected instruction data for training LLMs.
  • Figure 3: Average perplexity score of 15 samples with the lowest scores for each model from LongFact-Objects. Models are trained on Alpaca-GPT4.
  • Figure 4: Human evaluation across four key dimensions. The models are trained on Alpaca-GPT4.
  • Figure 5: FactScore results on BioGEN for different numbers of generated responses $K$. The experiments are based on LLaMA-3-8B.
  • ...and 3 more figures