Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Zepeng Bao, Shen Zhou, Qiankun Pi, Jianhao Chen, Mayi Xu, Ming Zhong, Yuanyuan Zhu, Tieyun Qian

TL;DR

LEAP introduces a dynamic, teacher–student framework for hallucination detection that learns adaptive verification strategies and proactively corrects them before execution. By coupling a planner/actor/critic/reflector loop with trajectory distillation via LoRA, LEAP transfers dynamic planning capabilities into efficient small models, enabling robust tool-grounded verification with open-source backbones. Empirical results across three challenging datasets show LEAP consistently outperforms fixed-strategy baselines and even surpasses teacher-derived trajectories in some cases, highlighting the value of dynamic strategy learning and proactive correction. The work offers a practical path to safer, cost-effective hallucination detection suitable for deployment in real-world LLM systems.

Abstract

Hallucination in large language models (LLMs) remains a critical barrier to their safe deployment. Existing tool-augmented hallucination detection methods require predefined, fixed verification strategies, which are crucial to the quality and effectiveness of tool calls. Some methods directly employ powerful closed-source LLMs such as GPT-4 as detectors, which are effective but too costly. To mitigate the cost issue, some methods adopt a teacher-student architecture and fine-tune small open-source models as detectors via agent tuning. However, these methods are limited by fixed strategies. When faced with a dynamically changing execution environment, they may lack adaptability and call tools inappropriately, ultimately leading to detection failure. To address the problem of insufficient strategy adaptability, we propose the "Learning to Evaluate and Adaptively Plan" (LEAP) framework, which endows an efficient student model with the dynamic learning and proactive correction capabilities of the teacher model. Specifically, our method formulates hallucination detection as a dynamic strategy learning problem. We first employ a teacher model to generate trajectories within the dynamic learning loop and dynamically adjust the strategy based on execution failures. We then distill this dynamic planning capability into an efficient student model via agent tuning. Finally, during strategy execution, the student model adopts a proactive correction mechanism, enabling it to propose, review, and optimize its own verification strategies before execution. We demonstrate through experiments on three challenging benchmarks that our LEAP-tuned model outperforms existing state-of-the-art methods.
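
The proactive correction step described in the abstract (propose a strategy, review it, and optimize it before any tool is called) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the representation of a strategy as a list of tool names, the function names `propose`, `review`, and `revise`, the `max_rounds` budget, and the toy stand-ins that replace model calls. It is not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical representation: a strategy is an ordered list of tool names.
Strategy = List[str]


def proactive_correction(
    claim: str,
    propose: Callable[[str], Strategy],
    review: Callable[[str, Strategy], List[str]],
    revise: Callable[[str, Strategy, List[str]], Strategy],
    max_rounds: int = 3,
) -> Strategy:
    """Propose a strategy, then review and revise it before execution.

    The loop stops when the review reports no issues or when the
    (assumed) round budget is exhausted. No tool is called here.
    """
    strategy = propose(claim)
    for _ in range(max_rounds):
        issues = review(claim, strategy)
        if not issues:
            break
        strategy = revise(claim, strategy, issues)
    return strategy


# Toy stand-ins for the model calls (assumptions, for demonstration only).
def toy_propose(claim: str) -> Strategy:
    return ["web_search"]


def toy_review(claim: str, strategy: Strategy) -> List[str]:
    # Flag numeric claims whose plan lacks a calculator step.
    has_digit = any(ch.isdigit() for ch in claim)
    if has_digit and "calculator" not in strategy:
        return ["numeric claim needs a calculator step"]
    return []


def toy_revise(claim: str, strategy: Strategy, issues: List[str]) -> Strategy:
    return strategy + ["calculator"]


numeric_plan = proactive_correction(
    "12 * 12 equals 144", toy_propose, toy_review, toy_revise
)
text_plan = proactive_correction(
    "Paris is in France", toy_propose, toy_review, toy_revise
)
print(numeric_plan)  # ['web_search', 'calculator']
print(text_plan)     # ['web_search']
```

In this sketch the review happens entirely before execution, so a poorly matched plan is repaired without spending any tool calls on it, which is the point of correcting proactively rather than after a failure.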

Paper Structure

This paper contains 31 sections, 12 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Fixed strategies and dynamic strategies for hallucination detection on diverse claims.
  • Figure 2: The LEAP framework with its (a) workflow and (b) core components, including three main steps: 1. Dynamic Strategy Learning and Trajectory Generation Using Teacher Model: A teacher model uses the dynamic learning loop to learn from failure and generate trajectories. 2. Capability Distillation via Agent Tuning: The trajectories are distilled into an efficient student model. 3. Detection with Proactive Correction Using Student Model: The student model adaptively refines its plan before execution to ensure appropriate strategies.
  • Figure 3: A case study on a complex reasoning task. HaluAgent uses its fixed strategy and LEAP uses its dynamic planning, including strategy correction and precise execution.
  • Figure 4: Performance of Qwen2.5-7B on HaluEval as a function of strategy set size.
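
Step 1 in the Figure 2 caption (a teacher model learns from failure inside a dynamic learning loop and generates trajectories) can be sketched roughly as follows. This is a hedged illustration only: the `Trajectory` fields, the `plan`, `execute`, and `reflect` callables, the `max_attempts` budget, and the rule of keeping only trajectories whose verdict matches the gold label are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    claim: str
    strategy: List[str]
    verdict: bool  # True means "hallucinated" in this sketch


def collect_trajectories(
    samples: List[Tuple[str, bool]],
    plan: Callable[[str, List[str]], List[str]],
    execute: Callable[[str, List[str]], bool],
    reflect: Callable[[str, List[str]], str],
    max_attempts: int = 3,
) -> List[Trajectory]:
    """Retry each claim with an adjusted strategy after every failure.

    Assumption: a trajectory is kept for later distillation only if
    its verdict agrees with the gold label.
    """
    kept: List[Trajectory] = []
    for claim, gold in samples:
        lessons: List[str] = []  # feedback accumulated from failed attempts
        for _ in range(max_attempts):
            strategy = plan(claim, lessons)
            verdict = execute(claim, strategy)
            if verdict == gold:
                kept.append(Trajectory(claim, strategy, verdict))
                break
            lessons.append(reflect(claim, strategy))
    return kept


# Toy stand-ins for the teacher model and tools (assumptions).
def toy_plan(claim: str, lessons: List[str]) -> List[str]:
    return ["search", "calculator"] if lessons else ["search"]


def toy_execute(claim: str, strategy: List[str]) -> bool:
    # Pretend numeric errors are only caught when a calculator is used.
    has_digit = any(ch.isdigit() for ch in claim)
    return has_digit and "calculator" in strategy


def toy_reflect(claim: str, strategy: List[str]) -> str:
    return "search alone could not verify the arithmetic"


samples = [("2 + 2 equals 5", True), ("Paris is in France", False)]
trajectories = collect_trajectories(samples, toy_plan, toy_execute, toy_reflect)
for t in trajectories:
    print(t.claim, "->", t.strategy)
```

The kept trajectories would then serve as fine-tuning data for the student in Step 2; the fine-tuning itself is outside the scope of this sketch.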