LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction

Jensen Zhang, Ningyuan Liu, Yijia Fan, Zihao Huang, Qinglin Zeng, Kaitong Cai, Jian Wang, Keze Wang

TL;DR

This work addresses the reliability gap caused by hallucinations in large language models by introducing LLM-CAS, a framework that learns to apply temporary, context-specific neuron perturbations during inference. Framed as a hierarchical reinforcement learning problem, it combines adaptive masking with neuron-level causal tracing and PPO-based learning to selectively modulate activations without permanent parameter edits. Empirical results show consistent improvements across multiple-choice and open-ended tasks, outperforming static edits and prior dynamic methods, and demonstrating robustness across model architectures. The approach promises more trustworthy LLMs with reduced risk of catastrophic forgetting and potential for future multimodal extensions.

Abstract

Large language models (LLMs) often generate hallucinated content that lacks factual or contextual grounding, limiting their reliability in critical applications. Existing approaches such as supervised fine-tuning and reinforcement learning from human feedback are data-intensive and computationally expensive, while static parameter editing methods struggle with context-dependent errors and catastrophic forgetting. We propose LLM-CAS, a framework that formulates real-time hallucination correction as a hierarchical reinforcement learning problem. LLM-CAS trains an agent to learn a policy that dynamically selects temporary neuron perturbations during inference based on the current context. Unlike prior dynamic approaches that rely on heuristic or predefined adjustments, this policy-driven mechanism enables adaptive and fine-grained correction without permanent parameter modification. Experiments across multiple language models demonstrate that LLM-CAS consistently improves factual accuracy, achieving gains of 10.98 percentage points on StoryCloze, 2.71 points on TriviaQA, and 2.06 points on the MC1 score of TruthfulQA. These results outperform both static editing methods such as ITI and CAA and the dynamic SADI framework. Overall, LLM-CAS provides an efficient and context-aware solution for improving the reliability of LLMs, with promising potential for future multimodal extensions.
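
The idea of a temporary perturbation that leaves parameters untouched can be illustrated with a toy example. The sketch below is not the LLM-CAS implementation: it assumes a hypothetical two-layer network written in plain Python, a hand-picked binary mask in place of the learned policy, and a simple multiplicative `scale` as the perturbation. It only shows that masking hidden activations for one forward pass changes the output while the weights stay as they were.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values: Sequence[float]) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def forward(
    x: Sequence[float],
    w_in: Matrix,
    w_out: Matrix,
    mask: Optional[Sequence[int]] = None,
    scale: float = 0.0,
) -> Vector:
    """Toy two-layer network with an optional temporary perturbation.

    mask[i] == 1 marks hidden neuron i for perturbation: its activation is
    multiplied by `scale` for this call only. The weights are never modified.
    """
    hidden = relu(matvec(w_in, x))
    if mask is not None:
        hidden = [h * scale if m else h for h, m in zip(hidden, mask)]
    return matvec(w_out, hidden)


# Hypothetical toy weights and input (assumptions for illustration only).
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_OUT = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
x = [1.0, 2.0]

baseline = forward(x, W_IN, W_OUT)                              # no perturbation
perturbed = forward(x, W_IN, W_OUT, mask=[0, 0, 1], scale=0.0)  # silence neuron 2
restored = forward(x, W_IN, W_OUT)                              # next call is unaffected

print(baseline, perturbed, restored)
```

In the framework described above, the mask would be chosen per context by the trained policy rather than fixed by hand; here it is a constant purely to keep the example small.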

Paper Structure

This paper contains 20 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Stage 1: A Story Cloze example (Mostafazadeh et al., 2016), consisting of the prefix “Rick grew up in a troubled household…”, a question, and the endings A: “He is happy now.” and B: “He joined a gang.” (correct), is fed to the target LLM.
  • Figure 2: Stage 2: Training. "Bad cases" from Stage 1 undergo neuron-level causal tracing to generate perturbation masks on the LLM’s representations; the perturbed inputs are re-evaluated to produce a reward for optimizing two PPO agents.
  • Figure 3: Impact of perturbing different numbers of selected neurons on the outcomes. The Adaptive Mask dynamically adjusts both the number and positions of neurons to identify the optimal perturbation strategy.
  • Figure 4: Comparison of PPO decision time, dynamic mask inference time, and overall model execution time. For multiple-choice tasks, "forward pass time" denotes the duration of the model’s forward propagation to obtain logits; for open-ended tasks, "generate time" denotes the duration of calling AutoModelForCausalLM.generate (Wolf et al., 2020) for text generation without sampling.
  • Figure 5: Accuracy across six TruthfulQA hallucination categories. LLM-CAS delivers substantial gains in Cultural accuracy, while improvements in Factual are modest.