
Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model

Jaemin Kim, Jae O Lee, Sumyeong Ahn, Seo Yeon Park

Abstract

Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to performance degradation when presented with irrelevant or noisy retrieved contexts. Existing approaches to enhance robustness typically operate via coarse-grained parameter updates at the layer or module level, often overlooking the inherent neuron-level sparsity of Large Language Models (LLMs). To address this limitation, we propose Neuro-RIT (Neuron-guided Robust Instruction Tuning), a novel framework that shifts the paradigm from dense adaptation to precision-driven neuron alignment. Our method explicitly disentangles neurons that are responsible for processing relevant versus irrelevant contexts using attribution-based neuron mining. Subsequently, we introduce a two-stage instruction tuning strategy that enforces a dual capability for noise robustness: achieving direct noise suppression by functionally deactivating neurons exclusive to irrelevant contexts, while simultaneously optimizing targeted layers for evidence distillation. Extensive experiments across diverse QA benchmarks demonstrate that Neuro-RIT consistently outperforms strong baselines and robustness-enhancing methods.
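The attribution-based neuron mining described above can be illustrated with a minimal sketch. Assuming per-neuron attribution scores have already been aggregated across samples for relevant and irrelevant contexts, one simple way to disentangle the neuron sets is top-k selection under each context type; the function name `partition_neurons` and the top-k criterion are illustrative assumptions, not the paper's exact procedure.

```python
def partition_neurons(rel_scores, irrel_scores, top_k):
    """Split neurons into relevant-only, irrelevant-only, and shared sets.

    rel_scores / irrel_scores: aggregated attribution scores per neuron
    under relevant and irrelevant retrieved contexts, respectively.
    """
    def top(scores):
        # indices of the top_k neurons by attribution score
        order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return set(order[:top_k])

    rel, irrel = top(rel_scores), top(irrel_scores)
    # P_rel: responsive only to relevant contexts; P_irrel: only to
    # irrelevant contexts; P_shared: responsive to both.
    return rel - irrel, irrel - rel, rel & irrel

# Toy example with 5 neurons and top_k = 3
p_rel, p_irrel, p_shared = partition_neurons(
    [0.9, 0.1, 0.8, 0.2, 0.7],
    [0.1, 0.9, 0.8, 0.7, 0.2],
    top_k=3,
)
# p_rel == {0, 4}, p_irrel == {1, 3}, p_shared == {2}
```

The actual paper may use a different selection criterion (e.g. an attribution threshold rather than top-k); the point is only that the three sets are derived by contrasting the two score profiles.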

Paper Structure

This paper contains 40 sections, 3 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: The overview of the Neuro-RIT framework. Phase 1: We identify neurons highly responsive to relevant and irrelevant contexts via attribution scores aggregated across samples, and decouple them into distinct sets: relevant ($\mathcal{P}_{\text{rel}}$), irrelevant ($\mathcal{P}_{\text{irrel}}$), and shared ($\mathcal{P}_{\text{shared}}$) neurons. Phase 2: We proceed with a two-stage instruction tuning strategy. First, we instruction-tune $\mathcal{P}_{\text{irrel}}$ to encourage early emission of an End-of-Text (EOT) token, suppressing responses driven by irrelevant contexts. Second, we apply neuron-guided tuning with group-specific gradient masks, while performing full fine-tuning on the top-3 layers with the highest neuron density of $\mathcal{P}_{\text{shared}}$ and $\mathcal{P}_{\text{irrel}}$.
  • Figure 2: Performance comparison of four RAG-based methods on four QA benchmarks, evaluated using an LLM-based metric.
  • Figure 3: Neuron distribution of LLaMA-3-8B-Instruct on the HotpotQA dataset; the left panel shows the distribution of $\mathcal{P}_{\text{rel}}$, $\mathcal{P}_{\text{shared}}$, and $\mathcal{P}_{\text{irrel}}$, while the right panel contrasts $\mathcal{P}_{\text{rel}}$ with the combination of $\mathcal{P}_{\text{irrel}}$ and $\mathcal{P}_{\text{shared}}$.
  • Figure 4: Comparison of accuracy on Mistral-7B-Instruct-v0.2 across all datasets. Best results are shown in bold and second-best results are underlined.
  • Figure 5: Neuron distribution of Mistral-7B-Instruct-v0.2 on the HotpotQA dataset showing the distribution of $\mathcal{P}_{\text{rel}}$, $\mathcal{P}_{\text{shared}}$, and $\mathcal{P}_{\text{irrel}}$.
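The group-specific gradient masks mentioned in the Figure 1 caption can be sketched as follows. The idea, under the stated assumptions, is that during the second tuning stage only parameters belonging to the targeted neuron groups receive updates, while all other gradients are zeroed before the optimizer step. The function `apply_gradient_mask` and its flat per-neuron representation are illustrative simplifications, not the authors' implementation.

```python
def apply_gradient_mask(grads, neuron_ids, trainable_ids):
    """Zero the gradient of every neuron outside the trainable group.

    grads: per-neuron gradient values (flattened for illustration).
    neuron_ids: the neuron index corresponding to each gradient entry.
    trainable_ids: the set of neurons selected for this tuning stage,
    e.g. P_irrel in stage one or P_rel in stage two.
    """
    trainable = set(trainable_ids)
    return [g if nid in trainable else 0.0
            for g, nid in zip(grads, neuron_ids)]

# Toy example: 5 neurons, but only the group {1, 3} is being tuned,
# so gradients for neurons 0, 2, and 4 are suppressed.
masked = apply_gradient_mask(
    grads=[0.5, -0.2, 0.1, 0.7, -0.4],
    neuron_ids=range(5),
    trainable_ids={1, 3},
)
# masked == [0.0, -0.2, 0.0, 0.7, 0.0]
```

In a real training loop this masking would be applied to the model's parameter gradients after backpropagation and before the optimizer step, with the top-3 highest-density layers (per the Figure 1 caption) exempted from masking and fine-tuned in full.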