Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach
Shenglai Zeng, Pengfei He, Kai Guo, Tianqi Zheng, Hanqing Lu, Yue Xing, Hui Liu
TL;DR
This work tackles context-robustness in retrieval-augmented LLMs, where external evidence can mislead the model or overwhelm its internal knowledge. It proposes Grft, a lightweight gated representation fine-tuning method that combines a gate function with low-rank representation adapters, trained on a small labeled dataset to regulate when and how external context should influence outputs. By freezing the base model and training an intervention whose size is only a tiny fraction of the model's ($0.0004\%$), Grft achieves substantial robustness gains against contradictory and unhelpful contexts while preserving performance on aligned contexts and on helpful contexts for questions the model cannot answer from internal knowledge. A variant, Grft-requery, further boosts reliability by re-querying the model when signals indicate inconsistency. Experiments on Llama-2-7B-Chat and Llama-3-8B-Instruct across the ConflictQA benchmark and generalization datasets show strong improvements with minimal overhead, suggesting practical applicability for real-world deployments where imperfect evidence is common.
Abstract
Large Language Models (LLMs) enhanced with external contexts, such as through retrieval-augmented generation (RAG), often struggle to handle imperfect evidence. They tend to over-rely on external knowledge, which makes them vulnerable to misleading and unhelpful contexts. To address this, we propose the concept of context-robust LLMs, which can effectively balance internal knowledge with external context, similar to human cognitive processes. Specifically, context-robust LLMs should rely on external context only when they lack internal knowledge, identify contradictions between internal and external knowledge, and disregard unhelpful contexts. To achieve this goal, we introduce Grft, a lightweight and plug-and-play gated representation fine-tuning approach. Grft consists of two key components: a gating mechanism to detect and filter problematic inputs, and low-rank representation adapters to adjust hidden representations. By training a lightweight intervention function, only $0.0004\%$ of the model size, on fewer than 200 examples, Grft can effectively adapt LLMs towards context-robust behaviors.
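
To make the two components concrete, here is a minimal sketch of a gated low-rank intervention on a single hidden vector. It is an illustration only: the abstract does not give Grft's exact formulation, so the update rule h' = h + g(h) * U(V h), the sigmoid gate, the random initialization, and every name below (`GatedLowRankIntervention`, `gate_w`, `down`, `up`) are assumptions rather than the authors' implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class GatedLowRankIntervention:
    """Hypothetical gated low-rank edit of one hidden representation.

    Assumed form (not taken from the paper): h' = h + g(h) * U (V h),
    where g is a scalar sigmoid gate and U, V are low-rank projections.
    """

    def __init__(self, hidden_dim, rank, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.rank = rank
        # Gate parameters: one weight per hidden unit plus a bias.
        self.gate_w = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.gate_b = 0.0
        # Low-rank adapter: V projects down (rank x hidden), U projects up (hidden x rank).
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(rank)]
        self.up = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(hidden_dim)]

    def gate(self, h):
        """Scalar in (0, 1) controlling how strongly the adapter edits h."""
        z = sum(w * x for w, x in zip(self.gate_w, h)) + self.gate_b
        return 1.0 / (1.0 + math.exp(-z))

    def __call__(self, h):
        g = self.gate(h)
        delta = matvec(self.up, matvec(self.down, h))
        # The base representation is kept; only a gated low-rank correction is added.
        return [x + g * d for x, d in zip(h, delta)]

    def num_parameters(self):
        """Trainable parameters: gate weights, gate bias, and both projections."""
        return self.hidden_dim + 1 + 2 * self.rank * self.hidden_dim


if __name__ == "__main__":
    layer = GatedLowRankIntervention(hidden_dim=8, rank=2)
    hidden = [0.5] * 8
    print("gate value:", layer.gate(hidden))
    print("edited hidden:", layer(hidden))
    print("trainable parameters:", layer.num_parameters())
```

In this sketch the base model's weights never appear, which mirrors the frozen-backbone setup: only the gate and the two projections would be trained. When the gate is near zero the hidden vector passes through almost unchanged, and the parameter count grows linearly with the hidden size and the rank, which is how such an intervention can stay a very small fraction of the model size.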
