Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
Jun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng
TL;DR
LLMs often fail to ground their outputs in the context they are given. The paper proposes Router Lens, which identifies context-faithful experts by fine-tuning only the router and computing a Context-dependence Ratio that exposes the experts relying most heavily on context. Building on this, CEFT fine-tunes only the context-faithful experts, achieving competitive or superior performance with far fewer trainable parameters than full fine-tuning. Experiments across multiple MoE models and context-dependent tasks show improved context grounding, better robustness against forgetting, and substantial gains in training efficiency. The work advances practical strategies for aspect-specific expert optimization in Mixture-of-Experts LLMs, with implications for reliable reasoning in context-rich settings.
Abstract
Context faithfulness is essential for reliable reasoning in context-dependent scenarios. However, large language models often struggle to ground their outputs in the provided context, resulting in irrelevant responses. Inspired by the emergent expert specialization observed in mixture-of-experts architectures, this work investigates whether certain experts exhibit specialization in context utilization, offering a potential pathway toward targeted optimization for improved context faithfulness. To explore this, we propose Router Lens, a method that accurately identifies context-faithful experts. Our analysis reveals that these experts progressively amplify attention to relevant contextual information, thereby enhancing context grounding. Building on this insight, we introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts. Experiments across a wide range of benchmarks and models demonstrate that CEFT matches or surpasses the performance of full fine-tuning while being significantly more efficient.
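The two-stage idea described above (identify experts from a router that was tuned on context-dependent data, then fine-tune only those experts) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the Context-dependence Ratio, so the sketch ranks experts by how often the tuned router selects them, which is an assumed stand-in. The function names and the `layers.<i>.experts.<j>.` parameter naming scheme are hypothetical.

```python
from collections import Counter


def select_experts_by_routing(routed_expert_ids, top_k):
    """Rank experts by how often the router picked them; return the top_k ids.

    Assumption: selection frequency under a router tuned on context-dependent
    data stands in for the paper's expert-identification criterion.
    """
    counts = Counter(routed_expert_ids)
    # Sort by descending count, breaking ties by ascending expert id
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [expert_id for expert_id, _ in ranked[:top_k]]


def trainable_parameter_names(param_names, layer, expert_ids):
    """Keep only parameters that belong to the selected experts of one layer.

    The trailing dot in each prefix prevents expert 1 from matching expert 13.
    """
    prefixes = tuple(f"layers.{layer}.experts.{e}." for e in expert_ids)
    return [name for name in param_names if name.startswith(prefixes)]


# Toy routing trace for one layer: each entry is the expert chosen for a token.
routed = [3, 1, 3, 2, 3, 1, 0, 1]
selected = select_experts_by_routing(routed, top_k=2)

# Hypothetical parameter names; everything not returned would stay frozen.
params = [
    "layers.0.experts.0.w",
    "layers.0.experts.1.w",
    "layers.0.experts.3.w",
    "layers.0.experts.13.w",
    "layers.0.router.w",
]
trainable = trainable_parameter_names(params, layer=0, expert_ids=selected)
```

In a real training setup, the returned names would decide which parameters receive gradient updates, which is where the reduction in trainable parameters relative to full fine-tuning would come from.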
