Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, Vassilina Nikoulina
TL;DR
This work targets multi-domain retrieval-augmented generation (RAG) by (i) constructing a diverse benchmark of question-answering tasks from 8 sources covering 13 domains, and (ii) systematically evaluating RAG adaptation strategies under domain shift. It finds that standard LLM fine-tuning for RAG often fails to generalize across domains, while sequence-level distillation, which trains on teacher-generated labels, substantially improves out-of-domain performance by providing more coherent supervision. The authors also show that adapting attention patterns via LoRA-QKAtt can enhance robustness, and that RagChecker-based analysis indicates improved faithfulness and reduced hallucination with distilled labels. Overall, the paper highlights practical strategies for making multi-domain RAG more robust.
Abstract
Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as a lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution is a systematic test of out-of-domain generalization for typical RAG tuning strategies. While our findings reveal that standard fine-tuning fails to generalize effectively, we show that sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. Our findings highlight key strategies for improving multi-domain RAG robustness.
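
As a rough illustration of sequence-level distillation with teacher-generated labels, the sketch below builds a fine-tuning set in which each dataset gold answer is replaced by a teacher model's answer to the same RAG prompt. It is a minimal sketch under stated assumptions, not the authors' pipeline: the names RagExample, build_prompt, distill_labels, and toy_teacher are hypothetical, the prompt template is invented for illustration, and toy_teacher is a stand-in for a real teacher LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RagExample:
    """One QA example with its retrieved passages (hypothetical structure)."""
    question: str
    passages: List[str]
    gold_answer: str  # original dataset label, NOT used as the distilled target


def build_prompt(example: RagExample) -> str:
    """Format retrieved passages and the question into one prompt (assumed template)."""
    docs = "\n".join(f"Document {i + 1}: {p}" for i, p in enumerate(example.passages))
    return f"{docs}\nQuestion: {example.question}"


def distill_labels(
    examples: List[RagExample],
    teacher_generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Replace each gold label with the teacher's generated answer.

    The student is then fine-tuned on (prompt, target) pairs whose targets
    all come from the same teacher, rather than from heterogeneous datasets.
    """
    records = []
    for ex in examples:
        prompt = build_prompt(ex)
        records.append({"prompt": prompt, "target": teacher_generate(prompt)})
    return records


def toy_teacher(prompt: str) -> str:
    """Placeholder teacher: a real setup would call a stronger LLM here."""
    question = prompt.splitlines()[-1].removeprefix("Question: ")
    return f"Based on the documents, the answer to '{question}' is as follows."


examples = [
    RagExample("What is RAG?", ["RAG combines retrieval with generation."], "retrieval + generation"),
    RagExample("What shifts at test time?", ["The domain may differ."], "the domain"),
]
records = distill_labels(examples, toy_teacher)
```

The resulting records would be passed to an ordinary supervised fine-tuning loop for the student model; only the source of the targets changes relative to standard fine-tuning on gold labels.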
