
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation

Tobias Leemann, Periklis Petridis, Giuseppe Vietri, Dionysis Manousakas, Aaron Roth, Sergul Aydore

TL;DR

This work addresses domain shift in NLI-based grounding verification for retrieval-augmented generation by introducing Auto-GDA, an unsupervised domain adaptation framework that generates synthetic data, applies label-preserving augmentations, and uses weak supervision from a teacher to tailor lightweight NLI models to realistic RAG inputs. Auto-GDA iteratively generates data with a generator $G$, expands diversity via mutations $M$, and refines labels with a teacher $T$, selecting a top-$K$ subset by minimizing an enhanced distribution-matching objective $L_{tot}$ that blends marginal alignment, label correctness $LDiv$, and a model-utility term $U_f$. Experiments on realistic RAG datasets show that fine-tuning with Auto-GDA data often matches or surpasses the teacher and approaches LLM-level performance while offering about an order of magnitude reduction in inference cost compared to large LLMs. The framework provides a practical, controllable path to efficient grounding verification by domain-adapting compact NLI models without relying on large labeled target sets.
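
The top-$K$ selection step can be illustrated with a toy sketch. Auto-GDA minimizes $L_{tot}$ over subsets with discrete optimization, and the exact terms are not given in this overview. The sketch below instead assumes that the objective decomposes into independent per-sample costs. Under that assumption, minimizing it over subsets of size $K$ reduces to keeping the $K$ cheapest samples. The three cost terms are hypothetical stand-ins that only mimic the roles named above:

- distance to the nearest unlabeled target embedding, for marginal alignment;
- a log-loss of the hard label under the teacher certainty, for label correctness;
- a precomputed `utility` score, for $U_f$.

The class `Candidate`, the function names, and the weights `w_dist`, `w_label`, `w_util` are all illustrative.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    embedding: Tuple[float, ...]  # hypothetical feature vector of the (evidence, claim) pair
    hard_label: int               # assigned label y_hat in {0, 1}
    certainty: float              # teacher entailment certainty r
    utility: float                # hypothetical stand-in for U_f (higher is better)


def label_cost(c: Candidate, eps: float = 1e-6) -> float:
    """Large when r disagrees with y_hat; a log-loss stand-in for label correctness."""
    r = min(max(c.certainty, eps), 1.0 - eps)
    return -math.log(r) if c.hard_label == 1 else -math.log(1.0 - r)


def distance_cost(c: Candidate, targets: Sequence[Tuple[float, ...]]) -> float:
    """Distance to the nearest target-domain embedding, a crude marginal-alignment proxy."""
    return min(math.dist(c.embedding, t) for t in targets)


def select_top_k(candidates: List[Candidate], targets, k: int,
                 w_dist: float = 1.0, w_label: float = 1.0,
                 w_util: float = 1.0) -> List[Candidate]:
    """Return the k candidates with the smallest separable surrogate cost."""
    def cost(c: Candidate) -> float:
        return (w_dist * distance_cost(c, targets)
                + w_label * label_cost(c)
                - w_util * c.utility)
    return heapq.nsmallest(k, candidates, key=cost)


# Usage: one unlabeled target embedding and three synthetic candidates.
targets = [(0.0, 0.0)]
good = Candidate((0.1, 0.0), 1, 0.95, 0.0)        # near target, label agrees with r
mislabeled = Candidate((0.1, 0.0), 1, 0.20, 0.0)  # near target, but r contradicts y_hat
far = Candidate((1.0, 0.0), 1, 0.90, 0.0)         # label agrees, but far from target
chosen = select_top_k([good, mislabeled, far], targets, k=2)
```

Here the mislabeled sample is dropped even though it lies close to the target domain, because its teacher certainty contradicts its hard label. A set-level term, such as a distribution distance between the selected set and the target set, does not decompose this way. In that case a search procedure would be needed rather than a sort.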

Abstract

While retrieval-augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. A common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models can be used for efficient grounding verification at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10% of their computational cost.
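
The iterative process described in the abstract can be summarized as a loop skeleton. This is a minimal sketch, not the authors' implementation. The generator, mutation operator, teacher, and selection step are passed in as hypothetical callables. The label-preservation check `agrees` is a deliberately simple assumption: it keeps a mutated claim only if the teacher's certainty for that claim, computed directly against the evidence, still sides with the inherited hard label.

```python
from typing import Callable, List, Tuple

# (evidence, claim, hard label y_hat, teacher certainty r that evidence entails claim)
Sample = Tuple[str, str, int, float]


def agrees(y_hat: int, r: float) -> bool:
    """Simplified label-preservation check: the teacher's certainty sides with y_hat."""
    return r >= 0.5 if y_hat == 1 else r < 0.5


def auto_gda_style_loop(
    evidence: List[str],
    generate: Callable[[str], List[Tuple[str, int]]],     # hypothetical generator G
    mutate: Callable[[str], List[str]],                   # hypothetical mutation operator M
    teacher: Callable[[str, str], float],                 # hypothetical teacher T
    select: Callable[[List[Sample], int], List[Sample]],  # hypothetical top-K selection
    k: int,
    rounds: int,
) -> List[Sample]:
    # Initial synthetic data from G, scored by the teacher T.
    pool: List[Sample] = []
    for e in evidence:
        for claim, y_hat in generate(e):
            pool.append((e, claim, y_hat, teacher(e, claim)))
    pool = select(pool, k)

    for _ in range(rounds):
        children: List[Sample] = []
        for e, claim, y_hat, _r in pool:
            for new_claim in mutate(claim):
                r_new = teacher(e, new_claim)  # re-score the mutated claim against the evidence
                if agrees(y_hat, r_new):       # keep only mutations that preserve the label
                    children.append((e, new_claim, y_hat, r_new))
        # Parents and children compete for the K slots of the next round.
        pool = select(pool + children, k)
    return pool


def toy_select(pool: List[Sample], k: int) -> List[Sample]:
    # Prefer samples whose certainty agrees most strongly with their hard label.
    return sorted(pool, key=lambda s: s[3] if s[2] == 1 else 1.0 - s[3], reverse=True)[:k]


# Usage with toy stand-ins for G, M, and T.
result = auto_gda_style_loop(
    evidence=["the sky is blue"],
    generate=lambda e: [(e, 1), ("the sky is green", 0)],
    mutate=lambda c: ["indeed " + c],
    teacher=lambda premise, hyp: 1.0 if "blue" in hyp else 0.0,
    select=toy_select,
    k=3,
    rounds=2,
)
```

The final data in `result` would then be used to fine-tune the lightweight NLI model.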

Paper Structure

This paper contains 44 sections, 2 theorems, 39 equations, 9 figures, 16 tables, 1 algorithm.

Key Result

Proposition 1

Let $\phi \sim \text{Beta}(\alpha, \beta)$ be a Beta-distributed random variable. Let $\phi_0$ be the parameter of the (certain) initial label distribution (usually corresponding to $\hat{y}$), and let $r$ denote the probability that the mutated sample has label $y=1$ (entailment certainty). In the last statement, $\psi$ denotes the digamma function and $p_\mathcal{Q}(y|{\bm{c}}) = \text{Bernoulli}(\phi_0)$.
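
The statements of Proposition 1 are cut off in this overview, so the sketch below does not reproduce them. It only illustrates the standard identity through which a digamma function typically enters when a Bernoulli label parameter $\phi$ is given a Beta hyperdistribution. For $\phi \sim \text{Beta}(\alpha, \beta)$, $\mathbb{E}[\log \phi] = \psi(\alpha) - \psi(\alpha+\beta)$ and $\mathbb{E}[\log(1-\phi)] = \psi(\beta) - \psi(\alpha+\beta)$.

As an assumed example quantity, the sketch computes the expected KL divergence between the certain label distribution $\text{Bernoulli}(\phi_0)$ and $\text{Bernoulli}(\phi)$:

$$\mathbb{E}_{\phi}\big[\mathrm{KL}(\text{Bernoulli}(\phi_0)\,\|\,\text{Bernoulli}(\phi))\big] = \phi_0\log\phi_0 + (1-\phi_0)\log(1-\phi_0) - \phi_0\big(\psi(\alpha)-\psi(\alpha+\beta)\big) - (1-\phi_0)\big(\psi(\beta)-\psi(\alpha+\beta)\big)$$

It then checks this value against a Monte Carlo estimate. Whether this matches the proposition's exact expression is not confirmed here. The function names are hypothetical. The hand-written digamma (recurrence plus asymptotic series) only serves to keep the example dependency-free.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only supports x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x upward until the asymptotic series is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def expected_log_phi(alpha: float, beta: float) -> float:
    """E[log phi] for phi ~ Beta(alpha, beta)."""
    return digamma(alpha) - digamma(alpha + beta)


def expected_log_one_minus_phi(alpha: float, beta: float) -> float:
    """E[log(1 - phi)] for phi ~ Beta(alpha, beta)."""
    return digamma(beta) - digamma(alpha + beta)


def neg_entropy(phi0: float) -> float:
    """phi0 log phi0 + (1 - phi0) log(1 - phi0), with 0 log 0 = 0."""
    return sum(p * math.log(p) for p in (phi0, 1.0 - phi0) if p > 0)


def expected_bernoulli_kl(phi0: float, alpha: float, beta: float) -> float:
    """E_{phi ~ Beta(alpha, beta)}[KL(Bernoulli(phi0) || Bernoulli(phi))], analytically."""
    cross_entropy = (-phi0 * expected_log_phi(alpha, beta)
                     - (1.0 - phi0) * expected_log_one_minus_phi(alpha, beta))
    return neg_entropy(phi0) + cross_entropy


def mc_expected_bernoulli_kl(phi0: float, alpha: float, beta: float,
                             n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    eps = 1e-12  # keep log() finite if a draw lands exactly on 0 or 1
    total = 0.0
    for _ in range(n):
        phi = min(max(rng.betavariate(alpha, beta), eps), 1.0 - eps)
        total += neg_entropy(phi0) - phi0 * math.log(phi) - (1.0 - phi0) * math.log(1.0 - phi)
    return total / n


# Usage: a certain label y = 1 (phi0 = 1) under a Beta(2, 3) hyperdistribution.
exact = expected_bernoulli_kl(1.0, 2.0, 3.0)
approx = mc_expected_bernoulli_kl(1.0, 2.0, 3.0)
```

For $\phi_0 = 1$, $\alpha = 2$, $\beta = 3$, the expression reduces to $\psi(5) - \psi(2) = \tfrac12 + \tfrac13 + \tfrac14 = \tfrac{13}{12}$. The analytic function reproduces this value and the Monte Carlo estimate approximates it.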

Figures (9)

  • Figure 1: Landscape of current grounding verification models for RAG. While LLMs have the best performance, they incur about 10$\times$ higher latency than lightweight models. In this work, we are interested in obtaining lightweight models with LLM-level performance for grounding verification through domain adaptation.
  • Figure 2: Overview of Auto-GDA. We generate initial data using the generator $G$ and assign entailment certainty scores to it using the teacher model $T$. The synthetic data is iteratively augmented using $M$, while label preservation is verified with $T$ and the entailment certainties are updated. We then select the top-$K$ samples that minimize an objective function $L_{\text{tot}}$. These steps can be repeated until the final data is used to fine-tune the model $f$ for the target domain.
  • Figure 3: Intuition for our update rule for entailment certainties: If a parent claim ${\bm{c}}$ is entailed by ${\bm{e}}$ and a mutated claim ${\bm{c}}^\prime$ is entailed by its parent ${\bm{c}}$, the mutated claim ${\bm{c}}^\prime$ will be entailed by ${\bm{e}}$ as well.
  • Figure 4: Modeling the label correctness term in the objective $L_{\text{tot}}$ as a function of $r$. When the estimated entailment certainty $r$ does not match the assigned hard label $\hat{y}$, this term takes high values, discouraging selection.
  • Figure 5: We model the label uncertainty through a hyper distribution over the parameter $\varphi$.
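
The intuition of Figure 3 can be turned into a simple numeric rule for claims labeled as entailed ($y=1$). Entailment is transitive, so the event that ${\bm{e}}$ entails ${\bm{c}}$ and ${\bm{c}}$ entails ${\bm{c}}^\prime$ implies that ${\bm{e}}$ entails ${\bm{c}}^\prime$. The probability of that joint event is therefore a lower bound on the mutated claim's entailment certainty. The sketch below computes this bound in two ways: under an independence assumption (a product of certainties), or without it (the Fréchet bound). It illustrates the reasoning only. This overview does not state whether Auto-GDA's update rule takes exactly this form, and the function names are hypothetical.

```python
from typing import Iterable


def propagate_certainty(r_parent: float, t_mutation: float, independent: bool = True) -> float:
    """Lower bound on P(e entails c') from r_parent = P(e entails c), t_mutation = P(c entails c').

    Entailment is transitive, so "e entails c and c entails c'" implies "e entails c'".
    """
    for p in (r_parent, t_mutation):
        if not 0.0 <= p <= 1.0:
            raise ValueError("certainties must lie in [0, 1]")
    if independent:
        # Assumes the two entailment events are independent: P(A and B) = P(A) * P(B).
        return r_parent * t_mutation
    # Frechet lower bound, valid for any dependence: P(A and B) >= P(A) + P(B) - 1.
    return max(0.0, r_parent + t_mutation - 1.0)


def chain_certainty(r0: float, mutation_certainties: Iterable[float],
                    independent: bool = True) -> float:
    """Apply the bound along a chain of successive mutations c -> c' -> c'' -> ..."""
    r = r0
    for t in mutation_certainties:
        r = propagate_certainty(r, t, independent)
    return r
```

For example, a parent certainty of 0.9 and a mutation certainty of 0.8 give 0.72 under independence and 0.7 without it. Along a chain of mutations, either bound can only shrink.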

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Definition 1: Probabilistically Correct Data Augmentation, PCDA