PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs

Tianyi Huang; Caden Yang; Emily Yin; Eric Wang; Michael Zhang

Abstract

Retrieval-augmented language models can retrieve relevant evidence yet still commit to answers before explicitly checking whether the retrieved context supports the conclusion. We present PAVE (Premise-Grounded Answer Validation and Editing), an inference-time validation layer for evidence-grounded question answering. PAVE decomposes retrieved context into question-conditioned atomic facts, drafts an answer, scores how well that draft is supported by the extracted premises, and revises low-support outputs before finalization. The resulting trace makes answer commitment auditable at the level of explicit premises, support scores, and revision decisions. In controlled ablations with a fixed retriever and backbone, PAVE outperforms simpler post-retrieval baselines in two evidence-grounded QA settings, with the largest gain reaching 32.7 accuracy points on a span-grounded benchmark. We view these findings as proof-of-concept evidence that explicit premise extraction plus support-gated revision can strengthen evidence-grounded consistency in retrieval-augmented LLM systems.
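The control flow described in the abstract (extract premises, draft, score support, revise only when support is low) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Trace` record, the `pave_like_answer` function, the token-overlap scorer, the sentence-splitting extractor, and the 0.5 threshold are all hypothetical stand-ins. In a real system the extract, draft, and revise callables would be LLM calls, and the support score would be whatever the paper defines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    """Auditable record of one run: premises, draft, score, decision, answer."""
    premises: List[str]
    draft: str
    support: float
    revised: bool
    final: str


def split_sentences(question: str, context: str) -> List[str]:
    # Hypothetical extractor: one "premise" per sentence, ignoring the question.
    return [s.strip() for s in context.split(".") if s.strip()]


def token_overlap_support(answer: str, premises: List[str]) -> float:
    # Toy support score (an assumption, not the paper's scorer):
    # fraction of answer tokens that occur in at least one premise.
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    premise_tokens = set()
    for premise in premises:
        premise_tokens.update(premise.lower().split())
    return len(answer_tokens & premise_tokens) / len(answer_tokens)


def pave_like_answer(
    question: str,
    context: str,
    extract: Callable[[str, str], List[str]],
    draft: Callable[[str, List[str]], str],
    revise: Callable[[str, List[str], str], str],
    score: Callable[[str, List[str]], float] = token_overlap_support,
    threshold: float = 0.5,  # hypothetical gate value
) -> Trace:
    premises = extract(question, context)
    candidate = draft(question, premises)
    support = score(candidate, premises)
    if support < threshold:
        # Low support: revise before committing to a final answer.
        final = revise(question, premises, candidate)
        return Trace(premises, candidate, support, True, final)
    # Sufficient support: the draft is finalized unchanged.
    return Trace(premises, candidate, support, False, candidate)
```

Returning the whole `Trace` rather than only the final string is the design choice that matters here: it keeps the premises, the support score, and the revision decision available for inspection after the fact.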

Paper Structure (21 sections, 6 equations, 3 figures, 1 table)

Introduction
Contributions.
Related Work
Verification and hallucination mitigation
Fine-grained retrieval and RAG optimization
Logical reasoning and solver-augmented methods
Method
Problem setup
Atomic decomposition as premise extraction
Support scoring and revision
What the trace adds beyond direct LLM interaction
Experimental Setup
Datasets
PubMedQA.
SQuAD.
...and 6 more sections

Figures (3)

Figure 1: High-level view of PAVE. Instead of committing immediately to a final answer, the system inserts an explicit validation stage between retrieval and answer finalization: retrieved context is decomposed into atomic facts, used to draft an answer and short rationale, scored for evidence support, and revised only when the support signal is insufficient.
Figure 2: Illustrative execution trace for PAVE on a traffic-rule question. The retrieved context is decomposed into atomic premises, used to produce a draft answer and rationale, scored for support, and then finalized. In this example, the decomposition makes the governing rule, the relevant condition, and the exception explicit before final answer commitment.
Figure 3: Paired outcome transition on 100 shared SQuAD examples comparing the support-scoring variant with PAVE.
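A paired outcome transition of the kind summarized in Figure 3 can be tabulated from per-example correctness flags for two systems evaluated on the same examples. The sketch below is an illustration only: the function name `paired_transitions` and the label strings are hypothetical, and no values from the paper are reproduced.

```python
from collections import Counter
from typing import Dict, Sequence


def paired_transitions(
    baseline_correct: Sequence[bool],
    system_correct: Sequence[bool],
) -> Dict[str, int]:
    """Count how each shared example moves between two systems."""
    if len(baseline_correct) != len(system_correct):
        raise ValueError("both systems must be scored on the same examples")
    # Pair up outcomes example by example; bool() normalizes 0/1 inputs.
    counts = Counter(
        (bool(b), bool(s)) for b, s in zip(baseline_correct, system_correct)
    )
    return {
        "correct->correct": counts[(True, True)],
        "correct->wrong": counts[(True, False)],
        "wrong->correct": counts[(False, True)],
        "wrong->wrong": counts[(False, False)],
    }
```

The two off-diagonal counts ("wrong->correct" and "correct->wrong") are the informative ones for a paired comparison, since examples on which both systems agree do not distinguish them.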
