GINGER: Grounded Information Nugget-Based Generation of Responses

Weronika Łajewska, Krisztian Balog

TL;DR

Grounded generation in retrieval-augmented systems remains prone to factual errors and attribution gaps. The authors propose GINGER, a modular nugget-based pipeline that detects information nuggets in retrieved passages, clusters them by query facet with BERTopic, ranks clusters via duoT5, summarizes top clusters into concise, source-attributed statements, and refines fluency with an LLM. Grounding is enforced by ensuring every final statement is entailed by the source nuggets, and evaluation on the TREC RAG'24 augmented generation task shows competitive performance, surpassing two strong baselines and nearing top submissions as more passages are provided, as measured by $V_{strict}$. An ablation study attributes gains to the nugget-centric representation rather than any single component, underscoring the importance of information granularity and redundancy reduction. The work contributes a practical, verifiable framework for constraint-based grounded response generation with publicly available resources.

Abstract

Retrieval-augmented generation (RAG) faces challenges related to factual correctness, source attribution, and response completeness. To address them, we propose a modular pipeline for grounded response generation that operates on information nuggets: minimal, atomic units of relevant information extracted from retrieved documents. The multistage pipeline encompasses nugget detection, clustering, ranking, top cluster summarization, and fluency enhancement. It guarantees grounding in specific facts, facilitates source attribution, and ensures maximum information inclusion within length constraints. Extensive experiments on the TREC RAG'24 dataset evaluated with the AutoNuggetizer framework demonstrate that GINGER achieves state-of-the-art performance on this benchmark.
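To make the order of the stages concrete, the following is a minimal Python sketch of the data flow (detection, clustering, ranking, summarization, length-bounded assembly). It is not the authors' implementation. Every function name, the `Nugget` type, the `top_k` and `max_words` parameters, and the default values are assumptions made for illustration. Each stage is a simple lexical stand-in: the paper uses BERTopic for clustering, duoT5 for ranking, and an LLM for summarization and fluency enhancement, none of which are called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nugget:
    text: str        # minimal unit of relevant information
    passage_id: str  # source passage, kept for attribution


def _tokens(text):
    return set(text.lower().split())


def detect_nuggets(query, passages):
    """Toy detector (assumption): a sentence sharing a term with the query is a nugget."""
    query_terms = _tokens(query)
    nuggets = []
    for passage_id, passage in passages.items():
        for sentence in passage.split("."):
            sentence = sentence.strip()
            if sentence and query_terms & _tokens(sentence):
                nuggets.append(Nugget(sentence, passage_id))
    return nuggets


def cluster_nuggets(nuggets, threshold=0.5):
    """Toy clustering (assumption): greedy Jaccard match against each cluster's first nugget."""
    clusters = []
    for nugget in nuggets:
        for cluster in clusters:
            seed, cand = _tokens(cluster[0].text), _tokens(nugget.text)
            if len(seed & cand) / len(seed | cand) >= threshold:
                cluster.append(nugget)
                break
        else:
            clusters.append([nugget])
    return clusters


def rank_clusters(query, clusters):
    """Toy ranking (assumption): more query terms covered first, larger clusters break ties."""
    query_terms = _tokens(query)

    def score(cluster):
        covered = set()
        for nugget in cluster:
            covered |= query_terms & _tokens(nugget.text)
        return (len(covered), len(cluster))

    return sorted(clusters, key=score, reverse=True)


def summarize_cluster(cluster):
    """Toy summary (assumption): shortest nugget as the statement, all source ids attached."""
    statement = min((n.text for n in cluster), key=len)
    sources = sorted({n.passage_id for n in cluster})
    return f"{statement} [{', '.join(sources)}]."


def generate_response(query, passages, top_k=3, max_words=50):
    nuggets = detect_nuggets(query, passages)
    ranked = rank_clusters(query, cluster_nuggets(nuggets))
    statements, used = [], 0
    for cluster in ranked[:top_k]:
        statement = summarize_cluster(cluster)
        length = len(statement.split())
        if used + length > max_words:
            break  # stop once the length budget would be exceeded
        statements.append(statement)
        used += length
    # Fluency enhancement would go here; this sketch leaves the text unchanged.
    return " ".join(statements)


if __name__ == "__main__":
    demo_passages = {
        "p1": "Ginger reduces nausea. The weather is nice",
        "p2": "Ginger reduces nausea in pregnancy. Ginger has anti-inflammatory health benefits",
    }
    print(generate_response("ginger health benefits", demo_passages))
```

Because each statement is built from one cluster and carries the passage identifiers of every nugget in that cluster, redundant nuggets from different passages collapse into a single attributed sentence, which is the property the abstract credits for fitting more information within the length limit.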

Paper Structure

This paper contains 6 sections, 1 equation, 1 figure, 2 tables.

Figures (1)

  • Figure 1: High-level overview of our nugget-based response generation pipeline (GINGER).