Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

Takyoung Kim; Kyungjae Lee; Young Rok Jang; Ji Yong Cho; Gangwoo Kim; Minseok Cho; Moontae Lee

TL;DR

This paper tackles the challenge of producing tailored long-form responses under coverage constraints by introducing coverage-conditioned ($C^2$) queries. It proposes QTree, a dataset of 10K hierarchical subquery sets (39 subqueries each) that bounds the space of potential outlines, and QPlanner, a 7B model trained to generate outlines within QTree boundaries. In automatic and human evaluations, the approach, particularly with Direct Preference Optimization (DPO) alignment, improves both the selection of outlines and downstream RAG performance across diverse domains. The results indicate that outlines from the aligned model enhance retrieval relevance and content quality, suggesting a practical path toward more controllable and user-aligned long-form LLM outputs in RAG systems.

Abstract

Interactions with large language models (LLMs) often yield long and detailed responses, leveraging both parametric knowledge and retrieval-augmented generation (RAG). While these responses can provide rich insights, they often include redundant or less engaging content not aligned with user interests. This issue becomes apparent when users specify particular subtopics to include or exclude -- termed coverage-conditioned ($C^2$) queries -- as LLMs often struggle to provide tailored responses. To address this challenge, we investigate the role of query outlines, sequences of subqueries designed to guide LLMs in generating responses that meet specific user requirements. To systematically create and evaluate these outlines, we introduce QTree, a dataset of 10K hierarchical sets of information-seeking subqueries that define structured boundaries for outline creation and evaluation in $C^2$ scenarios. Additionally, we develop QPlanner, a 7B language model trained to generate customized outlines within the boundaries of QTree. We evaluate the effectiveness of the generated outlines through automatic and human judgements, focusing on their impact within RAG systems. Experimental results demonstrate that QPlanner, especially when trained with alignment techniques like DPO, generates higher-quality outlines that better fulfill diverse user needs.
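To make the idea of a hierarchical subquery set and outline selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tree with three levels and three children per node, which yields 3 + 9 + 27 = 39 subqueries, matching the count stated in the TL;DR; the abstract itself does not specify the tree shape. Treating each root-to-leaf path as a candidate outline, and the names `SubqueryNode`, `build_toy_qtree`, `candidate_outlines`, and `select_best`, are hypothetical choices for illustration. The `judge` callable stands in for whatever scoring model is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubqueryNode:
    """One node of a toy QTree: a subquery plus its child subqueries."""
    text: str
    children: List["SubqueryNode"] = field(default_factory=list)


def build_toy_qtree(base_query: str, branching: int = 3, depth: int = 3) -> SubqueryNode:
    """Build a placeholder tree; a real system would generate subqueries with an LLM."""
    def expand(prefix: str, level: int) -> List[SubqueryNode]:
        if level > depth:
            return []
        nodes = []
        for i in range(1, branching + 1):
            label = f"{prefix}.{i}" if prefix else str(i)
            nodes.append(
                SubqueryNode(
                    text=f"{base_query} [subquery {label}]",
                    children=expand(label, level + 1),
                )
            )
        return nodes

    return SubqueryNode(text=base_query, children=expand("", 1))


def count_subqueries(root: SubqueryNode) -> int:
    """Count all descendants of the root (the base query itself is not counted)."""
    return sum(1 + count_subqueries(child) for child in root.children)


def candidate_outlines(root: SubqueryNode) -> List[List[str]]:
    """Assumption: each root-to-leaf path is one candidate outline."""
    outlines: List[List[str]] = []

    def walk(node: SubqueryNode, path: List[str]) -> None:
        if not node.children:
            outlines.append(path)
            return
        for child in node.children:
            walk(child, path + [child.text])

    walk(root, [])
    return outlines


def select_best(outlines: List[List[str]], judge: Callable[[List[str]], float]) -> List[str]:
    """Return the highest-scoring outline; ties resolve to the first one found."""
    return max(outlines, key=judge)
```

In this sketch, a coverage condition such as "focus on the second top-level subtopic" could be approximated by a `judge` that rewards subqueries from that branch; in practice the scoring would come from an LLM judge or a trained model rather than a string match.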

Paper Structure (42 sections, 12 figures, 10 tables)

Introduction
Related Work
Query Modification with LLMs
Evaluation of Long-form Responses
Framework
Background
Overview
Preparing C$^2$ Queries (Step 1)
Base Query ($q_{base}$) Collection
QTree Construction
Quality Check
Coverage Query ($q_{cov}$) Generation
Exploring Candidate Outlines & Evaluation (Step 2)
Parsing Outlines
Quality Check
...and 27 more sections

Figures (12)

Figure 1: QTree constrains the range of available outlines for the user's C$^2$ query, and tailored outlines satisfying the requirement of C$^2$ query are selected for RAG downstream tasks.
Figure 2: The overview of our framework. [Step 1] Base query ($q_{base}$) is decomposed into subqueries with diverse viewpoints (QTree), preceded by generating coverage query ($q_{cov}$). [Step 2] After C$^2$ candidate outlines are extracted, a judge LLM evaluates each outline and selects the best-scored one. [Step 3] Utilizing this dataset, QPlanner is trained to sequentially generate its own QTree and preferred outline by taking the C$^2$ query as an input.
Figure 3: Pairwise comparison for each C$^2$ query in automatic outline evaluation.
Figure 4: Human evaluation results.
Figure 5: Initial information provided to participants in our human study.
...and 7 more figures
