ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation

Kashob Kumar Roy, Pritom Saha Akash, Kevin Chen-Chuan Chang, Lucian Popa

TL;DR

ConTReGen is a novel framework that uses context-driven, tree-structured retrieval to improve the depth and relevance of retrieved content, and it outperforms existing state-of-the-art RAG models.

Abstract

Open-domain long-form text generation requires generating coherent, comprehensive responses that address complex queries with both breadth and depth. This task is challenging due to the need to accurately capture diverse facets of input queries. Existing iterative retrieval-augmented generation (RAG) approaches often struggle to delve deeply into each facet of complex queries and integrate knowledge from various sources effectively. This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach to enhance the depth and relevance of retrieved content. ConTReGen integrates a hierarchical, top-down in-depth exploration of query facets with a systematic bottom-up synthesis, ensuring comprehensive coverage and coherent integration of multifaceted information. Extensive experiments on multiple datasets, including LFQA and ODSUM, alongside a newly introduced dataset, ODSUM-WikiHow, demonstrate that ConTReGen outperforms existing state-of-the-art RAG models.
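
The top-down exploration and bottom-up synthesis described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `explore`, `synthesize`, `toy_retrieve`, `toy_decompose`, and `toy_summarize` are hypothetical, and the toy corpus stands in for a real retriever and language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One facet of the query, with its retrieved passages and sub-facets."""
    query: str
    passages: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    summary: str = ""


def explore(query: str,
            decompose: Callable[[str, List[str]], List[str]],
            retrieve: Callable[[str], List[str]],
            max_depth: int,
            depth: int = 0) -> Node:
    """Top-down step: retrieve for the query, then recurse into its facets."""
    node = Node(query=query, passages=retrieve(query))
    if depth < max_depth:
        for facet in decompose(query, node.passages):
            node.children.append(
                explore(facet, decompose, retrieve, max_depth, depth + 1))
    return node


def synthesize(node: Node,
               summarize: Callable[[str, List[str]], str]) -> str:
    """Bottom-up step: summarize the children first, then merge into the parent."""
    child_summaries = [synthesize(child, summarize) for child in node.children]
    node.summary = summarize(node.query, node.passages + child_summaries)
    return node.summary


# Hypothetical stand-ins for a retriever, a facet generator, and a summarizer.
CORPUS: Dict[str, List[str]] = {
    "how to bake bread": ["Bread needs dough and an oven."],
    "dough": ["Dough is flour, water, yeast."],
    "oven": ["Preheat the oven to 220C."],
}
FACETS: Dict[str, List[str]] = {"how to bake bread": ["dough", "oven"]}


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_decompose(query: str, passages: List[str]) -> List[str]:
    return FACETS.get(query, [])


def toy_summarize(query: str, texts: List[str]) -> str:
    return " ".join(texts)


root = explore("how to bake bread", toy_decompose, toy_retrieve, max_depth=2)
answer = synthesize(root, toy_summarize)
print(answer)
```

In this sketch the tree is built fully before any synthesis starts, which keeps the two phases easy to tell apart. A real system would replace the three toy functions with a retriever and LLM calls.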

Paper Structure

This paper contains 31 sections, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Schematic illustration of retrieval reasoning
  • Figure 2: Retrieval Recall Performance. Compared query strategies: Prev. full response [shao2023enhancing], Prev. response segment [asai2023self], Next followup question [press2022measuring, xu2024search], Next query generation [khattab2022demonstrate].
  • Figure 3: Retrieval Recall Performance per iteration on ODSUM-Story. Compared query strategies: Prev. full response [shao2023enhancing], Prev. response segment [asai2023self], Next followup question [press2022measuring, xu2024search], Next query generation [khattab2022demonstrate].
  • Figure 4: ConTReGen Framework.
  • Figure 5: Two-step Verification.