
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

Jiatao Li, Xinyu Hu, Xunjian Yin, Xiaojun Wan

TL;DR

The paper examines how self-generated documents (Self-Docs), produced entirely from an LLM's internal memory, affect retrieval-augmented generation (RAG) performance across knowledge-intensive QA tasks. It first assesses the baseline utility of Self-Docs (RQ1), then builds a taxonomy of Self-Docs grounded in Systemic Functional Linguistics (SFL) (RQ2), and finally explores ways to integrate Self-Docs with external sources such as Wikipedia (RQ3), including direct mixing and style-transformation approaches. Key contributions include validating the utility of Self-Docs, proposing an SFL-based eight-type taxonomy, and providing actionable guidelines for designing task-aligned Self-Docs and harmonizing them with external content, with evidence that style-aligned integration often yields robust improvements. The study shows that model scale, Self-Doc attributes (tone, granularity, structure), and careful external integration jointly determine how much Self-Docs improve RAG performance on open-domain QA, multi-hop reasoning, fact verification, and long-form answer generation. These findings offer practical guidance for building knowledge-intensive QA systems that use Self-Docs alongside retrieved content, while cautioning about task-specific variability and ethical concerns around factual accuracy.
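The two integration strategies mentioned above (direct mixing, and restyling external content before mixing) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_mixed_context` and `build_prompt`, the `Document [i]` numbering, the prompt template, and the `restyle` callable (a hypothetical stand-in for an LLM call that rewrites a retrieved passage into the style of the Self-Docs) are all assumptions.

```python
from typing import Callable, Optional, Sequence


def build_mixed_context(
    self_docs: Sequence[str],
    retrieved_docs: Sequence[str],
    restyle: Optional[Callable[[str], str]] = None,
) -> str:
    """Combine Self-Docs and retrieved passages into one numbered context.

    Hypothetical sketch: with restyle=None this is plain direct mixing;
    when a restyle callable is supplied (e.g. an LLM rewriting step),
    each retrieved passage is transformed before mixing, approximating a
    style-transformation strategy.
    """
    # Only the external (retrieved) passages are restyled; Self-Docs are kept as-is.
    external = [restyle(d) if restyle else d for d in retrieved_docs]
    docs = list(self_docs) + external
    # Number documents from 1 so the QA model can refer to them.
    return "\n".join(f"Document [{i}]: {d}" for i, d in enumerate(docs, start=1))


def build_prompt(question: str, context: str) -> str:
    # Hypothetical QA template; the paper's actual prompt is not given here.
    return (
        f"{context}\n\n"
        "Answer the question based on the documents above.\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    ctx = build_mixed_context(
        ["Paris is the capital of France."],
        ["France's capital city is Paris."],
    )
    print(build_prompt("What is the capital of France?", ctx))
```

Keeping the restyling step as an injected callable keeps the mixing logic independent of any particular model or API, so the same function covers both the direct-mixing and the restyled variant.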

Abstract

The integration of documents generated by LLMs themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation systems. However, previous research primarily focuses on optimizing the use of Self-Docs, with their inherent properties remaining underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Doc categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.

Paper Structure

This paper contains 55 sections, 5 equations, 2 figures, 25 tables, and 3 algorithms.

Figures (2)

  • Figure 1: An illustrative overview of our three-stage research framework. RQ1 assesses the baseline utility of Self-Docs in RAG, RQ2 identifies which Self-Doc types are most effective for different tasks, and RQ3 demonstrates how integrating these optimal Self-Docs with external knowledge sources—especially through stylistic alignment—further enhances performance.
  • Figure 2: Performance factor analysis examining the effects of base model size, document count, and document generation model size on RAG performance across different tasks. Left: impact of varying the base model size when the same Qwen2.5 model is used for both document generation and question answering, with the number of self-generated documents fixed at 10. Middle: impact of varying the number of self-generated documents, using Qwen2.5-32B-Instruct as the base model. Right: impact of varying the document generation model size while keeping the question answering model fixed at Qwen2.5-32B-Instruct and the document count at n = 10.
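The three Figure 2 settings can be written down as a small experiment grid, which makes explicit which factor varies in each panel and which are held fixed. This is a hypothetical sketch: `RunConfig`, `qwen_name`, and `figure2_grid` are illustrative names, and the size and document-count lists passed in are placeholders, since the exact values swept are not listed in this summary. For the middle panel it assumes the Self-Docs are generated by the same 32B model that answers the questions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class RunConfig:
    qa_model: str   # model that answers the question (the "base model")
    gen_model: str  # model that writes the Self-Docs
    num_docs: int   # number of Self-Docs placed in the context


def qwen_name(size: str) -> str:
    # Builds a name like "Qwen2.5-32B-Instruct" from a size tag like "32B".
    return f"Qwen2.5-{size}-Instruct"


def figure2_grid(
    sizes: Sequence[str],
    doc_counts: Sequence[int],
    fixed_size: str = "32B",
    fixed_n: int = 10,
) -> Dict[str, List[RunConfig]]:
    fixed = qwen_name(fixed_size)
    return {
        # Left: same model generates and answers; its size varies; n fixed.
        "left": [RunConfig(qwen_name(s), qwen_name(s), fixed_n) for s in sizes],
        # Middle: 32B model for both roles (assumption); document count varies.
        "middle": [RunConfig(fixed, fixed, n) for n in doc_counts],
        # Right: QA model fixed at 32B; generator size varies; n fixed.
        "right": [RunConfig(fixed, qwen_name(s), fixed_n) for s in sizes],
    }


if __name__ == "__main__":
    # Placeholder values, not the paper's actual sweep.
    grid = figure2_grid(["7B", "14B", "32B"], [1, 5, 10])
    for panel, runs in grid.items():
        for run in runs:
            print(panel, run)
```

Each panel changes exactly one factor relative to the 32B / n = 10 reference point, which is what allows the figure to attribute performance differences to base model size, document count, or generator size separately.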