Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation

Junhao Zhang, Richong Zhang, Fanshuang Kong, Ziyang Miao, Yanhan Ye, Yaowei Zheng

TL;DR

This paper introduces LongInOutBench, the first benchmark explicitly designed for long-input, long-output generation, which pairs a synthetic dataset with an evaluation framework measuring length adherence, content consistency across multiple sources, and linguistic quality. It then presents RAL-Writer, a Retrieval-Augmented Long-Text Writer that plans its writing steps and augments each prompt with restated important chunks to counteract the "lost-in-the-middle" effect on lengthy inputs. The approach relies on a long-text chunking system, a chunk retrieval mechanism based on both relevance and position, and a restatement strategy that strengthens the model's attention to critical content in the middle of the input. Experiments on LongInOutBench show that RAL-Writer improves consistency and quality over baselines across several backbone models, while very long outputs and planner reliability remain open challenges. The work provides a methodological foundation for robust long-form, knowledge-intensive generation.
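The TL;DR names chunking, retrieval, and restatement as the building blocks of RAL-Writer. The sketch below illustrates only the chunking and restatement steps, under stated assumptions: fixed-size word windows stand in for the paper's chunking system, and a Jaccard word-overlap score (`lexical_relevance`) stands in for its relevance model; the position term is omitted. All function names and the prompt wording are hypothetical, not the authors' implementation.

```python
from typing import List


def chunk_text(text: str, chunk_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_words` words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_words:
        raise ValueError("require 0 <= overlap < chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):  # this window already reaches the end
            break
    return chunks


def lexical_relevance(step: str, chunk: str) -> float:
    """Jaccard overlap of lower-cased word sets (a stand-in relevance score)."""
    a, b = set(step.lower().split()), set(chunk.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_step_prompt(source: str, plan_step: str, chunks: List[str], k: int = 2) -> str:
    """Keep the full source, then restate the k chunks most relevant to the current step."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: lexical_relevance(plan_step, chunks[i]),
                    reverse=True)
    picked = sorted(ranked[:k])  # restate in original document order
    restated = "\n".join(f"[Chunk {i}] {chunks[i]}" for i in picked)
    return (
        f"Source material:\n{source}\n\n"
        f"Current writing step: {plan_step}\n\n"
        f"Key passages restated from the source:\n{restated}\n\n"
        "Write this step, making sure to use the restated passages."
    )
```

Placing the selected chunks next to the instruction, instead of relying on the model to locate them mid-context, mirrors the restatement idea described above; an embedding-based score could replace `lexical_relevance` without changing the rest of the sketch.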

Abstract

Existing long-text generation methods primarily focus on producing lengthy texts from short inputs, neglecting tasks with both long inputs and long outputs. Such tasks have numerous practical applications but lack available benchmarks. Moreover, as the input grows in length, existing methods inevitably encounter the "lost-in-the-middle" phenomenon. In this paper, we first introduce the Long Input and Output Benchmark (LongInOutBench), comprising a synthetic dataset and a comprehensive evaluation framework, to fill this gap. We then develop the Retrieval-Augmented Long-Text Writer (RAL-Writer), which retrieves and restates important yet overlooked content, mitigating the "lost-in-the-middle" issue by constructing explicit prompts. Finally, we use LongInOutBench to evaluate RAL-Writer against comparable baselines, and the results demonstrate the effectiveness of our approach. Our code has been released at https://github.com/OnlyAR/RAL-Writer.
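The evaluation framework's length-adherence dimension (listed in the TL;DR) can be made concrete with a simple score. The exact rule LongInOutBench uses is not given in this summary, so the sketch below is an assumption: a hypothetical `length_adherence` function that compares the output's word count with the requested length and decays linearly with the relative deviation, clipped to [0, 1].

```python
def length_adherence(output: str, target_words: int) -> float:
    """Hypothetical length score: 1.0 at the target word count, falling
    linearly to 0.0 once the output misses the target by 100% or more."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    n_words = len(output.split())
    relative_deviation = abs(n_words - target_words) / target_words
    return max(0.0, 1.0 - relative_deviation)
```

This symmetric rule treats over- and under-length outputs alike; a benchmark could just as plausibly penalize short outputs more heavily, since reaching very long outputs is noted as an open challenge.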

Paper Structure

This paper contains 35 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Overview of LongInOutBench.
  • Figure 2: Statistical information on LongInOutBench. Note: categories in (a) with proportions below 3% are grouped into "Others".
  • Figure 3: Illustration of RAL-Writer.
  • Figure 4: (a) A larger $a$ makes $P$ steeper near both ends and flatter, closer to 0, in the middle. (b) A larger $b$ lets $P$ reach a greater maximum value at both ends. (c) Importance $I$ is defined as the difference between relevance $R$ and position term $P$, i.e., $I = R - P$; the more relevant a chunk is to the current step and the closer it lies to the middle, the greater $I$ becomes.
  • Figure 5: Demonstration of chunk recall during the Write phase on real data, using the optimal parameters.
  • Figures 6 and 7: captions not shown.
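The Figure 4 caption describes a position term $P$ shaped by two parameters and an importance score combining it with relevance $R$. The captions do not give the formula for $P$, so the sketch below assumes a hypothetical U-shaped stand-in, $P(x) = b\,|2x - 1|^a$ over a chunk's normalized position $x \in [0, 1]$, which matches panels (a) and (b): larger $a$ flattens the middle and steepens the ends, and larger $b$ raises the value at both ends. Importance follows panel (c) as $I = R - P$. The function names are illustrative only.

```python
from typing import List, Sequence


def position_term(index: int, n_chunks: int, a: float = 2.0, b: float = 0.5) -> float:
    """Hypothetical U-shaped P: equals b at both ends of the input and 0 at x = 0.5.
    A larger `a` flattens the middle and steepens the ends; a larger `b` raises the ends."""
    if n_chunks == 1:
        return b
    x = index / (n_chunks - 1)  # normalized position in [0, 1]
    return b * abs(2 * x - 1) ** a


def importance(relevance: Sequence[float], a: float = 2.0, b: float = 0.5) -> List[float]:
    """I = R - P: high for chunks that are relevant to the step and lie mid-input."""
    n = len(relevance)
    return [r - position_term(i, n, a, b) for i, r in enumerate(relevance)]


def select_chunks(relevance: Sequence[float], k: int = 2,
                  a: float = 2.0, b: float = 0.5) -> List[int]:
    """Indices of the k highest-importance chunks, returned in document order."""
    scores = importance(relevance, a, b)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

With `relevance = [0.9, 0.5, 0.8, 0.5, 0.9]` and the default parameters, relevance alone favors the edge chunks 0 and 4, but the position term lowers their importance to about 0.4 while the middle chunk keeps 0.8, so `select_chunks(relevance, k=1)` returns `[2]`: the mid-input chunk the model is most likely to overlook is the one restated.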