Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation

Junhao Zhang; Richong Zhang; Fanshuang Kong; Ziyang Miao; Yanhan Ye; Yaowei Zheng

TL;DR

This paper introduces LongInOutBench, the first benchmark explicitly designed for long-input, long-output generation, together with an evaluation framework that measures length adherence, content consistency across multiple sources, and linguistic quality. It then presents RAL-Writer, a Retrieval-Augmented Long-Text Writer that plans writing steps and augments prompts with restated important chunks to counteract the "lost-in-the-middle" effect on lengthy inputs. The approach combines a long-text chunking system, a chunk retrieval mechanism based on relevance and position, and a restatement strategy that strengthens the model's attention to critical mid-text content. Experiments on LongInOutBench show that RAL-Writer improves consistency and quality over baselines across several backbones. Two challenges remain open: producing very long outputs and planner reliability. The work provides a methodological foundation for robust long-form, knowledge-intensive generation.

Abstract

Existing long-text generation methods primarily concentrate on producing lengthy texts from short inputs, neglecting tasks that pair long inputs with long outputs. Such tasks have numerous practical applications but lack available benchmarks. Moreover, as the input grows in length, existing methods inevitably encounter the "lost-in-the-middle" phenomenon. In this paper, we first introduce a Long Input and Output Benchmark (LongInOutBench), which includes a synthetic dataset and a comprehensive evaluation framework, to address the lack of a benchmark. We then develop the Retrieval-Augmented Long-Text Writer (RAL-Writer), which retrieves and restates important yet overlooked content, mitigating the "lost-in-the-middle" issue by constructing explicit prompts. Finally, we use LongInOutBench to evaluate RAL-Writer against comparable baselines, and the results demonstrate the effectiveness of our approach. Our code has been released at https://github.com/OnlyAR/RAL-Writer.
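
The retrieve-and-restate idea described above can be illustrated with a minimal sketch. It is not the released RAL-Writer implementation. Every function name, the bag-of-words cosine relevance, the triangular middle-position weight, the `alpha` parameter, and the prompt wording are assumptions made for illustration only. The sketch splits a long input into chunks, scores each chunk by its relevance to the current writing step with a bonus for chunks near the middle of the input, and repeats the top-scoring chunks at the end of the prompt.

```python
import re
from collections import Counter
from math import sqrt


def split_into_chunks(text, chunk_words=50):
    """Split text into consecutive chunks of at most chunk_words words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]


def _bag(text):
    """Lowercased bag-of-words counts (a stand-in for a real retriever)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, chunk):
    """Cosine similarity between bag-of-words vectors (assumed metric)."""
    q, c = _bag(query), _bag(chunk)
    dot = sum(q[t] * c[t] for t in q)
    norm = (sqrt(sum(v * v for v in q.values()))
            * sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0


def middle_weight(index, total):
    """Assumed position weight: 1.0 at the center chunk, 0.0 at both ends."""
    if total <= 1:
        return 0.0
    rel = index / (total - 1)
    return 1.0 - abs(2.0 * rel - 1.0)


def retrieve_chunks(query, chunks, top_k=2, alpha=0.5):
    """Return indices of the top_k chunks, in document order.

    Score = relevance * (1 + alpha * middle_weight); alpha is hypothetical.
    """
    scored = [(relevance(query, c) * (1.0 + alpha * middle_weight(i, len(chunks))), i)
              for i, c in enumerate(chunks)]
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    return sorted(i for _, i in top)


def build_prompt(document, step_instruction, chunks, selected):
    """Append the selected chunks after the full input, then the writing step."""
    restated = "\n".join(f"- {chunks[i]}" for i in selected)
    return (f"{document}\n\n"
            f"Important passages restated:\n{restated}\n\n"
            f"Writing step: {step_instruction}")
```

In this sketch the restated passages are placed after the full input and immediately before the writing step, so that content from the middle of the input also appears near the end of the prompt. The actual prompt layout and scoring used by RAL-Writer may differ.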
