LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources
Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun
TL;DR
The paper addresses long-to-long text generation from extremely long resources, a setting in which standard LLMs are constrained by context limits. It introduces LLM×MapReduce-V2, an entropy-driven convolutional test-time scaling framework guided by information bottleneck theory, which uses skeletons (S) and digests (D) to progressively integrate information and generate coherent long-form articles. The approach combines skeleton initialization and refinement, digest-based feedback, entropy-driven sampling, best-of-N self-refinement, and topology-aware content generation, and is evaluated on the SurveyEval benchmark. Results show substantial improvements over baselines, including a gain of at least 32.9% in reference utilization, together with strong structural and content-quality metrics and positive human evaluations, suggesting a promising direction for scalable long-form generation from very long resources.
Abstract
Long-form generation is crucial for a wide range of practical applications and is typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing relevant information from extensive inputs, which remains difficult for current large language models (LLMs). In this paper, we propose LLM$\times$MapReduce-V2, a novel test-time scaling strategy designed to enhance the ability of LLMs to process extremely long inputs. Drawing inspiration from convolutional neural networks, which iteratively integrate local features into higher-level global representations, LLM$\times$MapReduce-V2 utilizes stacked convolutional scaling layers to progressively expand the understanding of input materials. Quantitative and qualitative experimental results demonstrate that our approach substantially enhances the ability of LLMs to process long inputs and generate coherent, informative long-form articles, outperforming several representative baselines. Both LLM$\times$MapReduce-V2 and SurveyEval are publicly available at https://github.com/thunlp/LLMxMapReduce .
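The convolutional analogy in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `conv_layer`, `stacked_scaling`, and `toy_merge`, and the `kernel` and `stride` parameters, are hypothetical and chosen only for illustration. The `merge` callable stands in for an LLM call that fuses neighbouring pieces of text, and the sketch omits the skeletons, digests, entropy-driven sampling, and best-of-N self-refinement mentioned in the TL;DR.

```python
from typing import Callable, List, Sequence, Tuple

Merge = Callable[[Sequence[str]], str]


def conv_layer(items: Sequence[str], merge: Merge,
               kernel: int = 3, stride: int = 2) -> List[str]:
    """Slide a window over `items` and merge each window into one summary."""
    if kernel < 2 or stride < 1:
        # kernel >= 2 guarantees every layer is strictly shorter than its input.
        raise ValueError("kernel must be >= 2 and stride must be >= 1")
    if len(items) <= kernel:
        return [merge(items)]
    out: List[str] = []
    start = 0
    while True:
        out.append(merge(items[start:start + kernel]))
        if start + kernel >= len(items):
            break  # the last window reached the end of the input
        start += stride
    return out


def stacked_scaling(chunks: Sequence[str], merge: Merge,
                    kernel: int = 3, stride: int = 2) -> Tuple[str, int]:
    """Stack layers until one global representation remains.

    Returns the final merged text and the number of layers applied.
    """
    if not chunks:
        raise ValueError("chunks must not be empty")
    level = list(chunks)
    depth = 0
    while len(level) > 1:
        level = conv_layer(level, merge, kernel, stride)
        depth += 1
    return level[0], depth


def toy_merge(window: Sequence[str]) -> str:
    """Deterministic placeholder for an LLM-based merge (an assumption)."""
    return "(" + "+".join(window) + ")"


if __name__ == "__main__":
    print(stacked_scaling(["a", "b", "c", "d", "e"], toy_merge))
```

With overlapping windows (`stride < kernel`), adjacent summaries share some inputs, which loosely mirrors how a convolutional receptive field grows from local to global across layers. In practice `toy_merge` would be replaced by a model call.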
