LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun

TL;DR

The paper addresses long-to-long text generation from extremely long resources, a scenario where standard LLMs struggle with context limits. It introduces LLM×MapReduce-V2, an entropy-driven convolutional test-time scaling framework guided by information bottleneck theory, leveraging skeletons S and digests D to progressively integrate information and generate coherent long-form articles. The approach includes skeleton initialization and refinement, digest-based feedback, entropy-driven sampling, best-of-N self-refinement, and topology-aware content generation, evaluated on the SurveyEval benchmark. Results show substantial improvements over baselines, including at least a 32.9% gain in reference utilization, with strong structural and content-quality metrics and positive human evaluations, making it a promising direction for scalable long-form generation from very long resources.

Abstract

Long-form generation is crucial for a wide range of practical applications and is typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing relevant information from extensive inputs, which remains difficult for current large language models (LLMs). In this paper, we propose LLM$\times$MapReduce-V2, a novel test-time scaling strategy designed to enhance the ability of LLMs to process extremely long inputs. Drawing inspiration from convolutional neural networks, which iteratively integrate local features into higher-level global representations, LLM$\times$MapReduce-V2 utilizes stacked convolutional scaling layers to progressively expand the understanding of input materials. Both quantitative and qualitative experimental results demonstrate that our approach substantially enhances the ability of LLMs to process long inputs and generate coherent, informative long-form articles, outperforming several representative baselines. Both LLM$\times$MapReduce-V2 and SurveyEval are publicly available at https://github.com/thunlp/LLMxMapReduce.
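The idea of stacking convolution-like layers over text can be illustrated with a small sketch. This is not the authors' implementation: the function names `conv_layer` and `stacked_conv`, the `kernel` and `stride` parameters, and the string-joining `merge` stand-in (which would be an LLM call in practice) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Merge = Callable[[Sequence[str]], str]


def conv_layer(items: Sequence[str], merge: Merge,
               kernel: int = 3, stride: int = 2) -> List[str]:
    """Merge each sliding window of `kernel` neighbouring items into one item."""
    if kernel < 2 or stride < 1:
        raise ValueError("kernel must be >= 2 and stride >= 1")
    if not items:
        raise ValueError("items must not be empty")
    if len(items) <= kernel:
        return [merge(items)]
    out: List[str] = []
    start = 0
    while True:
        out.append(merge(items[start:start + kernel]))
        if start + kernel >= len(items):
            break  # the last window reached the end of the input
        start += stride
    return out


def stacked_conv(items: Sequence[str], merge: Merge,
                 kernel: int = 3, stride: int = 2) -> Tuple[str, int]:
    """Apply conv_layer repeatedly until one global item remains.

    Returns the final item and the number of layers that were applied.
    """
    layers = 0
    current = list(items)
    while len(current) > 1:
        current = conv_layer(current, merge, kernel, stride)
        layers += 1
    return current[0], layers


def toy_merge(window: Sequence[str]) -> str:
    # Hypothetical stand-in for an LLM call that integrates local feedback.
    return "(" + "+".join(window) + ")"


summary, depth = stacked_conv(list("abcdefg"), toy_merge)
print(depth, summary)
```

With overlapping windows (`stride` smaller than `kernel`), neighbouring groups share an item, so each layer widens the span of source material that a single merged item has seen, which is the analogy to a growing receptive field.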

Paper Structure

This paper contains 51 sections, 34 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Comparison between traditional extractive methods and the integrative approach to resource utilization in long-form generation. Extractive methods select relevant content based on queries, which may overlook important information not directly aligned with the query. In contrast, the integrative approach synthesizes a broader range of content, capturing connections for a more comprehensive understanding.
  • Figure 2: Example of the structure in the skeleton.
  • Figure 3: The pipeline of LLM$\times$MapReduce-V2. LLM$\times$MapReduce-V2 can be roughly divided into three stages. In the Initialization phase, LLM$\times$MapReduce-V2 initializes the skeleton based on the vast resources and the given topic, and generates the corresponding structured digests. In the Skeleton Improvement phase, LLM$\times$MapReduce-V2 utilizes the feedback from the digests to refine the skeleton, which is guided by entropy-driven random sampling and multi-layer convolution for feedback aggregation. Additionally, a series of Best-of-N iterations are employed to further enhance the skeleton. In the Survey Construction phase, LLM$\times$MapReduce-V2 regenerates structured digests based on the optimized skeleton and performs topology-aware content generation to produce the final survey.
  • Figure 4: Human-evaluated win rate of LLM$\times$MapReduce-V2 compared to AutoSurvey on the test set.
  • Figure 5: Analysis of the components in LLM$\times$MapReduce-V2. We use the normalized information entropy score as the evaluation metric for the skeleton, which reflects the informativeness of the intermediate results.
  • ...and 6 more figures
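
The Figure 5 caption refers to a normalized information entropy score, but this section does not give its formula. As a rough illustration only, the sketch below computes generic normalized Shannon entropy over a list of non-negative weights. The function name `normalized_entropy` and the choice of weights (for example, how much material each skeleton section draws on) are assumptions, not the paper's definition.

```python
import math
from typing import Sequence


def normalized_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of the normalized weights, divided by log(n).

    Returns a value in [0, 1]: 0 when all mass sits on one entry,
    1 when the mass is spread uniformly over all n entries.
    """
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    # Zero-weight entries contribute nothing to the entropy sum.
    probs = [w / total for w in weights if w > 0]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))


print(normalized_entropy([1, 1, 1, 1]))
print(normalized_entropy([4, 0, 0, 0]))
```

Dividing by `log(n)` makes scores comparable across lists of different lengths, which is the usual reason for normalizing entropy.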