OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

TL;DR

OmniThink addresses the problem of shallow, redundant long-form generation by introducing a slow-thinking framework that expands knowledge boundaries through an Information Tree and a Conceptual Pool. The method alternates expansion and reflection to broaden both information and cognition, enabling richer content and higher knowledge density (KD) without sacrificing coherence or depth. A new KD metric is defined, and evaluation on the WildSeek dataset shows OmniThink outperforms strong baselines across relevance, breadth, depth, novelty, and outline quality, with human evaluations corroborating gains in knowledge richness. The work emphasizes boundary analysis to understand where improvements come from and discusses future directions, including multimodal information and personalized language styles to further enhance long-form generation.

Abstract

Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model's predefined scope, limiting the generation of content with rich information. Specifically, vanilla-retrieved information tends to lack depth and novelty and to suffer from redundancy, which degrades the quality of generated articles and leads to shallow, unoriginal, and repetitive outputs. To address these issues, we propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection. The core idea behind OmniThink is to simulate the cognitive behavior of learners as they slowly deepen their knowledge of a topic. Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth. Human evaluations and expert feedback further highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles. Code is available at https://github.com/zjunlp/OmniThink.
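
The alternation of expansion and reflection described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that retrieved material is attached to a tree of sub-topics and that reflection distils it into a shared pool of concepts which then guides the next expansion. The names `Node`, `expand_and_reflect`, `retrieve`, `reflect`, and `propose` are hypothetical, and the three callables are stand-ins for a search backend and LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Node:
    """One node of a toy information tree: a sub-topic and its retrieved snippets."""
    topic: str
    snippets: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def expand_and_reflect(
    root_topic: str,
    retrieve: Callable[[str], List[str]],           # hypothetical: topic -> snippets
    reflect: Callable[[List[str]], List[str]],      # hypothetical: snippets -> concepts
    propose: Callable[[str, Set[str]], List[str]],  # hypothetical: (topic, pool) -> sub-topics
    max_depth: int = 2,
) -> Tuple[Node, Set[str]]:
    """Alternate reflection and expansion for at most max_depth rounds."""
    root = Node(root_topic, retrieve(root_topic))   # initialization
    pool: Set[str] = set()                          # toy conceptual pool
    frontier = [root]
    for _ in range(max_depth):
        # Reflection: distil the frontier's snippets into the conceptual pool.
        for node in frontier:
            pool.update(reflect(node.snippets))
        # Expansion: grow the tree below each frontier node, guided by the pool.
        next_frontier: List[Node] = []
        for node in frontier:
            for sub_topic in propose(node.topic, pool):
                child = Node(sub_topic, retrieve(sub_topic))
                node.children.append(child)
                next_frontier.append(child)
        if not next_frontier:                       # nothing left to expand
            break
        frontier = next_frontier
    # Final reflection so the newest leaves also contribute to the pool.
    for node in frontier:
        pool.update(reflect(node.snippets))
    return root, pool


# Usage with stub callables (assumptions standing in for search and LLM calls).
def stub_retrieve(topic: str) -> List[str]:
    return [f"fact about {topic}"]


def stub_reflect(snippets: List[str]) -> List[str]:
    return [s.upper() for s in snippets]


def stub_propose(topic: str, pool: Set[str]) -> List[str]:
    # Only the root topic (no "/" in its name) is split further in this toy setup.
    return [f"{topic}/a", f"{topic}/b"] if "/" not in topic else []


root, pool = expand_and_reflect("AlphaFold", stub_retrieve, stub_reflect, stub_propose)
```

Keeping the pool as a set is one simple way to stop repeated statements from accumulating. How a real system would deduplicate concepts and decide when to stop expanding is left open here.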

Paper Structure

This paper contains 59 sections, 3 equations, 17 figures, 7 tables, 2 algorithms.

Figures (17)

  • Figure 1: Previous machine writing approaches only expand new information or perspective via RAG and role-playing. OmniThink expands knowledge boundaries through continuous reflection and exploration, attaching knowledge to an information tree and extracting it into a conceptual pool to deepen understanding and uncover more in-depth content.
  • Figure 2: A case generated by STORM using GPT-4o on the topic of AlphaFold. We have marked the repeated expressions in the article regarding "AlphaFold is developed by DeepMind".
  • Figure 3: Overview of OmniThink. As shown in the left diagram, OmniThink is divided into three main steps: (a) Information Acquisition, (b) Outline Structuring, and (c) Article Composition. The right diagram illustrates the specific operations during the Information Acquisition step: (① - ②) denotes the initialization of Information Acquisition, (② - ③) corresponds to reflection, and (③ - ④) indicates expansion.
  • Figure 4: The information scope of OmniThink, Co-STORM, STORM and oRAG.
  • Figure 5: Comparison of results between OmniThink, oRAG, and oRAG-plus.
  • ...and 12 more figures