OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
TL;DR
OmniThink addresses the problem of shallow, redundant long-form generation by introducing a slow-thinking framework that expands knowledge boundaries through an Information Tree and a Conceptual Pool. The method alternates between expansion and reflection to broaden both information and cognition, yielding richer content and higher knowledge density (KD) without sacrificing coherence or depth. The paper defines a new KD metric, and evaluation on the WildSeek dataset shows that OmniThink outperforms strong baselines on relevance, breadth, depth, novelty, and outline quality, with human evaluations corroborating the gains in knowledge richness. The work emphasizes boundary analysis to understand where the improvements come from and discusses future directions, including multimodal information and personalized language styles, to further enhance long-form generation.
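The TL;DR describes OmniThink's core loop only at a high level. As a minimal sketch, assuming that "expansion" means retrieving material for new subtopics and attaching it as child nodes of the Information Tree, and that "reflection" means distilling the newly retrieved material into the Conceptual Pool, the alternation could look like the code below. The names `expand_and_reflect`, `retrieve`, `propose_subtopics`, `reflect`, and `merge_into_pool` are hypothetical; in a real system the middle two would be LLM calls and `retrieve` a search API. The toy corpus is placeholder data, not real retrieval output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InfoNode:
    """A node of the Information Tree: a (sub)topic plus the snippets retrieved for it."""
    topic: str
    snippets: list[str]
    children: list["InfoNode"] = field(default_factory=list)


def merge_into_pool(pool: list[str], insights: list[str]) -> None:
    """Append insights to the Conceptual Pool, skipping exact duplicates."""
    for insight in insights:
        if insight not in pool:
            pool.append(insight)


def expand_and_reflect(
    topic: str,
    retrieve: Callable[[str], list[str]],
    propose_subtopics: Callable[[str, list[str]], list[str]],
    reflect: Callable[[list[str]], list[str]],
    max_depth: int = 2,
) -> tuple[InfoNode, list[str]]:
    """Alternate expansion (grow the tree) and reflection (update the pool).

    retrieve, propose_subtopics and reflect are hypothetical stand-ins for
    the search and LLM calls a real system would make.
    """
    root = InfoNode(topic, retrieve(topic))
    pool: list[str] = []  # the Conceptual Pool
    merge_into_pool(pool, reflect(root.snippets))

    frontier = [root]
    for _ in range(max_depth):
        # Expansion: every current leaf proposes subtopics, given the current pool.
        next_frontier: list[InfoNode] = []
        for node in frontier:
            for sub in propose_subtopics(node.topic, pool):
                child = InfoNode(sub, retrieve(sub))
                node.children.append(child)
                next_frontier.append(child)
        if not next_frontier:
            break  # no leaf asked for further expansion
        # Reflection: distill the newly retrieved snippets into the pool.
        merge_into_pool(pool, reflect([s for n in next_frontier for s in n.snippets]))
        frontier = next_frontier
    return root, pool


# Toy stubs with placeholder data (not real retrieval results).
CORPUS = {
    "topic": ["fact 1", "fact 2"],
    "topic/a": ["fact 2", "fact 3"],
    "topic/b": ["fact 4"],
}


def toy_subtopics(topic: str, pool: list[str]) -> list[str]:
    # Only the root is expanded in this toy example; the pool is ignored.
    return [topic + "/a", topic + "/b"] if topic == "topic" else []


def toy_reflect(snippets: list[str]) -> list[str]:
    # Identity "reflection": treat each snippet as one insight.
    return list(snippets)


tree, pool = expand_and_reflect("topic", lambda q: CORPUS.get(q, []), toy_subtopics, toy_reflect)
print([c.topic for c in tree.children])  # ['topic/a', 'topic/b']
print(pool)  # ['fact 1', 'fact 2', 'fact 3', 'fact 4']
```

Passing the pool into `propose_subtopics` is what couples the two phases in this sketch: what has already been distilled decides where the tree grows next, and the duplicate "fact 2" retrieved under `topic/a` is not added to the pool a second time.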
Abstract
Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined to the model's predefined scope, which limits their ability to generate information-rich content. Specifically, vanilla-retrieved information tends to lack depth and novelty and to suffer from redundancy, which degrades the generated articles and makes them shallow, unoriginal, and repetitive. To address these issues, we propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection. The core idea behind OmniThink is to simulate the cognitive behavior of learners as they progressively deepen their knowledge of a topic. Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth. Human evaluations and expert feedback further highlight the potential of OmniThink to address real-world challenges in long-form article generation. Code is available at https://github.com/zjunlp/OmniThink.
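The abstract's central quantitative claim concerns knowledge density, but it does not state the formula. The sketch below assumes KD is the number of distinct atomic knowledge units in an article divided by the article's length in tokens, KD = |unique units| / L. The function name `knowledge_density`, whitespace tokenization, and exact-match deduplication are illustrative assumptions, not the paper's implementation; the units are assumed to be already extracted from the text.

```python
def knowledge_density(units: list[str], text: str) -> float:
    """Assumed form of KD: unique knowledge units per whitespace-delimited token.

    `units` are atomic knowledge units already extracted from `text`
    (the extraction step itself is out of scope here).
    """
    # Exact-match deduplication after lowercasing and collapsing whitespace;
    # paraphrased duplicates would need semantic matching instead.
    unique_units = {" ".join(u.lower().split()) for u in units}
    length = len(text.split())
    return len(unique_units) / length if length else 0.0


article = "one two three four five six seven eight nine ten"  # 10 tokens
units = ["The tree grows", "the  tree grows", "The pool shrinks"]  # 2 unique
print(knowledge_density(units, article))  # 0.2
```

Under this reading, restating a fact adds length without adding a unit, so redundant articles score lower, which is the redundancy problem the abstract describes.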
