
FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

Qiyao Wang, Hongbo Wang, Longze Chen, Zhihao Yang, Guhong Chen, Hamid Alinejad-Rokny, Hui Li, Yuan Lin, Min Yang

Abstract

Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that treats literature exploration and idea generation as a co-evolving process. FlowPIE expands literature trajectories via a flow-guided Monte Carlo Tree Search (MCTS) inspired by GFlowNets, using the quality of current ideas, as assessed by an LLM-based generative reward model (GRM), as a supervision signal to guide adaptive retrieval and construct a diverse, high-quality initial population. Based on this population, FlowPIE models idea generation as a test-time idea evolution process, applying selection, crossover, and mutation with the isolation-island paradigm and GRM-based fitness computation to incorporate cross-domain knowledge. This design effectively mitigates the information-cocoon effect that arises from over-reliance on parametric knowledge and static literature. Extensive evaluations demonstrate that FlowPIE consistently produces ideas with higher novelty, feasibility, and diversity than strong LLM-based and agent-based frameworks, while enabling reward scaling at test time.
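The test-time idea evolution described above (GRM-based fitness, selection, crossover, mutation, and isolation islands with periodic migration) can be sketched as a minimal loop. This is a hedged illustration only: the function names (`grm_score`, `crossover`, `mutate`, `evolve`), the toy text-based fitness, and the half-population selection scheme are our own assumptions, not the paper's actual implementation, in which fitness would come from an LLM-based generative reward model.

```python
import random

def grm_score(idea: str) -> float:
    """Stand-in for the LLM-based generative reward model (GRM).
    Toy fitness: favors longer ideas with more distinct tokens."""
    toks = idea.split()
    return len(set(toks)) / (len(toks) or 1) + 0.01 * len(toks)

def crossover(a: str, b: str) -> str:
    """Recombine two ideas (here: splice token halves)."""
    ta, tb = a.split(), b.split()
    return " ".join(ta[: len(ta) // 2] + tb[len(tb) // 2 :])

def mutate(idea: str, vocab: list[str]) -> str:
    """Perturb one token to emulate cross-domain knowledge injection."""
    toks = idea.split()
    toks[random.randrange(len(toks))] = random.choice(vocab)
    return " ".join(toks)

def evolve(islands: list[list[str]], vocab: list[str],
           generations: int = 5, migrate_every: int = 2) -> str:
    for g in range(1, generations + 1):
        for i, pop in enumerate(islands):
            # Selection: keep the fitter half of each island.
            pop.sort(key=grm_score, reverse=True)
            survivors = pop[: max(2, len(pop) // 2)]
            # Crossover + mutation refill the island.
            children = []
            while len(survivors) + len(children) < len(pop):
                a, b = random.sample(survivors, 2)
                children.append(mutate(crossover(a, b), vocab))
            islands[i] = survivors + children
        if g % migrate_every == 0:
            # Isolation-island paradigm: occasionally migrate each
            # island's best idea to a neighbor to preserve diversity.
            for i, pop in enumerate(islands):
                islands[(i + 1) % len(islands)].append(max(pop, key=grm_score))
    return max((idea for pop in islands for idea in pop), key=grm_score)
```

Under this sketch, islands evolve independently between migrations, so each can converge to a different high-reward region before ideas are exchanged.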

Paper Structure

This paper contains 55 sections, 10 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Comparison of traditional literature-based SIG frameworks and our FlowPIE.
  • Figure 2: Test-time idea evolution scaling for reward. Initial Ideation couples literature exploration with idea generation via flow-guided MCTS, where higher rewards, reflecting better initial ideas, amplify the weights of the corresponding literature. Idea Evolution uses various evolutionary operators to guide ideas toward regions of higher reward and stable convergence.
  • Figure 3: Overview of FlowPIE. Left: Idea initialization based on flow-guided MCTS: forward exploration via a flow-guided UCB rule for node selection and expansion, and backward updating to enforce local and global flow constraints; this dual process dynamically adjusts literature weights with GRM-based rewards on generated ideas. Right: Test-time idea evolution, supervised by GRM-based fitness computation and leveraging various operators, including crossover on core technical features and isolation-island-enhanced mutation, to evolve ideas.
  • Figure 4: (a) Distribution of explored literature count; (b) Diversity score distribution.
  • Figure 5: Dataset overlap distribution between our literature and two benchmarks across different similarity levels. The overlap ratio rapidly decreases as similarity increases and approaches zero in the high-similarity region, indicating minimal risk of data leakage.
  • ...and 6 more figures