Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey

Fatemeh Shahhosseini, Arash Marioriyad, Ali Momen, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban, Shaghayegh Haghjooy Javanmard

TL;DR

This paper surveys Large Language Models (LLMs) for scientific idea generation, emphasizing the dual goals of novelty and value. It organizes methods into five families (knowledge augmentation, prompt-driven creativity, inference-time search, collaborative multi-agent systems, and parameter-level adaptation) and interprets them through Boden's combinatorial, exploratory, and transformational creativity and Rhodes' 4Ps (Person, Process, Press, Product). The work highlights how grounding, prompting, search-time reasoning, agent collaboration, and training-time alignment collectively advance creative scientific ideation, while also identifying evaluation bottlenecks and gaps in standard benchmarks. It argues that current progress largely achieves combinatorial and exploratory creativity, with transformational shifts still elusive, and calls for open-ended benchmarks, richer simulators, and architectural innovations to push toward reliable, transformative AI-assisted discovery.

Abstract

Scientific idea generation lies at the heart of scientific discovery and has driven human progress, whether by solving unsolved problems or proposing novel hypotheses to explain unknown phenomena. Unlike standard scientific reasoning or general creative generation, idea generation in science is a multi-objective and open-ended task, where the novelty of a contribution is as essential as its empirical soundness. Large language models (LLMs) have recently emerged as promising generators of scientific ideas, capable of producing coherent and factual outputs with surprising intuition and acceptable reasoning, yet their creative capacity remains inconsistent and poorly understood. This survey provides a structured synthesis of methods for LLM-driven scientific ideation, examining how different approaches balance creativity with scientific soundness. We categorize existing methods into five complementary families: External knowledge augmentation, Prompt-based distributional steering, Inference-time scaling, Multi-agent collaboration, and Parameter-level adaptation. To interpret their contributions, we employ two complementary frameworks: Boden's taxonomy of Combinatorial, Exploratory, and Transformational creativity to characterize the level of ideas each family is expected to generate, and Rhodes' 4Ps framework (Person, Process, Press, and Product) to locate the aspect or source of creativity that each method emphasizes. By aligning methodological advances with creativity frameworks, this survey clarifies the state of the field and outlines key directions toward reliable, systematic, and transformative applications of LLMs in scientific discovery.

Paper Structure

This paper contains 58 sections, 7 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the training-free methods and requirements for scientific idea generation using LLMs. (a) Knowledge Augmentation: The pipeline begins with integrating external sources such as research papers, databases, and knowledge graphs to ground the model’s reasoning, either through relational linking or semantic similarity-based retrieval. (b) Prompt-Driven Techniques: The model can be steered by modifying system prompts or input instructions, including structured prompts, adversarial queries, persona and role priming, and multilingual prompting. (c) Test-Time Scaling via Search: At inference time, search-based methods explore multiple candidate ideas through sequential refinement, parallel refinement, or branching exploration, enabling scalable reasoning. (d) Evaluation Metrics: Candidate hypotheses are evaluated based on key scientific metrics such as novelty, feasibility, and potential impact. (e) Feedback Sources: Search and generation are guided by feedback signals from human experts, scientific rules, internal model confidence, or external tools/simulators, which iteratively refine and improve the generated ideas. (f) Generation System: Single-agent systems versus multi-agent frameworks. Multi-agent setups include pipeline-oriented workflows for automation and debate-based interactions, which can yield emergent behaviors in LLMs. This multi-stage process collectively enables LLMs to produce creative, reliable, and high-value scientific hypotheses.
  • Figure 2: A four-level taxonomy for scientific idea generation methods using LLMs.
  • Figure 3: Conceptual mapping of LLM-driven scientific ideation methods to their primary source of creativity, inspired by Rhodes’ 4Ps framework. Here, we focus on the generative dimensions (Person, Process, and Press) as sources of creativity, with Product reserved for evaluation. Press includes knowledge augmentation and prompt engineering, supplying external context and inspiration. Process covers multi-agent collaboration and inference-time search strategies that guide exploration. Person encompasses design choices and inductive biases in model architecture and training, including objectives, prediction formats, and alternative generative structures. Intersections illustrate methods that combine dimensions: supervised fine-tuning (Person + Press) internalizes external knowledge, while reinforcement learning (Person + Process) internalizes search strategies, enhancing the model’s reasoning and creative capacities.