SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings

Farhana Keya, Gollam Rabby, Prasenjit Mitra, Sahar Vahdati, Sören Auer, Yaser Jaradeh

TL;DR

SCI-IDEA presents a two-module framework that uses LLM prompting and Aha Moment detection to generate context-aware scientific ideas. It extracts facets from researchers' prior work and related literature to identify gaps, then generates and evaluates ideas along novelty, excitement, feasibility, and effectiveness using semantic embeddings and likelihood-based surprise measures. Across multiple LLMs and prompting configurations, the approach achieves average scores around 6.8–6.9 on a 1–10 scale, with sentence-level embeddings particularly enhancing novelty and transformative potential. The study also engages human experts to validate ideas and explores limitations and ethics, emphasizing responsible, credit-acknowledged human-AI collaboration in scientific ideation. Overall, SCI-IDEA demonstrates the potential to structure flexible, context-aware ideation while revealing practical trade-offs and avenues for future improvement.
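The TL;DR says ideas are scored with semantic embeddings and likelihood-based surprise measures, but it gives no formulas. The Python sketch below is one illustrative reading and not SCI-IDEA's actual method. It makes two assumptions. Semantic novelty is one minus the highest cosine similarity between an idea embedding and the embeddings of prior-work facets. Surprise is one minus the geometric-mean token probability of the idea under some language model, which equals 1 - 1/perplexity. The function names, the weight `alpha`, and the linear blend are all hypothetical.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def semantic_novelty(idea_vec: Sequence[float], facet_vecs: List[Sequence[float]]) -> float:
    """Hypothetical novelty: 1 - similarity to the closest prior-work facet, clipped to [0, 1]."""
    closest = max(cosine(idea_vec, f) for f in facet_vecs)
    return min(1.0, max(0.0, 1.0 - closest))

def likelihood_surprise(token_logprobs: Sequence[float]) -> float:
    """Hypothetical surprise: 1 - exp(mean log-probability), i.e. 1 - 1/perplexity, in [0, 1)."""
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(mean_logprob)

def aha_score(idea_vec, facet_vecs, token_logprobs, alpha: float = 0.5) -> float:
    """Hypothetical weighted blend of novelty and surprise; alpha is an assumed weight."""
    return (alpha * semantic_novelty(idea_vec, facet_vecs)
            + (1 - alpha) * likelihood_surprise(token_logprobs))

# Toy inputs standing in for real embeddings and language-model log-probabilities.
facets = [[1.0, 0.0], [0.0, 1.0]]
idea = [1.0, 1.0]
logprobs = [math.log(0.5)] * 4
print(round(aha_score(idea, facets, logprobs), 4))  # prints 0.3964
```

Both signals are mapped into [0, 1] before blending so that neither one dominates because of its scale. An alternative design would threshold each signal separately instead of combining them.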

Abstract

Every scientific discovery starts with an idea inspired by prior work, interdisciplinary concepts, and emerging challenges. Recent advancements in large language models (LLMs) trained on scientific corpora have driven interest in AI-supported idea generation. However, generating context-aware, high-quality, and innovative ideas remains challenging. We introduce SCI-IDEA, a framework that uses LLM prompting strategies and Aha Moment detection for iterative idea refinement. SCI-IDEA extracts essential facets from research publications and assesses generated ideas on novelty, excitement, feasibility, and effectiveness. Comprehensive experiments validate SCI-IDEA's effectiveness, achieving average scores of 6.84, 6.86, 6.89, and 6.84 (on a 1–10 scale) across novelty, excitement, feasibility, and effectiveness, respectively. Evaluations employed GPT-4o, GPT-4.5, DeepSeek-32B (each under 2-shot prompting), and DeepSeek-70B (3-shot prompting), with token-level embeddings used for Aha Moment detection. Similarly, with sentence-level embeddings, it achieves scores of 6.87, 6.86, 6.83, and 6.87 using GPT-4o under 5-shot prompting, GPT-4.5 under 3-shot prompting, DeepSeek-32B under zero-shot chain-of-thought prompting, and DeepSeek-70B under 5-shot prompting. We also address ethical considerations such as intellectual credit, potential misuse, and balancing human creativity with AI-driven ideation. Our results highlight SCI-IDEA's potential to facilitate the structured and flexible exploration of context-aware scientific ideas, supporting innovation while maintaining ethical standards.
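The abstract reports separate results for token-level and sentence-level embeddings in Aha Moment detection, but it does not say how each is compared. The sketch below assumes one common contrast, which is not necessarily the paper's. A sentence-level score mean-pools the token vectors into one vector per text and takes a single cosine similarity. A token-level score matches each token of the idea to its most similar reference token and averages those best matches, similar to BERTScore's greedy token matching. The toy two-dimensional vectors stand in for real model outputs, and all function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]

def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def mean_pool(tokens: List[Vec]) -> Vec:
    """Average token vectors dimension-wise into one sentence vector."""
    n = len(tokens)
    return [sum(dim) / n for dim in zip(*tokens)]

def sentence_level_similarity(idea_tokens: List[Vec], ref_tokens: List[Vec]) -> float:
    """One cosine between mean-pooled vectors: a coarse, whole-text comparison."""
    return cosine(mean_pool(idea_tokens), mean_pool(ref_tokens))

def token_level_similarity(idea_tokens: List[Vec], ref_tokens: List[Vec]) -> float:
    """Average, over idea tokens, of each token's best cosine match in the reference."""
    best = [max(cosine(t, r) for r in ref_tokens) for t in idea_tokens]
    return sum(best) / len(best)

# Toy example: one idea token matches the reference, the other has no counterpart.
idea = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 0.0]]
print(round(sentence_level_similarity(idea, reference), 4))  # 0.7071
print(round(token_level_similarity(idea, reference), 4))     # 0.5
```

In this toy case, pooling blends the unmatched token into an average, so the sentence-level score comes out higher. The token-level score shows that half of the idea's tokens have no close counterpart. The abstract's token-level and sentence-level results compare the two settings empirically.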

Paper Structure

This paper contains 25 sections, 3 equations, 36 figures, and 5 tables.

Figures (36)

  • Figure 1: Overview of researcher and SCI-IDEA interactions. The left side illustrates researcher interactions and feedback, while the right side highlights SCI-IDEA's techniques for generating and refining context-aware scientific ideas.
  • Figure 2: Overview of SCI-IDEA framework. The upper side shows Module 1: Context Retrieval, Facet Extraction, and Research Gap Identification. The lower side illustrates Module 2: Idea Generation, Evaluation, Aha Moment Detection, and Refinement, along with their respective components.
  • Figure 3: Human Evaluation of SCI-IDEA by Embedding Strategy. Scores for novelty, excitement, feasibility, and effectiveness (left to right: without embedding, token-level embedding, sentence-level embedding).
  • Figure 4: Comparison of Human vs. LLM Scores in SCI-IDEA. Evaluation scores across different prompting strategies (left to right: without embedding, token-level embedding, and sentence-level embedding).
  • Figure 5: Prompt for Paper Facet Finder ZS (zero-shot).
  • ...and 31 more figures