
Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science

Xiao Liu, Xinyi Dong, Xinyang Gao, Yansong Feng, Xun Pang

TL;DR

This study investigates data-driven augmentation of AI-assisted research ideation in social science, focusing on climate negotiations. It introduces two complementary approaches: metadata-guided idea generation to improve feasibility, and automatic validation to support decision-making during idea selection. Across a ClimateDataBank-backed workflow and multiple evaluation modalities, metadata improves feasibility and perceived impact, while validation improves ranking accuracy and aids selection. A human study further shows that referencing LLM-generated ideas can help researchers produce higher-quality ideas. The work demonstrates the practical potential of data-driven ideation and highlights important trade-offs between novelty and empirical grounding, offering a path toward more tractable and impactful research ideas.

Abstract

Recent advances in large language models (LLMs) have shown promise in generating novel research ideas. However, these ideas often face challenges related to feasibility and expected effectiveness. This paper explores how augmenting LLMs with relevant data during the idea generation process can improve the quality of the generated ideas. We introduce two ways of incorporating data: (1) providing metadata during the idea generation stage to steer LLMs toward feasible directions, and (2) adding automatic validation during the idea selection stage to assess the empirical plausibility of the hypotheses within ideas. We conduct experiments in the social science domain, specifically on climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%, while automatic validation improves the overall quality of selected ideas by 7%. A human study shows that LLM-generated ideas, together with their related data and validation processes, inspire researchers to propose higher-quality research ideas. Our work highlights the potential of data-driven research idea generation and underscores the practical utility of LLM-assisted ideation in real-world academic settings.
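A minimal sketch of the first approach, metadata-guided generation, assuming the metadata is injected into the generation prompt as a plain-text list of datasets and their variables. The `DatasetMetadata` fields, the `build_ideation_prompt` helper, and the example dataset record are hypothetical illustrations, not the authors' prompt format or the actual ClimateDataBank schema; the LLM call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one dataset (fields are illustrative)."""
    name: str
    description: str
    variables: list[str]
    years: tuple[int, int]


def build_ideation_prompt(topic: str, datasets: list[DatasetMetadata]) -> str:
    """Assemble an idea-generation prompt that lists the available data up front,
    so the model is steered toward hypotheses it could actually test."""
    lines = [
        f"Propose a research idea on the topic: {topic}.",
        "Ground every hypothesis in the datasets below and use only listed variables.",
        "",
        "Available datasets:",
    ]
    for ds in datasets:
        start, end = ds.years
        lines.append(f"- {ds.name} ({start}-{end}): {ds.description}")
        lines.append(f"  Variables: {', '.join(ds.variables)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example record; not taken from ClimateDataBank.
    meta = [
        DatasetMetadata(
            name="negotiation_statements",
            description="Country statements made during climate negotiation sessions",
            variables=["country", "year", "stance_score", "coalition"],
            years=(2000, 2020),
        )
    ]
    # The resulting string would be sent to the LLM (call omitted here).
    print(build_ideation_prompt("coalition behavior in climate negotiations", meta))
```

Listing variable names explicitly is one simple way to make "feasible" concrete: the model is told which measurements exist, so hypotheses that need unavailable data are discouraged before any idea is written.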

Paper Structure

This paper contains 47 sections, 7 figures, 18 tables.

Figures (7)

  • Figure 1: Overview of how we incorporate data into the research idea generation process.
  • Figure 2: An example of metadata provided during idea generation, alongside a generated research idea.
  • Figure 3: Automatic evaluation results of ideas generated with (w.) and without metadata. A tabular version is given as a table in the Appendix.
  • Figure 4: An example of incorporating automatic validation into idea selection. We check the feasibility of hypotheses in ideas, conduct automatic validation of feasible hypotheses, and provide the ideas together with the summarized validation processes to the judge model.
  • Figure 5: Annotation interface for human evaluation of idea pairs.
  • ...and 2 more figures