From Funding to Findings (FIND): An Open Database of NSF Awards and Research Outputs

Kazimier Smith, Yucheng Lu, Qiaochu Fan

TL;DR

The paper presents FIND, an open-access database that systematically links NSF grant proposals to downstream publications, combining Crossref and PAR data with proposal and publication text to enable metascience and policy analyses. It demonstrates two NLP applications: predicting citation impact from grant language, and using large language models to extract verifiable claims and findings in order to measure proposal–outcome alignment. The work reports a strong signal in grant metadata for predicting outputs and introduces a scalable framework for scoring how well findings align with the initial proposals. It also highlights historical data gaps, policy-related shifts, and avenues for extending coverage, especially to pre-2000 grants, to support broader meta-research on the effects of public funding.

Abstract

Public funding plays a central role in driving scientific discovery. To better understand the link between research inputs and outputs, we introduce FIND (Funding-Impact NSF Database), an open-access dataset that systematically links NSF grant proposals to their downstream research outputs, including publication metadata and abstracts. The primary contribution of this project is the creation of a large-scale, structured dataset that enables transparency, impact evaluation, and metascience research on the returns to public funding. To illustrate the potential of FIND, we present two proof-of-concept NLP applications. First, we analyze whether the language of grant proposals can predict the subsequent citation impact of funded research. Second, we leverage large language models to extract scientific claims from both proposals and resulting publications, allowing us to measure the extent to which funded projects deliver on their stated goals. Together, these applications highlight the utility of FIND for advancing metascience, informing funding policy, and enabling novel AI-driven analyses of the scientific process.

Paper Structure

This paper contains 16 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Grant allocation by directorate
  • Figure 2: Grant-publication match rate
  • Figure 3: Fraction of grant funding in each directorate over time
  • Figure 4: Grant-publication match rate by directorate
  • Figure 5: Award and Publication Embedding Comparison via UMAP dimension reduction
  • ...and 5 more figures