GotFunding: A grant recommendation system based on scientific articles

Tong Zeng, Daniel E. Acuna

TL;DR

GotFunding addresses the challenge of matching scientists to funding by treating grant search as a learning-to-rank problem trained on NIH grant–publication history. It builds a LambdaRank-based ranking model using LightGBM, integrating 31 statistical features and fastText-based semantic features derived from Federal RePORTER and PubMed data, and evaluates performance with normalized discounted cumulative gain (NDCG) metrics. The model achieves strong top-k ranking performance, with NDCG@1 = 0.945 on the validation set, and the analysis identifies temporal alignment, the information density of publications, and publication–grant relevance as key predictors, with grant abstracts contributing significantly. The approach offers a practical path to improving pre-award matching, potentially reducing the time junior researchers spend searching for funding and enabling online experimentation with grant recommendations.
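The NDCG metric rewards placing a publication's funding grant near the top of its ranked candidate list. The sketch below computes NDCG@k for a single publication, assuming the common exponential-gain formulation with a log2 position discount; the paper's exact variant and the toy labels are assumptions.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, given in ranked order."""
    # Exponential gain (2^rel - 1); the item at rank 1 gets discount log2(2) = 1.
    return sum(
        (2.0 ** rel - 1.0) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant grant among the candidates
    return dcg_at_k(relevances, k) / ideal


# Hypothetical example: relevance labels of candidate grants for one publication,
# listed in the order the model ranked them (1 = a grant that funded it).
ranked_labels = [0, 1, 0, 0]
print(ndcg_at_k(ranked_labels, 1))  # 0.0: the top-ranked grant is not a funding grant
print(ndcg_at_k(ranked_labels, 3))  # ≈ 0.631, i.e. 1 / log2(3)
```

With binary labels and at least one relevant candidate, NDCG@1 is 1 when the top-ranked grant is a funding grant and 0 otherwise. A corpus-level figure such as NDCG@1 = 0.945 is typically the mean of these per-publication scores; whether the paper averages in exactly this way is an assumption.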

Abstract

Obtaining funding is an important part of becoming a successful scientist. Junior faculty spend a great deal of time finding the agencies and programs that best match their research profile. But what factors determine the best publication–grant match? Some universities employ pre-award personnel who understand these factors, but not all institutions can afford to hire them. Historical records of publications funded by grants can help us understand the matching process and develop recommendation systems that automate it. In this work, we present GotFunding (Grant recOmmendaTion based on past FUNDING), a recommendation system trained on National Institutes of Health (NIH) grant–publication records. By casting the problem as learning to rank, our system achieves high performance (NDCG@1 = 0.945). Analyzing the features that make predictions effective, we find that the three most important for the ranking are 1) the year difference between publication and grant, 2) the amount of information provided in the publication, and 3) the relevance of the publication to the grant. We discuss future improvements of the system and an online tool for scientists to try.
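The three most important signals can be made concrete with a small feature extractor for one publication–grant pair. This is a hypothetical sketch: the `Publication` and `Grant` classes, the token-count proxy for "amount of information", and the toy word vectors standing in for fastText embeddings are illustrative assumptions, not the paper's definitions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Publication:
    year: int
    text: str  # hypothetical: title + abstract


@dataclass
class Grant:
    year: int  # hypothetical: fiscal year of the grant
    text: str  # hypothetical: grant title + abstract


def mean_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens (a stand-in for fastText embeddings)."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))  # assumes a non-empty vocabulary
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pair_features(pub: Publication, grant: Grant,
                  vectors: Dict[str, List[float]]) -> Dict[str, float]:
    return {
        # Temporal alignment: years between the grant and the publication.
        "year_diff": float(pub.year - grant.year),
        # Hypothetical proxy for the amount of information: token count of the publication text.
        "pub_num_tokens": float(len(pub.text.split())),
        # Semantic relevance between the publication and grant descriptions.
        "relevance": cosine(mean_vector(pub.text, vectors),
                            mean_vector(grant.text, vectors)),
    }


# Toy 3-d vectors (hypothetical), in place of real fastText embeddings.
toy_vectors = {"gene": [1.0, 0.0, 0.0], "cancer": [0.0, 1.0, 0.0], "tumor": [0.0, 0.9, 0.1]}
pub = Publication(year=2015, text="Gene expression in tumor cells")
grant = Grant(year=2012, text="Cancer gene regulation")
print(pair_features(pub, grant, toy_vectors))
```

Averaging word vectors is only one simple way to embed a document; the paper's four grant-description approaches (APP_1 to APP_4) are not reproduced here.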

Paper Structure (17 sections, 1 equation, 2 figures, 2 tables)

Figures (2)

  • Figure 1: The framework of our grant recommendation solution. The orange arrows denote the training pipeline and the green arrows represent the prediction pipeline.
  • Figure 2: Top 20 features by importance. The APP_[1,2,3,4] prefix in a feature name denotes the four approaches used for the grant description, and Feature_#[1-31] refers to the correspondingly numbered features in the paper's feature table. The top three features are the year difference between publication and grant, the information content of the publication, and the relevance between publication and grant.
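In the training pipeline of Figure 1, a learning-to-rank model scores candidate grants per publication, so training rows must be grouped by publication (the "query"). The sketch below shows one way to lay out such data, assuming each row is a (publication, grant, features, label) tuple; the identifiers, feature values, and the use of non-funding grants as negatives are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# (publication_id, grant_id, feature vector, relevance label)
Row = Tuple[str, str, List[float], int]


def to_ranking_inputs(rows: Sequence[Row]) -> Tuple[List[List[float]], List[int], List[int]]:
    """Sort rows by publication and return features X, labels y, and group sizes.

    Ranking objectives such as LambdaRank expect rows of one query to be
    contiguous, with `group` giving how many consecutive rows each query owns.
    """
    # sorted() is stable, so candidate order within a publication is preserved.
    ordered = sorted(rows, key=lambda r: r[0])
    X = [r[2] for r in ordered]
    y = [r[3] for r in ordered]
    group = [len(list(block)) for _, block in groupby(ordered, key=lambda r: r[0])]
    return X, y, group


# Hypothetical rows: features are [year_diff, pub_num_tokens, relevance].
rows = [
    ("pmid_A", "grant_1", [3.0, 120.0, 0.81], 1),  # grant that funded publication A
    ("pmid_B", "grant_2", [1.0, 95.0, 0.77], 1),
    ("pmid_A", "grant_3", [9.0, 120.0, 0.12], 0),  # hypothetical negative candidate
    ("pmid_B", "grant_4", [6.0, 95.0, 0.30], 0),
    ("pmid_B", "grant_5", [2.0, 95.0, 0.25], 0),
]
X, y, group = to_ranking_inputs(rows)
print(group)  # [2, 3]
print(y)      # [1, 0, 1, 0, 0]
```

If LightGBM is installed, arrays in this shape can be passed as `lightgbm.LGBMRanker(objective="lambdarank").fit(X, y, group=group)`. The paper's hyperparameters and candidate-generation procedure are not reproduced here.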