Quantifying the Relevance of Youth Research Cited in the US Policy Documents

Miftahul Jannat Mokarrama, Hamed Alhoori

TL;DR

This study asks how to quantify the relevance of youth-focused research cited in US policy documents. It builds a data pipeline from Overton (2000–2022), collects the PDFs, and uses TF-IDF and 11 pretrained LLMs within the SBERT framework to compute the semantic similarity between each research article and its citing policy texts, producing a policy-relevance score for each article. The findings indicate that domain-specific models (e.g., ClinicalBERT, BioBERT) yield higher relevance scores, and that full-text and abstract inputs give similar scores for this task, suggesting that abstracts capture most of the policy-relevant information. The work demonstrates a scalable, open-source approach to measuring research impact on policy and proposes future directions, including impact prediction and the development of robust policy-relevance metrics.

Abstract

In recent years, there has been growing emphasis on conducting research whose benefits reach beyond academic or scientific communities to society at large. A well-known approach to measuring the societal impact of research is counting its policy citations. Despite the importance of research in informing policy, there is no concrete evidence of how relevant cited research is to the policy documents that cite it. This is concerning because it leaves room for individual, social, or political biases to shape the evidence used in policy, which may result in inappropriate, fragmented, or outdated research evidence in policy. Therefore, it is crucial to identify the degree of relevance between research articles and citing policy documents. In this paper, we examine the degree of contextual relevance of youth-focused research in the US policy documents that reference it, using natural language processing techniques, state-of-the-art pre-trained Large Language Models (LLMs), and statistical analysis. Our experiments and analysis show that youth-related research articles that receive US policy citations are mostly relevant to the citing policy documents.
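
To make the idea of a relevance score concrete, the TF-IDF variant of the similarity computation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the smoothed IDF formula, the choice to fit IDF on only the two texts being compared, and the function names (`tokenize`, `tfidf_vectors`, `relevance_score`) are all assumptions made for the example. An embedding-based variant would replace the TF-IDF vectors with sentence embeddings and keep the same cosine comparison.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(article_text, policy_text):
    """Hypothetical relevance score: TF-IDF cosine similarity of two texts."""
    article_vec, policy_vec = tfidf_vectors([article_text, policy_text])
    return cosine_similarity(article_vec, policy_vec)
```

Calling `relevance_score(abstract, policy_text)` returns a value between 0 and 1, where 0 means the two texts share no terms and values near 1 mean their weighted vocabularies largely coincide. In practice the IDF weights would more plausibly be fitted on the whole corpus rather than on a single pair.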

Paper Structure

This paper contains 12 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Distribution of the citation count of research articles (1 to 10) in the US policy documents.
  • Figure 2: Research article and citing policy document relevance score calculation.
  • Figure 3: Box-Whisker plots of the relevance scores for all models in Case 1 after removing outliers.
  • Figure 4: Box-Whisker plots of the relevance scores for all models in Case 2 after removing outliers.
  • Figure 5: Box-Whisker plots of the relevance scores for all models in Case 3 after removing outliers.