BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals

Ruiyu Wang, Yuzhang Xie, Xiao Hu, Carl Yang, Jiaying Lu

TL;DR

BioMedJImpact examines how collaboration intensity and AI engagement jointly shape the impact of biomedical journals. The authors build a large multi-source dataset and a three-stage LLM pipeline that derives AI engagement signals from abstracts, then use linear mixed-effects models to relate those signals to Impact Factor (IF), Total Cites, and Quartile. A human evaluation confirms the reliability of the LLM-derived AI annotations. The result is a scalable resource and methodology for analyzing AI’s role in biomedical publishing, built around the AI engagement rate $E_{j,t} = \frac{N^{AI}_{j,t}}{N^{total}_{j,t}}$.
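
The engagement rate above can be illustrated with a minimal Python sketch. It assumes that $N^{AI}_{j,t}$ counts the articles in journal $j$ and year $t$ flagged as AI-relevant and that $N^{total}_{j,t}$ counts all articles in that journal and year; the TL;DR does not define these symbols, so this reading is an assumption. The function name `engagement_rates`, the record layout, and the toy records are hypothetical and are not taken from the BioMedJImpact code.

```python
from collections import defaultdict


def engagement_rates(articles):
    """Return {(journal, year): E_jt} from (journal, year, is_ai) records.

    Assumed reading of the formula: E_jt = N_AI / N_total, where N_AI is the
    number of AI-flagged articles in a journal-year and N_total is the number
    of all articles in that journal-year.
    """
    total = defaultdict(int)  # N^{total}_{j,t}
    ai = defaultdict(int)     # N^{AI}_{j,t}
    for journal, year, is_ai in articles:
        key = (journal, year)
        total[key] += 1
        if is_ai:
            ai[key] += 1
    # Every key in `total` has at least one article, so the division is safe.
    return {key: ai[key] / total[key] for key in total}


# Hypothetical toy records, not drawn from the dataset.
toy_articles = [
    ("Journal A", 2020, True),
    ("Journal A", 2020, False),
    ("Journal A", 2020, False),
    ("Journal A", 2020, True),
    ("Journal B", 2021, False),
]
rates = engagement_rates(toy_articles)
```

Journal-years with no articles never appear in the result, so the rate is only defined where the denominator is positive.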

Abstract

Assessing journal impact is central to scholarly communication, yet existing open resources rarely capture how collaboration structures and artificial intelligence (AI) research jointly shape venue prestige in biomedicine. We present BioMedJImpact, a large-scale, biomedical-oriented dataset designed to advance journal-level analysis of scientific impact and AI engagement. Built from 1.74 million PubMed Central articles across 2,744 journals, BioMedJImpact integrates bibliometric indicators, collaboration features, and LLM-derived semantic indicators for AI engagement. The AI engagement feature is extracted through a reproducible three-stage LLM pipeline that we propose. Using this dataset, we analyze how collaboration intensity and AI engagement jointly influence scientific impact across the pre- and post-pandemic periods (2016–2019 and 2020–2023). Two consistent trends emerge. First, journals with higher collaboration intensity, particularly those with larger and more diverse author teams, tend to achieve greater citation impact. Second, AI engagement has become an increasingly strong correlate of journal prestige, especially in quartile rankings. To validate the pipeline, we conduct a human evaluation, which confirms substantial agreement in AI relevance detection and consistent subfield classification. Together, these contributions demonstrate that BioMedJImpact serves as both a comprehensive dataset capturing the intersection of biomedicine and AI and a validated methodological framework enabling scalable, content-aware scientometric analysis of scientific impact and innovation dynamics. Code is available at https://github.com/JonathanWry/BioMedJImpact.

Paper Structure

This paper contains 18 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Exploratory visualizations of quartile and collaboration indicators in the integrated dataset. Panels (a)–(b) summarize quartile dynamics and stability; panels (c)–(d) depict collaboration trends and collaboration intensity by quartile.
  • Figure 2: Overview of our LLM pipeline for AI engagement analysis from PMC abstracts. Step 1 filters AI-relevant abstracts. Step 2 extracts AI terms and maps them to a controlled taxonomy of AI subfields. Step 3 validates extracted terms to reduce ambiguity and false positives.
  • Figure 3: AI engagement patterns derived from LLM-based content annotation. Panel (a): top 10 subject categories by pooled mean AI% over all journal–year rows within each category; boxes show distributions, diamonds show means. Panel (b): top 15 subject categories by year-normalized mean AI%, computed as the mean across journals within each category–year, then the mean across years.
  • Figure 4: Subject category-specific word clouds of validated AI subfield keywords. From left to right: (a) Math and Computational Biology, (b) Radiology and Imaging, and (c) Healthcare Science and Services. Word size reflects frequency of extracted AI concepts within each journal subset; color and position are aesthetic only.
  • Figure 5: Human evaluation results. (a) Pairwise Cohen's $\kappa$ for each annotator pair and metric, with an "Overall" bar showing three-rater Fleiss' $\kappa$. (b) Per-annotator and overall mean scores with standard errors: AI relevance accuracy (left axis, 0–1) and subfield accuracy/completeness (right axis, 1–3).
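
The agreement statistics named in the Figure 5 caption can be sketched in plain Python using the standard textbook definitions of Cohen's $\kappa$ (two raters) and Fleiss' $\kappa$ (a fixed number of raters per item). This is an illustrative sketch, not the authors' evaluation code; the function names and the toy binary labels (1 = AI-relevant, 0 = not) are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), with p_o the observed agreement and p_e
    the agreement expected by chance from each rater's label frequencies.
    Undefined (division by zero) when p_e == 1.
    """
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    label_totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Share of agreeing rater pairs for this item.
        agreeing = sum(v * v for v in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the pooled label proportions.
    p_e = sum((v / (n_items * n_raters)) ** 2 for v in label_totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three annotators over six abstracts.
r1 = [1, 1, 0, 0, 1, 0]
r2 = [1, 1, 0, 0, 0, 0]
r3 = [1, 0, 0, 0, 1, 0]

pairwise = cohen_kappa(r1, r2)
overall = fleiss_kappa([list(labels) for labels in zip(r1, r2, r3)])
```

Pairwise values correspond to the per-pair bars in panel (a), while the three-rater value corresponds to the "Overall" bar. Established implementations exist in common statistics libraries and would normally be preferred over hand-rolled code.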