Deciphering Scientific Collaboration in Biomedical LLM Research: Dynamics, Institutional Participation, and Resource Disparities

Lingyao Li, Zhijie Duan, Xuexin Li, Xiaoran Xu, Zhaoqian Xue, Siyuan Ma, Jin Jin

TL;DR

This paper investigates how LLMs reshape collaboration in biomedical research by analyzing 5,674 PubMed-indexed LLM-related papers alongside ML and general biomedical controls. It employs Shannon entropy to quantify collaboration diversity across institutions, disciplines, and countries, and uses network analysis to identify hub and bridging entities, while NIH FY2024 funding serves as a proxy for institutional resources. The findings show increasing collaboration diversity and a declining share of CS/AI authors in LLM work, but a centralized structure anchored by a core set of institutions and disciplines; resource levels strongly relate to output and influence, with strategic collaborations enabling resource-constrained institutions to achieve greater visibility. These results highlight both democratizing trends and persistent resource-based disparities, underscoring the importance of targeted collaboration strategies to promote equitable advancement in LLM-driven biomedicine.
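As a rough illustration of the diversity measure named above, the following sketch computes Shannon entropy, H = -sum(p_i * log2(p_i)), over the category labels attached to one paper's authors. The base-2 logarithm, the per-paper grouping, and the example labels are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy (in bits) of the label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical author disciplines for two papers (illustrative only).
single_field = ["Medicine", "Medicine", "Medicine", "Medicine"]
mixed_team = ["Medicine", "Computer Science", "Public Health", "Medicine"]

print(shannon_entropy(single_field))  # 0.0
print(shannon_entropy(mixed_team))    # 1.5
```

A team drawn from a single discipline scores 0, and the score rises as authors spread more evenly across more categories. The same function would apply unchanged to institution or country labels.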

Abstract

Large language models (LLMs) are increasingly transforming biomedical discovery and clinical innovation, yet their impact extends far beyond algorithmic revolution: LLMs are restructuring how scientific collaboration occurs, who participates, and how resources shape innovation. Despite this profound transformation, how this rapid technological shift is reshaping the structure and equity of scientific collaboration in biomedical LLM research remains largely unknown. By analyzing 5,674 LLM-related biomedical publications from PubMed, we examine how collaboration diversity evolves over time, identify institutions and disciplines that anchor and bridge collaboration networks, and assess how resource disparities underpin research performance. We find that collaboration diversity has grown steadily, with a decreasing share of Computer Science and Artificial Intelligence authors, suggesting that LLMs are lowering technical barriers for biomedical investigators. Network analysis reveals central institutions, including Stanford University and Harvard Medical School, and bridging disciplines such as Medicine and Computer Science that anchor collaborations in this field. Furthermore, biomedical research resources are strongly linked to research performance, with high-performing resource-constrained institutions exhibiting larger collaboration volume with the top 1% most connected institutions in the network. Together, these findings reveal a complex landscape, where democratizing trends coexist with a persistent, resource-driven hierarchy, highlighting the critical role of strategic collaboration in this evolving field.
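The idea of "collaboration volume with the most connected institutions" can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the institution names, edge weights, helper function names, and the 20% hub cutoff (chosen only because the toy network has five nodes, where the abstract refers to the top 1%) are all hypothetical.

```python
import math
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_fraction(centrality, fraction):
    """Return the nodes in the top `fraction` by centrality (at least one)."""
    k = max(1, math.ceil(fraction * len(centrality)))
    ranked = sorted(centrality, key=lambda node: (-centrality[node], node))
    return set(ranked[:k])


def hub_share(institution, weighted_edges, hubs):
    """Fraction of an institution's co-authored papers that involve a hub."""
    total = with_hub = 0
    for a, b, papers in weighted_edges:
        if institution not in (a, b):
            continue
        partner = b if a == institution else a
        total += papers
        if partner in hubs:
            with_hub += papers
    return with_hub / total if total else 0.0


# Hypothetical co-authorship counts: (institution, institution, joint papers).
weighted_edges = [
    ("Hub U", "Inst A", 5),
    ("Hub U", "Inst B", 1),
    ("Hub U", "Inst C", 2),
    ("Hub U", "Inst D", 1),
    ("Inst A", "Inst B", 1),
    ("Inst B", "Inst C", 3),
]
centrality = degree_centrality([(a, b) for a, b, _ in weighted_edges])
hubs = top_fraction(centrality, 0.20)

print(hubs)                                                # {'Hub U'}
print(round(hub_share("Inst A", weighted_edges, hubs), 3)) # 0.833
print(round(hub_share("Inst B", weighted_edges, hubs), 3)) # 0.2
```

In this toy network, "Inst A" directs most of its joint papers toward the hub while "Inst B" does not, which is the kind of contrast the hub-share comparison is meant to capture.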

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Collaboration patterns and CS/AI participation in LLM, ML, and general research in biomedicine based on publications recorded in PubMed between June 2019 and January 2025. (a) Change of discipline, institution, and country entropy over time in LLM, ML, and general biomedical publications. (b) Change in the proportion of CS/AI authors in LLM publications.
  • Figure 2: Co-authorship networks based on authors’ (a) institution and (b) disciplinary affiliations. Each node represents a research institution or discipline, and each edge represents a co-authorship between researchers from the two respective institutions or disciplines. Node colors match the colors of the clusters (G1–G9) to which the nodes belong, where the clusters were determined based on the CNM algorithm. The network layout represents the community clusters determined based on the Fruchterman–Reingold force-directed layout algorithm.
  • Figure 3: Institutional research output is influenced by institutional resource levels as well as collaboration and network positioning. (a) Polynomial fits (log–log scale) relating FY2024 NIH funding (biomedical research resources) to LLM-related outcomes: degree centrality, publication count, and citations per paper. Shaded bands are 95% confidence intervals of the fitted curves (highlighted in yellow). Outliers are institutions whose residuals fall more than 1.5 SD above or below the fitted trend. Points above the funding-based expectation are shown in blue; those below are in red. (b) Top five U.S. institutions by degree or betweenness rank with unambiguous NIH awardee-name mapping. Columns report each institution’s degree/betweenness rank and its FY2024 NIH funding rank. (c) Proportion of collaborations with network hubs (and, separately, with bridges), for institutions in the below-median NIH-funding group, contrasting ‘high achievers’ (higher-than-expected citations per paper) with the remainder. Hubs/bridges are defined using degree/betweenness centrality thresholds (top 1%, 2.5%, or 5%). Boxplots show distributions with group means overlaid; one-sided Wilcoxon rank-sum tests evaluate whether high achievers allocate a larger share to hubs.
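
The outlier rule in the Figure 3(a) caption can be illustrated with a simplified sketch. The caption describes polynomial fits on a log–log scale; the code below swaps in a straight-line least-squares fit on log10 values to stay short, and all funding and publication figures are invented for illustration.

```python
import math
import statistics


def flag_outliers(funding, outcome, threshold_sd=1.5):
    """Fit log10(outcome) against log10(funding) by least squares, then
    label each point 'above', 'below', or 'within' depending on whether
    its residual lies beyond +/- threshold_sd standard deviations."""
    xs = [math.log10(v) for v in funding]
    ys = [math.log10(v) for v in outcome]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = statistics.stdev(residuals)
    labels = []
    for r in residuals:
        if r > threshold_sd * sd:
            labels.append("above")
        elif r < -threshold_sd * sd:
            labels.append("below")
        else:
            labels.append("within")
    return labels


# Hypothetical funding (USD) and publication counts; the third
# institution publishes far more than its funding level would suggest.
funding = [1e5, 1e6, 1e7, 1e8, 1e9]
publications = [1, 10, 1000, 1000, 10000]

print(flag_outliers(funding, publications))
# ['within', 'within', 'above', 'within', 'within']
```

Institutions labeled "above" correspond to the points the caption describes as exceeding the funding-based expectation, and "below" to those falling short of it.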