Remembering Unequally: Global and Disciplinary Bias in LLM-Generated Co-Authorship Networks

Ghazal Kalhor, Afra Mashhadi

TL;DR

This paper investigates how memorization in large language models (LLMs) biases the construction of co-authorship networks. By comparing LLM-generated author networks to OpenAlex and Google Scholar across 1,596 seeds in 10 disciplines and 8 regions, the authors introduce Discoverable Network Extraction ($DNE$) as a metric to quantify memorization effects. They evaluate three LLMs of different sizes (DeepSeek R1, Llama 4 Scout, Mixtral 8x7B) under zero-shot prompting and find that highly cited authors are more accurately represented in LLM outputs, with larger models typically showing stronger biases; however, certain disciplines (notably Clinical Medicine) and regions (Sub-Saharan Africa, North Africa) display more balanced representations. The work highlights both risks and opportunities in applying LLMs to scientometrics, underscoring the need for robust validation, inclusive training data, and bias-mitigation strategies in scholarly discovery tools.

Abstract

Ongoing breakthroughs in Large Language Models (LLMs) are reshaping search and recommendation platforms at their core. While this shift unlocks powerful new scientometric tools, it also exposes critical fairness and bias issues that could erode the integrity of the information ecosystem. Additionally, as LLMs become more integrated into web-based searches for scholarly tools, their ability to generate summarized research work based on memorized data introduces new dimensions to these challenges. The extent of memorization in LLMs can impact the accuracy and fairness of the co-authorship networks they produce, potentially reflecting and amplifying existing biases within the scientific community and across different regions. This study critically examines the impact of LLM memorization on co-authorship networks. To this end, we assess memorization effects across three prominent models, DeepSeek R1, Llama 4 Scout, and Mixtral 8x7B, analyzing how memorization-driven outputs vary across academic disciplines and world regions. While our global analysis reveals a consistent bias favoring highly cited researchers, this pattern is not uniformly observed. Certain disciplines, such as Clinical Medicine, and regions, including parts of Africa, show more balanced representation, pointing to areas where LLM training data may reflect greater equity. These findings underscore both the risks and opportunities in deploying LLMs for scholarly discovery.

Paper Structure

This paper contains 31 sections, 2 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Violin plots of the logarithmic distribution of authors' citation counts across academic disciplines and regions.
  • Figure 2: Mean DNE for highly cited authors (olive) and low-cited authors (pink) across different academic disciplines, shown for each LLM–baseline pair.
  • Figure 3: Grouped bar plots showing the mean DNE across academic disciplines for each LLM, separated by baseline.
  • Figure 4: Mean DNE for highly cited authors (olive) and low-cited authors (pink) across different regions, shown for each LLM–baseline pair.
  • Figure 5: Grouped bar plots showing the mean DNE across regions for each LLM, separated by baseline.