Mitigating Consequences of Prestige in Citations of Publications

Michael Balzer, Adhen Benlahlou

TL;DR

This study tackles the Matthew effect in citations by predicting paper citations using only observable pre-publication attributes available during double-blind peer review, bridging scientometrics with objective evaluation. It employs linear and generalized linear models, with an arsinh transformation for the continuous weighted citation outcome and a negative binomial model for counts, validated on large PubMed-derived datasets enriched with MeSH, references, and language features. Results show substantial explanatory power from pre-publication features (e.g., number of references, MeSH term diversity, paper length) and consistent predictive performance across train/test splits, suggesting that fairer funding assessments can be achieved by relying on these variables rather than prestige signals. The work also demonstrates robustness via model-based gradient boosting, highlighting a path toward data-driven yet unbiased citation forecasting applicable to policy contexts in science funding.
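To make the arsinh transformation mentioned above concrete, here is a minimal sketch using only Python's standard library. It illustrates two general properties of the inverse hyperbolic sine, $\operatorname{arsinh}(y) = \ln\left(y + \sqrt{y^2 + 1}\right)$: it is defined at zero, unlike the logarithm, and it behaves like $\ln(2y)$ for large values. The function names and sample values are hypothetical and are not taken from the paper. Whether these properties are the authors' stated reason for choosing the transformation is an assumption.

```python
import math

def arsinh_transform(y: float) -> float:
    """Inverse hyperbolic sine, i.e. ln(y + sqrt(y^2 + 1))."""
    return math.asinh(y)

def inverse_arsinh(z: float) -> float:
    """Map a transformed value back to the original scale."""
    return math.sinh(z)

# Hypothetical weighted citation values, including an uncited paper (0.0).
weighted_citations = [0.0, 0.5, 3.0, 40.0, 1200.0]

for y in weighted_citations:
    z = arsinh_transform(y)
    # For large y, arsinh(y) is close to ln(2y); for y = 0 it is exactly 0.
    print(f"y={y:>8.1f}  arsinh(y)={z:.4f}  back-transformed={inverse_arsinh(z):.4f}")
```

Predictions from a linear model fitted on the transformed scale can be mapped back with `inverse_arsinh`, although a naive back-transformation of a conditional mean is generally biased and would need a correction in practice.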

Abstract

For many public research organizations, funding the creation of science and maximizing scientific output are of central interest. When scientific production is evaluated for funding, citations are typically used as a proxy, although they are strongly influenced by factors beyond scientific impact. This study aims to mitigate the consequences of the Matthew effect in citations, where prominent authors and prestigious journals receive more citations regardless of the scientific content of the publications. To this end, the study presents an approach to predicting citations of papers based solely on observable characteristics available at the submission stage of a double-blind peer-review process. Using classical linear models and generalized linear models on large-scale data sets of biomedical papers from the PubMed database, the study demonstrates that fairly accurate predictions of citations are possible using only observable characteristics of papers, excluding information on authors and journals, thereby mitigating the Matthew effect. The outcomes have important implications for the field of scientometrics: relying on pre-publication variables that are immune to manipulation by authors and journals provides a more objective method for citation prediction and enhances the objectivity of the evaluation process. The approach is therefore relevant for government agencies responsible for funding the creation of high-quality scientific content rather than perpetuating prestige.

Paper Structure

This paper contains 13 sections, 8 equations, 2 figures, 16 tables.

Figures (2)

  • Figure 1: Histogram and kernel density of weighted citations (SJR)
  • Figure 2: Histogram and kernel density of citation counts ($Citations_i \leq 250$)