Modeling citation concentration through a mixture of Leimkuhler curves

Emilio Gómez-Déniz; Pablo Dorta-González

TL;DR

The paper models citation concentration in informetrics by extending Leimkuhler curves with mixture models that capture heterogeneity across journals. It introduces two mixtures, power-gamma (PG) and power-inverse Gaussian (PIG), derives closed-form Leimkuhler expressions for several mixing distributions (gamma, inverse Gaussian, and Pareto-confluent hypergeometric), and establishes stochastic ordering results. The mixtures allow concentration measures such as the Gini and Pietra indices to be computed. The models are fitted by non-linear least squares to two data sources, where they fit better than the standard power- and Pareto-based Leimkuhler curves. The approach yields practical improvements and theoretical insights, but its validation is limited to two fields, which leaves room for broader application and further theoretical development.

Abstract

When a graphical representation of the cumulative percentage of total citations to articles, ordered from most cited to least cited, is plotted against the cumulative percentage of articles, we obtain a Leimkuhler curve. In this study, we noticed that standard Leimkuhler functions may not be sufficient to provide accurate fits to various empirical informetrics data. Therefore, we introduce a new approach to Leimkuhler curves by fitting a known probability density function to the initial Leimkuhler curve, taking into account the presence of a heterogeneity factor. As a significant contribution to the existing literature, we introduce a pair of mixture distributions (called PG and PIG) to bibliometrics. In addition, we present closed-form expressions for Leimkuhler curves. Some measures of citation concentration are examined empirically for the basic models (based on the Power and Pareto distributions) and the mixed models derived from these. An application to two sources of informetric data was conducted to see how the mixing models outperform the standard basic models. The different models were fitted using non-linear least squares estimation.
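
The construction described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it builds the empirical Leimkuhler curve from a list of citation counts and derives the Gini and Pietra indices from it. The function names and the toy citation counts are assumptions made for this example. The Gini index uses the identity G = 2 * (area under the Leimkuhler curve) - 1, with the area approximated by the trapezoid rule, and the Pietra index is taken as the largest vertical gap between the curve and the diagonal.

```python
def leimkuhler_points(citations):
    """Return (p, k): cumulative share of articles and of citations,
    with articles ranked from most cited to least cited."""
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)
    n = len(ranked)
    p, k = [0.0], [0.0]
    running = 0
    for i, c in enumerate(ranked, start=1):
        running += c
        p.append(i / n)
        k.append(running / total)
    return p, k


def gini_from_curve(p, k):
    """Gini index as 2 * (trapezoid area under the curve) - 1."""
    area = sum(
        (p[i] - p[i - 1]) * (k[i] + k[i - 1]) / 2
        for i in range(1, len(p))
    )
    return 2 * area - 1


def pietra_from_curve(p, k):
    """Pietra index as the maximum vertical distance to the diagonal."""
    return max(ki - pi for pi, ki in zip(p, k))


# Toy data (hypothetical citation counts for four articles).
p, k = leimkuhler_points([10, 0, 5, 5])
gini = gini_from_curve(p, k)
pietra = pietra_from_curve(p, k)
```

With equal citation counts the curve coincides with the diagonal and both indices are zero; the more the citations concentrate in a few articles, the further the curve bows above the diagonal.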

Paper Structure

This paper contains 13 sections, 36 equations, 2 figures, 1 table.

Funding
Acknowledgements
Highlights
Introduction
The standard Leimkuhler curve
Beyond the standard model
Specific examples
The gamma case
The inverse Gaussian case
The Pareto-confluent hypergeometric distribution case
Stochastic ordering
Calculations
Conclusions

Figures (2)

Figure 1: Empirical Leimkuhler curves and curves fitted by non-linear least squares for the data considered. From top to bottom, they correspond to the equations given in (\ref{plc}), (\ref{gplc}) and (\ref{plcp}), respectively.
Figure 2: Empirical Leimkuhler curves and mixture-model curves fitted by non-linear least squares for the data considered. The left and right sides correspond to the mixture power-gamma (PG) and mixture power-inverse Gaussian (PIG) Leimkuhler curves, given in equations (\ref{mlcg}) and (\ref{mlcig}), respectively.

Theorems & Definitions (1)

Definition 1
