Modeling citation concentration through a mixture of Leimkuhler curves
Emilio Gómez-Déniz, Pablo Dorta-González
TL;DR
The paper tackles modeling citation concentration in informetrics by extending Leimkuhler curves with mixture models to capture heterogeneity across journals. It introduces PG and PIG mixtures, deriving closed-form Leimkuhler expressions for several mixing distributions (gamma, inverse Gaussian, and Pareto-confluent hypergeometric) and establishing stochastic ordering results. The mixtures enable computation of concentration measures such as the Gini and Pietra indices, and are evaluated using nonlinear least squares on two data sources, showing superior fit over standard power- and Pareto-based Leimkuhler curves. While the approach yields practical improvements and theoretical insights, its validation is limited to two fields, suggesting avenues for broader application and further theoretical development.
Abstract
A Leimkuhler curve is obtained by plotting the cumulative percentage of total citations to articles, ordered from most cited to least cited, against the cumulative percentage of articles. In this study, we observe that standard Leimkuhler functions may not fit various empirical informetric data accurately. We therefore introduce a new approach to Leimkuhler curves that fits a known probability density function to the initial Leimkuhler curve, taking into account the presence of a heterogeneity factor. As a significant contribution to the existing literature, we introduce a pair of mixture distributions (called PG and PIG) to bibliometrics. In addition, we present closed-form expressions for the corresponding Leimkuhler curves. Some measures of citation concentration are examined empirically for the basic models (based on the Power and Pareto distributions) and for the mixed models derived from them. The models were fitted by non-linear least squares estimation to two sources of informetric data, in order to assess how the mixture models perform relative to the standard basic models.
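The construction described in the abstract can be illustrated with a minimal sketch. It builds an empirical Leimkuhler curve from raw citation counts, computes the Gini and Pietra indices from it, and fits a one-parameter curve by least squares. Several things here are assumptions rather than the paper's method: the function names are hypothetical, the fitted curve is the textbook Pareto-type form K(p) = p^theta with 0 < theta < 1 (the paper's parameterization and its PG and PIG mixtures may differ), and the optimizer is a plain golden-section search, which presumes the squared-error function has a single minimum on the search interval.

```python
import math


def leimkuhler_points(citations):
    """Empirical Leimkuhler curve as (share of articles, share of citations),
    with articles ranked from most cited to least cited."""
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)
    if not ranked or total <= 0:
        raise ValueError("need at least one article and a positive citation total")
    n = len(ranked)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ranked, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def gini_index(points):
    """Gini index as 2 * (area under the Leimkuhler curve) - 1,
    with the area computed by the trapezoid rule."""
    area = 0.0
    for (p0, k0), (p1, k1) in zip(points, points[1:]):
        area += (p1 - p0) * (k0 + k1) / 2.0
    return 2.0 * area - 1.0


def pietra_index(points):
    """Pietra index as the largest vertical gap between the curve and the diagonal."""
    return max(k - p for p, k in points)


def fit_pareto_leimkuhler(points, lo=1e-6, hi=1.0, tol=1e-10):
    """Least-squares fit of K(p) = p**theta (assumed Pareto-type form).

    Uses golden-section search on theta in [lo, hi]; this assumes the
    sum of squared errors is unimodal on that interval.
    """
    def sse(theta):
        return sum((k - p ** theta) ** 2 for p, k in points)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    theta = (a + b) / 2.0
    return theta, sse(theta)


if __name__ == "__main__":
    # Hypothetical citation counts for illustration only; not data from the paper.
    counts = [120, 45, 30, 12, 8, 5, 3, 2, 1, 0]
    pts = leimkuhler_points(counts)
    theta_hat, err = fit_pareto_leimkuhler(pts)
    print("Gini:", gini_index(pts))
    print("Pietra:", pietra_index(pts))
    print("theta:", theta_hat, "SSE:", err)
```

A mixture model would replace `p ** theta` inside `sse` with the corresponding closed-form curve and, with more than one parameter, would call for a multivariate least-squares routine instead of a one-dimensional search.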
