An Agent-based Model of Citation Behavior

George Chacko, Minhyuk Park, Vikram Ramavarapu, Ananth Grama, Pablo Robles-Granda, Tandy Warnow

TL;DR

This study investigates how citation dynamics arise from agent-level decision rules within a growing citation network. It presents an agent-based model where each new article cites a generator and allocates references based on a phenotype combining fitness, preferential attachment, recency, and locality governed by a locality parameter $\alpha$. The results indicate fitness is the dominant driver of citations, with out_degree and locality modulating outcomes, and reveal 'superstar' effects that can quench or amplify citations under certain conditions. The work provides open-source software for simulating synthetic citation networks and offers insights into the sources of citation disparities, with implications for interpreting productivity metrics across disciplines.

Abstract

Whether citations can be objectively and reliably used to measure productivity and scientific quality of articles and researchers can, and should, be vigorously questioned. However, citations are widely used to estimate the productivity of researchers and institutions, effectively creating a 'grubby' motivation to be well-cited. We model citation growth, and this grubby interest, using an agent-based model (ABM) of network growth. In this model, each new node (article) in a citation network is an autonomous agent that cites other nodes based on a 'citation personality' consisting of a composite bias for locality, preferential attachment, recency, and fitness. We ask whether strategic citation behavior (reference selection) by the author of a scientific article can boost subsequent citations to it. Our study suggests that fitness and, to a lesser extent, out_degree and locality effects are influential in capturing citations, which raises questions about similar effects in the real world.

Paper Structure

This paper contains 35 sections, 7 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Citation counts of nodes (in_degree) under default simulation conditions. Using either the Stahl-Johnstone real-world sj dataset or Erdős-Rényi (er) graphs as input, simulations were performed in either random (ra) or homogeneous (sa) agent backgrounds for 30 cycles of 3% annual growth (standard), resulting in an increase from roughly half a million nodes to 1.2 million nodes. In_degree of nodes is plotted against fitness groups, using a log10 scale on the y-axis for in_degree, after excluding agents with zero in_degree (roughly 4.6% for sj and 4.2% for er). Nodes are grouped by fitness values: f1(1:10), f2(10:100), f3(100:1,000), f4(1,000:10,000), f5(10,000:100,000), f6(100,000:1,000,000). Due to the power-law distribution of fitness, roughly 85% of the observations in each simulation are found in group f1, 12% in f2, 3% in f3, and $\sim 10^{-5}$% in f6.
  • Figure 2: Effect of Superstar Nodes. Using real-world sj data as input, standard simulations were performed in either random (ra) or homogeneous (sa) agent backgrounds, resulting in an increase from roughly half a million nodes to 1.2 million nodes. To examine the effects of high-fitness nodes, three 'superstar' agents with fitness 10,000, 100,000, and 1,000,000 were generated in year 1 of the simulation, allowing 29 years in which to accumulate citations (fitness groups 4-6, panels 2 and 4 from left to right). A reduction of in_degree is seen in f3 when superstars are present, irrespective of whether the background is random or static. Nodes are grouped by fitness values: f1(1:10), f2(10:100), f3(100:1,000), f4(1,000:10,000), f5(10,000:100,000), f6(100,000:1,000,000). A log10 scale is used on the y-axis for in_degree after excluding agents without any in_degree (roughly 4.6% for sj and 4.2% for er). The control group (no_ss) does not have any agents with fitness $> 1{,}000$.
  • Figure 3: Reference Count Effects. Higher out_degree is associated with higher in_degree for power-law, normal, and uniform distributions of out_degree. Standard simulations were conducted on the sj dataset in either ra or sa agent backgrounds. For the ra background, $\alpha$ was randomized from 0 to 1, and weights for preferential attachment, recency, and fitness were randomly assigned so that they summed to 1. For sa, $\alpha$ was set to 0.5, and weights for preferential attachment, recency, and fitness were each set to $0.33\bar{3}$, summing to 1. Out_degree was assigned to agents from real-world values ranging from 5 to 249 that were fit to power-law, normal, and uniform distributions (Materials and Methods). Median, 90th, and 99th percentile values of in_degree (pctl) are plotted against out_degree.
  • Figure 4: Reference Count Effects as $\alpha$ is Varied. The effect of out_degree on in_degree is modulated by $\alpha$. For $\alpha > 0$, higher out_degree (x-axis) is associated with higher in_degree (y-axis) for real-world derived distributions of out_degree; in contrast, there is no impact when $\alpha = 0$. Median, 75th, 90th, and 99th percentile values of agent in_degree (pctl) are plotted against out_degree. Standard simulations were conducted on the sj dataset in either ra or sa agent backgrounds. Left panel: ra. Right panel: sa.
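
The simulation settings quoted in the captions above can be illustrated with a minimal Python sketch. It is not the authors' software, and every function name in it is hypothetical. Two assumptions are made: the fitness bins are treated as upper-inclusive (inferred from Figure 2, where superstars with fitness 10,000, 100,000, and 1,000,000 fall in groups 4-6), and the random (ra) weights are drawn by normalizing three uniform draws, since the captions do not say how the weights were sampled.

```python
import random

# Upper bounds of fitness groups f1..f6, as listed in the Figure 1 caption.
GROUP_UPPER_BOUNDS = [10, 100, 1_000, 10_000, 100_000, 1_000_000]


def fitness_group(fitness: float) -> str:
    """Return the caption's group label (f1..f6) for a fitness value.

    Assumption: bins are upper-inclusive, so 10,000 maps to f4.
    """
    if not 1 <= fitness <= GROUP_UPPER_BOUNDS[-1]:
        raise ValueError("fitness outside the range covered by f1..f6")
    for index, upper in enumerate(GROUP_UPPER_BOUNDS, start=1):
        if fitness <= upper:
            return f"f{index}"
    raise AssertionError("unreachable: range was checked above")


def projected_size(initial_nodes: float, rate: float = 0.03, cycles: int = 30) -> float:
    """Compound growth: 30 cycles of 3% growth by default."""
    return initial_nodes * (1 + rate) ** cycles


def random_agent(rng: random.Random) -> dict:
    """An 'ra' agent: alpha uniform in [0, 1], three weights summing to 1.

    Assumption: weights are three uniform draws divided by their sum.
    """
    raw = [rng.random() for _ in range(3)]
    total = sum(raw)
    pa, recency, fitness = (value / total for value in raw)
    return {"alpha": rng.random(), "pa": pa, "recency": recency, "fitness": fitness}


def static_agent() -> dict:
    """An 'sa' agent: alpha fixed at 0.5, equal weights of one third each."""
    third = 1 / 3
    return {"alpha": 0.5, "pa": third, "recency": third, "fitness": third}


if __name__ == "__main__":
    print(fitness_group(10_000))           # superstar with fitness 10,000
    print(round(projected_size(500_000)))  # network size after 30 cycles
    print(random_agent(random.Random(0)))
    print(static_agent())
```

Starting from 500,000 nodes, `projected_size` gives about 1.21 million nodes, which agrees with the "roughly half a million nodes to 1.2 million nodes" stated in the captions.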