Fractional stochastic model of citation dynamics with memory and volatility

Keisuke Okamura

TL;DR

This work proposes a fractional stochastic model for citation dynamics in which latent attention $X(t)$ evolves via a memoryful SDE driven by fractional Brownian motion: $dX(t)=X(t)[\alpha(t) dt+\beta dB_{H}(t)]$, with solution $X(t)=X_{0}\exp[A(t)+\beta B_{H}(t)]$ and $C(T)\approx\int_0^T X(t)dt$. A key theoretical result is $\mathrm{SD}[R(t)]=\beta t^{H}$ for $R(t)=\ln(X(t)/X_{0})$, producing the empirical $t^{H}$ law observed in citation fluctuations and linking memory ($H$) and volatility ($\beta$) to the shape of the citation distribution. The model predicts a log-normal distribution for antipersistent regimes ($H<\tfrac{1}{2}$) and a heavy-tailed, power-law-like distribution in persistent regimes ($H>\tfrac{1}{2}$), providing a unified explanation for the log-normal/high-citation tails seen in empirical data. Application to arXiv data yields $H\approx0.13$ (antipersistent) and moderate $\beta$, with fractal dimension $D\approx2-H\approx1.87$, supporting a fractal, memory-rich structure in attention dynamics and suggesting broad applicability to other attention-driven networks.
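
The $t^{H}$ law follows in a few steps from the closed-form solution. The sketch below assumes that $A(t)$ is deterministic and that $B_{H}(t)$ has zero mean with the standard normalisation $\mathbb{E}[B_{H}(t)^{2}]=t^{2H}$; neither assumption is spelled out in this summary.

$$
R(t)=\ln\frac{X(t)}{X_{0}}=A(t)+\beta B_{H}(t),\qquad
\mathrm{Var}[R(t)]=\beta^{2}\,\mathrm{Var}[B_{H}(t)]=\beta^{2}t^{2H},\qquad
\mathrm{SD}[R(t)]=\beta t^{H}.
$$

Taking logarithms gives $\ln\mathrm{SD}[R(t)]=\ln\beta+H\ln t$ (for $\beta>0$), so on a log-log plot the slope is $H$ and the intercept is $\ln\beta$.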

Abstract

Understanding the statistical laws governing citation dynamics remains a fundamental challenge in network theory and the science of science. Citation networks typically exhibit in-degree distributions well approximated by log-normal distributions yet also display power-law behaviour in the high-citation regime -- an apparent contradiction lacking a unified explanation. Here we identify a previously unrecognised phenomenon: the variance of the logarithm of citation counts per unit time follows a power law with respect to time ($t$) since publication, scaling as $t^{H}$, with $H$ constant. This discovery introduces a new challenge while simultaneously offering a crucial clue to resolving this discrepancy. We develop a stochastic model in which latent attention to publications evolves through a memory-driven process with cumulative advantage, modelled as fractional Brownian motion with Hurst parameter $H$ and volatility. We show that antipersistent fluctuations in attention ($H < 1/2$) yield log-normal citation distributions, whereas persistent attention dynamics ($H > 1/2$) favour heavy-tailed power laws, thus resolving the log-normal--power-law contradiction. Numerical simulations confirm both the $t^{H}$ law and the transition between regimes. Empirical analysis of arXiv e-prints indicates that the latent attention process is intrinsically antipersistent ($H \approx 0.13$). By linking memory effects and stochastic fluctuations in attention to broader network dynamics, our findings provide a unifying framework for understanding the evolution of collective attention in science and other attention-driven processes.
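
A minimal numerical sketch can illustrate how the $t^{H}$ law might be checked. It is not the author's simulation code. It assumes the log-attention has the form $R(t)=A(t)+\beta B_{H}(t)$ with a deterministic drift (set to zero here), samples fractional Brownian motion exactly by Cholesky factorisation of its covariance, and recovers $H$ and $\beta$ from a log-log regression. The function names, the ten yearly time points, and the values $H=0.13$, $\beta=0.8$ are illustrative choices.

```python
import numpy as np


def fbm_covariance(times, hurst):
    """Covariance of fractional Brownian motion at the given times:
    0.5 * (s^2H + u^2H - |s - u|^2H)."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    two_h = 2.0 * hurst
    return 0.5 * (s**two_h + u**two_h - np.abs(s - u) ** two_h)


def simulate_log_attention(times, hurst, beta, n_paths, drift=None, seed=0):
    """Sample R(t) = A(t) + beta * B_H(t) on a grid of strictly positive times.

    `drift` is a hypothetical deterministic A(t) evaluated on the grid
    (zero if omitted); it shifts the mean but not the spread of R(t).
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(fbm_covariance(times, hurst))
    z = rng.standard_normal((n_paths, len(times)))
    b_h = z @ chol.T  # each row is one fBm path on the grid
    a = np.zeros(len(times)) if drift is None else np.asarray(drift, dtype=float)
    return a + beta * b_h


def fit_power_law(times, sd):
    """Fit sd = beta * t^H by least squares on log-log axes; return (H, beta)."""
    slope, intercept = np.polyfit(np.log(times), np.log(sd), 1)
    return slope, np.exp(intercept)


# Illustrative parameters (assumed for this sketch, not fitted to data).
H_TRUE, BETA_TRUE = 0.13, 0.8
times = np.arange(1, 11)  # e.g. years 1..10 since publication

r = simulate_log_attention(times, H_TRUE, BETA_TRUE, n_paths=20000)
sd = r.std(axis=0, ddof=1)  # cross-sectional SD at each time point
h_est, beta_est = fit_power_law(times, sd)
print(f"estimated H = {h_est:.3f}, estimated beta = {beta_est:.3f}")
```

The Cholesky approach is exact but costs $O(n^{3})$ in the number of time points, which is acceptable for a short yearly grid. Longer grids would call for a faster sampler.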

Paper Structure

This paper contains 27 sections, 22 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: The relationship between the standard deviation (SD) of the logarithm of annual citation counts ($c$) and the number of years since publication ($t$). The data are based on Ref. Okamura21, focusing on e-prints published on arXiv in 2001 ($N=39{,}106$).
  • Figure 2: Simulation results of the attention curve $\hat{X}(t)$ for an antipersistent system ($H=0.15$). The parameters in $\hat{\alpha}(t)$ are set to $\theta=0.48$ and $\omega=0.8$. The volatility values are (a) $\beta=0.2$, (b) $\beta=0.8$ and (c) $\beta=1.5$.
  • Figure 3: Monte Carlo simulation results for the distribution of $\hat{C}(T)$, generated using the same parameter settings as in Fig. 2, with $N_\mathrm{s}=50{,}000$ trials. The volatility values are (a) $\beta=0.2$, (b) $\beta=0.8$ and (c) $\beta=1.5$.
  • Figure 4: (a) Kernel density estimation of the distribution of $\ln\hat{C}(T)$ using the same data as in Fig. 3, overlaid with different values of volatility ($\beta$) for visual clarity. (b) Q--Q plot of the same $\ln\hat{C}(T)$ distribution against theoretical normal quantiles.
  • Figure 5: Scatter plot of $(\tau_{k},\hat{c}_{k})$ (filled markers) and $(\tau_{k},\hat{X}(\tau_{k}))$ (unfilled markers) from the same data as in Fig. 3, for $k=1\text{--}10$, along with their linear regression fits. In both cases, the error bars are negligibly small and are therefore omitted for clarity.
  • ...and 6 more figures