Relational hyperevent models for the coevolution of coauthoring and citation networks
Jürgen Lerner, Marian-Gabriel Hâncean, Alessandro Lomi
TL;DR
This work extends Relational Hyperevent Models to dynamic, mixed two-mode bibliographic networks, enabling joint modeling of coauthoring and citation events where each publication can involve multiple authors and multiple references. By introducing a rich set of history-dependent hyperedge covariates and estimating a stratified CoxPH-like partial likelihood with nested case-control sampling, the authors test higher-order dependencies and interactions between authors and references. Empirically, using a large AMiner-based dataset, they find strong mixed-mode effects and a pronounced tendency for subsets of papers to be cited together, suggesting that endogenous 'citation packages' influence scientific impact. The results demonstrate that accounting for polyadic and interdependent publication processes enhances understanding of science production and impact, and the open-source tools used support reproducible analyses across large bibliographic datasets.
Abstract
The development of suitable statistical models for the analysis of bibliographic networks has trailed behind the empirical ambitions expressed by recent studies in the science of science. Extant research typically restricts the analytical focus to either paper citation networks or author collaboration networks. These networks involve not only direct relationships between papers or authors, but also a broader system of dependencies between the references of papers connected through multiple simultaneous citation links. In this work, we extend recently developed relational hyperevent models (RHEM) to analyze scientific networks, that is, systems of scientific publications connected by citations and authorship. We introduce new covariates that represent theoretically relevant and empirically meaningful sub-network configurations. The new model specification supports testing of hypotheses that align with the polyadic nature of scientific publication events and the multiple interdependencies between authors and references of current and prior papers. We implement the model using open-source software to analyze a large, publicly available scientific network dataset. A key finding of the study is the tendency for subsets of papers to be repeatedly cited together across publications. This result is crucial as it suggests that the papers' impact may be partly due to endogenous network processes. More broadly, the study shows that models accounting for both the hyperedge structure of publication events and the interconnections between authors and references significantly enhance our understanding of the network mechanisms that drive scientific production, productivity, and impact.
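To make the idea of papers being "repeatedly cited together" concrete, the following minimal Python sketch computes a simplified pair-level co-citation covariate over a toy sequence of reference lists, and then evaluates the log-likelihood contribution of one case-control stratum in a conditional-logit (Cox-type) form. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pair_cocitation`, `update_history`, `stratum_log_likelihood`), the restriction to pairs rather than larger subsets, the single coefficient `beta`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import exp, log


def pair_cocitation(history, refs):
    """Average number of earlier publications that cited both papers of a pair,
    taken over all unordered pairs in the reference list `refs`."""
    pairs = list(combinations(sorted(refs), 2))
    if not pairs:
        return 0.0
    return sum(history[p] for p in pairs) / len(pairs)


def update_history(history, refs):
    """Record that every pair in `refs` has now been cited together once more."""
    for p in combinations(sorted(refs), 2):
        history[p] += 1


def stratum_log_likelihood(beta, x_observed, x_controls):
    """Log-likelihood contribution of one stratum: the observed reference list
    compared against sampled alternative (non-event) reference lists."""
    denom = exp(beta * x_observed) + sum(exp(beta * x) for x in x_controls)
    return beta * x_observed - log(denom)


# Toy data (hypothetical): reference lists of three publications in time order.
publications = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}]

history = Counter()
scores = []
for refs in publications:
    # The covariate depends only on events strictly before the current one.
    scores.append(pair_cocitation(history, refs))
    update_history(history, refs)

# scores is [0.0, 1.0, 2/3]: the pair (A, B) accumulates a shared citation history.
```

A positive estimate of `beta` for such a covariate would indicate that reference lists containing previously co-cited papers are more likely than sampled alternatives, which is the kind of effect the abstract describes.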
