Relational hyperevent models for the coevolution of coauthoring and citation networks

Jürgen Lerner, Marian-Gabriel Hâncean, Alessandro Lomi

TL;DR

This work extends relational hyperevent models (RHEM) to dynamic, mixed two-mode bibliographic networks, enabling joint modeling of coauthoring and citation events in which each publication can involve multiple authors and multiple references. By introducing a rich set of history-dependent hyperedge covariates and estimating a stratified Cox-proportional-hazards-like partial likelihood with nested case-control sampling, the authors test higher-order dependencies and interactions between authors and references. Empirically, using a large AMiner-based dataset, they find strong mixed-mode effects and a pronounced tendency for subsets of papers to be cited together, suggesting that endogenous 'citation packages' influence scientific impact. The results indicate that accounting for polyadic and interdependent publication processes improves understanding of scientific production and impact; the open-source tools used support reproducible analyses of large bibliographic datasets.
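To make the estimation idea concrete, the following is a minimal sketch of a case-control partial log-likelihood in its standard conditional-logit form: each observed publication hyperedge is compared against a small set of sampled non-event hyperedges from the same stratum. It is not the authors' implementation. The function name, the data layout, and the covariate values are hypothetical and chosen only for illustration.

```python
import math

def case_control_log_partial_likelihood(beta, strata):
    """Conditional-logit log-likelihood over sampled risk sets.

    beta:   list of coefficients, one per hyperedge covariate.
    strata: list of (event_x, control_xs) pairs, where event_x is the
            covariate vector of the observed hyperedge and control_xs is a
            list of covariate vectors of sampled non-event hyperedges.
            (Hypothetical layout, assumed for this sketch.)
    """
    total = 0.0
    for event_x, control_xs in strata:
        # Linear predictor for the observed event (index 0) and each control.
        scores = [
            sum(b * x for b, x in zip(beta, xs))
            for xs in [event_x, *control_xs]
        ]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(scores)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[0] - log_denominator
    return total

# Made-up covariate values: one observed event and two sampled controls,
# each described by two covariates.
example_strata = [([1.0, 0.5], [[0.0, 0.5], [0.0, 0.0]])]
log_lik = case_control_log_partial_likelihood([0.8, -0.2], example_strata)
```

Maximizing this function over `beta` gives the parameter estimates. With all coefficients at zero, every candidate in a stratum is equally likely, so a stratum with one event and three controls contributes `-log(4)`.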

Abstract

The development of suitable statistical models for the analysis of bibliographic networks has trailed behind the empirical ambitions expressed by recent studies of science of science. Extant research typically restricts the analytical focus to either paper citation networks, or author collaboration networks. These networks involve not only direct relationships between papers or authors, but also a broader system of dependencies between the references of papers connected through multiple simultaneous citation links. In this work, we extend recently developed relational hyperevent models (RHEM) to analyze scientific networks - systems of scientific publications connected by citations and authorship. We introduce new covariates that represent theoretically relevant and empirically meaningful sub-network configurations. The new model specification supports testing of hypotheses that align with the polyadic nature of scientific publication events and the multiple interdependencies between authors and references of current and prior papers. We implement the model using open-source software to analyze a large, publicly available scientific network dataset. A significant finding of the study is the tendency for subsets of papers to be repeatedly cited together across publications. This result is crucial as it suggests that the papers' impact may be partly due to endogenous network processes. More broadly, the study shows that models accounting for both the hyperedge structure of publication events and the interconnections between authors and references significantly enhance our understanding of the network mechanisms that drive scientific production, productivity, and impact.
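The co-citation finding can be illustrated with a small sketch that counts how often a subset of references has already been cited together. It uses the two publication events described in the Figure 1 caption. The counts are undecayed and unnormalized, and the function names are hypothetical; the paper's covariates may differ, for example by applying decay over time or normalization.

```python
from itertools import combinations

# Publication events as (paper, authors, references), in time order.
events = [
    ("j", {"i1", "i2", "i3"}, {"j1", "j2", "j3", "j4"}),
    ("j_prime", {"i2", "i3", "i4"}, {"j2", "j3", "j4", "j5", "j6"}),
]

def prior_cocitation_count(history, subset):
    """Number of earlier events whose reference list contains every paper in subset."""
    subset = set(subset)
    return sum(1 for _, _, refs in history if subset <= refs)

def repeated_reference_subsets(history, new_refs, size):
    """Subsets of new_refs of the given size that were cited together before."""
    return [
        candidate
        for candidate in combinations(sorted(new_refs), size)
        if prior_cocitation_count(history, candidate) > 0
    ]

history = events[:1]      # only paper j has been published so far
new_refs = events[1][2]   # references of paper j_prime

pairs = repeated_reference_subsets(history, new_refs, 2)
# pairs == [("j2", "j3"), ("j2", "j4"), ("j3", "j4")]
triples = repeated_reference_subsets(history, new_refs, 3)
# triples == [("j2", "j3", "j4")]
```

Enumerating all subsets grows combinatorially with the length of the reference list, so this brute-force approach is only meant for small examples.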

Paper Structure (52 sections, 72 equations, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Publication event of paper $j$, authored by $i_1,\dots,i_3$ and citing papers $j_1,\dots,j_4$, followed by the publication event of paper $j'$, authored by $i_2,\dots,i_4$ and citing papers $j_2,\dots,j_6$. The two authors $i_2,i_3$ repeatedly cite the three papers $j_2,j_3,j_4$. Mixed two-mode hyperedges, containing the published paper, its authors, and its references, are displayed as gray-shaded areas enclosing the participating nodes.
  • Figure 2: Publication events of papers $j$ and $j'$ with their authors $i_1,\dots,i_4$ and references $j_1,\dots,j_6$. Additionally, the authors ($i_5,i_6$) of some of the cited papers are shown. Values of network attributes in this example are computed without any decay over time. Values for $cite^{(pp)}$ and $author$ are binary and are given by the lines ("edges") connecting papers to papers, or authors to papers, respectively. Values of other network attributes on selected nodes are given on the right-hand side.
  • Figure 3: Histogram of publication year of papers used in the empirical analysis.
  • Figure 4: Example of two publication events $(t_1,j_{t_1},\{i_1,i_2,i_3\},\{j_1,j_2,j_3,j_4\})$ and $(t_2,j_{t_2},\{i_2,i_3,i_4\},\{j_2,j_3,j_4,j_5,j_6\})$ with $t_1<t_2$ (repeated from Fig. 1), used to illustrate several effects defined via subset repetition (see the text for a detailed explanation).
  • Figure 5: Example of two publication events $(t_1,j_{t_1},I_1,\{j_1,j_2,j_3,j_4\})$ and $(t_2,j_{t_2},I_2,\{j_{t_1},j_3,j_4,j_5,j_6\})$ with $t_1<t_2$, illustrating the tendency to cite a paper together with some of its references. The author sets of these events do not matter for this covariate and are therefore left unspecified in the example.
  • ...and 9 more figures