Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement

Zhikai Xue, Guoxiu He, Zhuoren Jiang, Sichen Gu, Yangyang Kang, Star Zhao, Wei Lu

TL;DR

DPPDCC is a novel model that Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values. It encodes temporal and structural features within dynamic heterogeneous graphs derived from citation networks and applies several auxiliary tasks for disentanglement.

Abstract

The scientific impact of academic papers is influenced by intricate factors such as dynamic popularity and inherent contribution. Existing models typically rely on static graphs for citation count estimation and fail to differentiate among the sources of those citations. In contrast, we propose distinguishing the effects of these factors and predicting citation increments as estimated potential impacts within the dynamic context. We introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values. It encodes temporal and structural features within dynamic heterogeneous graphs derived from citation networks and applies several auxiliary tasks for disentanglement. By emphasizing comparative and co-cited/citing information and aggregating snapshots evolutionarily, DPPDCC captures knowledge flow within the citation network. Popularity is then modeled in two ways: augmented graphs are contrasted to extract the essence of citation diffusion, and citation accumulation bins are predicted for quantitative conformity modeling. Orthogonal constraints ensure that each perspective is modeled distinctly, preserving the contribution value. To gauge generalization across publication times and replicate the realistic dynamic context, we partition data at specific time points and retain all samples without strict filtering. Extensive experiments on three datasets validate DPPDCC's superiority over baselines for papers published previously, freshly, and immediately, with further analyses confirming its robustness. Our code and supplementary materials can be found at https://github.com/ECNU-Text-Computing/DPPDCC.
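
The abstract states that orthogonal constraints keep the diffusion, conformity, and contribution perspectives distinct, but it does not give the loss. As a minimal sketch only, and not the authors' formulation, one common way to express such a constraint is to penalize the squared cosine similarity between every pair of view vectors. The function names `cosine` and `orthogonality_penalty` and the toy vectors are hypothetical, chosen purely for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonality_penalty(views):
    """Sum of squared pairwise cosine similarities over all view vectors.

    Assumption: this is an illustrative penalty, not the loss used by DPPDCC.
    It is 0 when all views are mutually orthogonal and grows as they align.
    """
    return sum(cosine(u, v) ** 2 for u, v in combinations(views, 2))


# Hypothetical representations for the three perspectives.
diffusion = [1.0, 0.0, 0.0]
conformity = [0.0, 1.0, 0.0]
contribution = [0.0, 0.0, 1.0]

penalty_orthogonal = orthogonality_penalty([diffusion, conformity, contribution])
penalty_aligned = orthogonality_penalty([diffusion, diffusion, diffusion])
```

Adding a term like this to a training objective pushes the three representations apart, so that whatever the diffusion and conformity views do not explain is left to the contribution view. Squaring the cosine treats anti-aligned vectors the same as aligned ones, which matches the goal of orthogonality rather than mere dissimilarity.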

Paper Structure (50 sections, 15 equations, 10 figures, 10 tables)

Figures (10)

  • Figure 1: (a) We disentangle the citation increment into diffusion, conformity, and contribution values for better interpretability. (b) To imitate the evaluation of future dynamics within the citation context, we split the dataset by the observation time point and categorize samples into previous, fresh, and immediate papers in terms of publication time. (c) We represent the heterogeneous subgraph schema with all metadata, where each relation type is bidirectional.
  • Figure 2: The overall architecture of DPPDCC. DPPDCC first encodes the Dynamic Heterogeneous Graph of the target paper with the Citation-aware GNN Encoder. Based on the encoded representation of the target paper, DPPDCC then disentangles its citation increment into diffusion, conformity, and contribution values through three corresponding auxiliary tasks.
  • Figure 3: Results of the hyper-parameter tests in terms of MALE.
  • Figure 4: Visualization of disentangled value proportions in CS. (a) displays the trend over publication time. (b) shows the detailed composition for papers categorized as previous, fresh, and immediate. (c) bins the samples by their predicted values.
  • Figure 5: t-SNE visualization of representations from the conformity perspective in CS. Colors correspond to the binning labels assigned to the samples.
  • ...and 5 more figures