Leveraging GANs for citation intent classification and its impact on citation network analysis

Davi A. Bezerra, Filipi N. Silva, Diego R. Amancio

TL;DR

This work tackles citation-intent analysis by adopting a semi-supervised GAN-BERT framework augmented with SciBERT, enabling robust classification with limited labeled data. It demonstrates competitive performance on standard benchmarks (e.g., an $F_1$ score of $88.74$ on SciCite and $81.75$ on ACL) while substantially reducing model size relative to state-of-the-art models, which makes it more practical for large-scale corpora. Beyond classification, the paper couples intent detection with network analysis to show that filtering citations by intent can dramatically reshape centrality-based rankings and network structure, highlighting potential biases in traditional bibliometrics. The findings suggest that intent-aware analysis can refine impact measurements and inform more nuanced interpretations of scholarly influence, with implications for research evaluation and bias monitoring.

Abstract

Citations play a fundamental role in the scientific ecosystem, serving as a foundation for tracking the flow of knowledge, acknowledging prior work, and assessing scholarly influence. In scientometrics, they are also central to the construction of quantitative indicators. Not all citations, however, serve the same function: some provide background, others introduce methods, and others compare results. Understanding citation intent therefore allows for a more nuanced interpretation of scientific impact. In this paper, we adopted a GAN-based method to classify citation intents. Our results revealed that the proposed method achieves competitive classification performance, closely matching state-of-the-art results with substantially fewer parameters. This demonstrates the effectiveness and efficiency of combining GAN architectures with contextual embeddings in the intent classification task. We also investigated whether filtering citation intents affects the centrality of papers in citation networks. Analyzing the network constructed from the unArXiv dataset, we found that paper rankings can be significantly influenced by citation intent. All four centrality metrics examined (degree, PageRank, closeness, and betweenness) were sensitive to the filtering of citation types. Betweenness centrality displayed the greatest sensitivity, showing substantial changes in ranking when specific citation intents were removed.
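
The intent-filtering experiment described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy citation list, the paper identifiers, and the helper names (`filter_by_intent`, `in_degree`, `pagerank`, `top`) are all hypothetical, and only two of the four centralities (in-degree and a plain power-iteration PageRank) are shown. The idea is simply to drop citations of one intent and check whether the top-ranked paper changes.

```python
# Hypothetical toy data: (citing paper, cited paper, predicted intent).
# Paper "C" is cited mostly for background, paper "E" mostly for its method.
CITATIONS = [
    ("P1", "C", "background"),
    ("P2", "C", "background"),
    ("P3", "C", "background"),
    ("P4", "C", "background"),
    ("P1", "E", "method"),
    ("P2", "E", "method"),
    ("P3", "E", "background"),
]

# Keep the node set fixed so papers that lose all citations still get a score.
NODES = sorted({p for src, dst, _ in CITATIONS for p in (src, dst)})


def filter_by_intent(citations, excluded):
    """Return (citing, cited) edges, dropping citations whose intent is excluded."""
    return [(src, dst) for src, dst, intent in citations if intent not in excluded]


def in_degree(nodes, edges):
    """Count incoming citations for every node."""
    counts = {v: 0 for v in nodes}
    for _, dst in edges:
        counts[dst] += 1
    return counts


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank


def top(scores):
    """Highest-scoring node; ties are broken alphabetically."""
    return max(sorted(scores), key=scores.get)


full = filter_by_intent(CITATIONS, excluded=set())
filtered = filter_by_intent(CITATIONS, excluded={"background"})

print("top by in-degree:", top(in_degree(NODES, full)), "->", top(in_degree(NODES, filtered)))
print("top by PageRank: ", top(pagerank(NODES, full)), "->", top(pagerank(NODES, filtered)))
```

In this toy network the background-heavy paper leads both rankings until background citations are removed, after which the method-cited paper takes over. This is the kind of rank shift the abstract reports, shown here on made-up data.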

Paper Structure

This paper contains 14 sections, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Distribution of articles across disciplines in the unarXiv dataset. The dataset demonstrates a significant imbalance, with computer science, physics, and mathematics accounting for 47%, 26%, and 17% of the total articles, respectively.
  • Figure 2: Architecture of cGAN-SciBERT. The model integrates SciBERT with an SS-GAN framework, comprising a generator ($G_c$) and a discriminator ($D$). The generator produces synthetic examples from noise and a class-specific vector, while the discriminator classifies real examples and detects fake ones.
  • Figure 3: Confusion matrix for SS-cGAN + SciBERT on the ACL dataset.
  • Figure 4: t-SNE visualization of predicted citation intents for the SciCite dataset after dimensionality reduction. Each point represents a citation context, with colors indicating different citation intents.
  • Figure 5: t-SNE visualization of ground truth citation intents for the SciCite dataset after dimensionality reduction. Each point represents a citation context, with colors indicating different citation intents.
  • ...and 5 more figures