Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines

Kehan Long, Shasha Li, Pancheng Wang, Chenlong Bao, Jintao Tang, Ting Wang

TL;DR

The paper tackles the task of recommending missed citations identified by reviewers (RMC), with the goal of making the reference lists of full papers more complete and credible. It introduces the CitationR dataset, built from NeurIPS and ICLR reviews, in which the missed citations labeled by expert reviewers serve as gold labels; an extended version adds 40,810 candidate papers from top venues. The authors propose RMC Net, which uses an Attentive Reference Encoder to fuse content signals with citation-pattern signals and is trained with a triplet-loss objective and nearest-neighbor negative sampling. Empirical results show that RMC is challenging, yet the proposed model consistently outperforms baselines on both CitationR and Extended CitationR. The data and code are released to support further research on this practical, peer-review-driven task.
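To make the training signal mentioned above more concrete, the following is a minimal sketch of a triplet loss combined with nearest-neighbor negative sampling. It is an illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the margin value, the toy 2-D embeddings, and the helper names `triplet_loss` and `nearest_negatives` are all hypothetical, and the sketch omits the multiple negative levels that the paper's architecture figure describes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the positive is closer to the
    anchor than the negative by at least `margin` (margin is an assumption)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def nearest_negatives(anchor, candidates, positive_ids, k=2):
    """Return ids of the k non-positive candidates closest to the anchor.
    These 'hard' negatives are the ones most easily confused with positives."""
    pool = [(pid, vec) for pid, vec in candidates.items() if pid not in positive_ids]
    pool.sort(key=lambda item: euclidean(anchor, item[1]))
    return [pid for pid, _ in pool[:k]]

# Toy embeddings (hypothetical): one submission, one missed citation, three other papers.
submission = [0.0, 0.0]
candidates = {
    "missed": [1.0, 0.0],   # citation a reviewer asked for (positive)
    "near":   [0.0, 1.5],   # unrelated paper that happens to lie close by
    "mid":    [3.0, 0.0],
    "far":    [5.0, 5.0],
}
positives = {"missed"}

hard = nearest_negatives(submission, candidates, positives, k=2)
losses = {
    pid: triplet_loss(submission, candidates["missed"], candidates[pid])
    for pid in hard
}
print(hard)    # ids of the two nearest negatives
print(losses)  # per-negative triplet loss
```

In this toy setup only the closest negative yields a non-zero loss, which is the usual motivation for sampling negatives near the anchor: distant negatives already satisfy the margin and contribute no training signal.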

Abstract

Citing comprehensively and appropriately has become a challenging task with the explosive growth of scientific publications. Current citation recommendation systems aim to recommend a list of scientific papers for a given text context or a draft paper. However, none of the existing work focuses on already included citations of full papers, which are imperfect and still have much room for improvement. In the scenario of peer reviewing, it is a common phenomenon that submissions are identified as missing vital citations by reviewers. This may lead to a negative impact on the credibility and validity of the research presented. To help improve citations of full papers, we first define a novel task of Recommending Missed Citations Identified by Reviewers (RMC) and construct a corresponding expert-labeled dataset called CitationR. We conduct an extensive evaluation of several state-of-the-art methods on CitationR. Furthermore, we propose a new framework RMCNet with an Attentive Reference Encoder module mining the relevance between papers, already-made citations, and missed citations. Empirical results prove that RMC is challenging, with the proposed architecture outperforming previous methods in all metrics. We release our dataset and benchmark models to motivate future research on this challenging new task.

Paper Structure (28 sections, 9 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: An example of missed citations identified by reviewers extracted from https://openreview.net/forum?id=R612wi_C-7w. Italic and colored texts represent papers mentioned in reviews, among which, enclosed by the dashed border, are those reviewers recommend citing. Underlined and bolded texts indicate reasons why those citations are missed and necessary.
  • Figure 2: Distribution of venues of extracted citations recommended by reviewers.
  • Figure 3: Distribution of year gaps between submissions and their citations.
  • Figure 4: The overall architecture of RMC Net, which consists of three parts: (1) Paper encoder (left) generates representations of papers with an Attentive Reference Encoder (ARE) part mining the reference sections. (2) Triplet Loss (upper middle) computes the loss fusing positive samples and negative samples of different levels. (3) Nearest Neighbors Sampling (upper right) obtains negative samples of different levels based on output embeddings and their textual similarities to the submission paper.
  • Figure 5: Results with different values of $\alpha$.
  • ...and 1 more figure