Hidden Citations Obscure True Impact in Science

Xiangyi Meng, Onur Varol, Albert-László Barabási

TL;DR

The paper shows that, for influential discoveries, hidden citations outnumber explicit citation counts regardless of publishing venue and discipline, and that the prevalence of hidden citations is driven not by citation counts but by the degree of discourse on the topic within the text of the manuscripts.

Abstract

References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
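
To make the definition concrete, the following is a minimal sketch of how a hidden citation could be flagged for a single topic. It is not the authors' method: the paper relies on unsupervised interpretable machine learning over full text, whereas this sketch assumes the catchphrases and foundational paper ids are already known and uses plain substring matching. The `Paper` and `Topic` classes, the ids such as `"F1"`, and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    text: str                                      # full text of the manuscript
    references: set = field(default_factory=set)   # ids of the works it cites


@dataclass
class Topic:
    catchphrases: tuple       # hypothetical, hand-picked catchphrases
    foundational_ids: set     # hypothetical ids of the foundational papers


def classify(paper, topic):
    """Return 'explicit', 'hidden', or None for one paper and one topic.

    'explicit': the paper cites a foundational paper of the topic.
    'hidden':   the paper mentions a catchphrase but cites none of them.
    """
    text = paper.text.lower()
    mentions = any(phrase.lower() in text for phrase in topic.catchphrases)
    cites = bool(paper.references & topic.foundational_ids)
    if cites:
        return "explicit"
    if mentions:
        return "hidden"
    return None


# Toy data, invented for illustration only.
topic = Topic(
    catchphrases=("AdS/CFT", "anti-de Sitter/conformal field theory"),
    foundational_ids={"F1"},
)
corpus = [
    Paper("We use the AdS/CFT correspondence.", {"F1", "X2"}),
    Paper("Holography via AdS/CFT is assumed throughout.", {"X3"}),
    Paper("A study of graphene transport.", {"X4"}),
]
labels = [classify(p, topic) for p in corpus]
n_hidden = labels.count("hidden")
n_explicit = labels.count("explicit")
```

In this toy corpus the second paper credits the topic by name without referencing a foundational paper, so it is the one counted as a hidden citation.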

Paper Structure (4 sections, 2 equations, 5 figures)

Figures (5)

  • Figure 1: Hidden citations. (a) A foundational paper is a manuscript that introduces a new concept that subsequently defines a topic of inquiry by the scientific community, such as the topic "anti-de Sitter/conformal field theory," also known as "AdS/CFT." Papers focusing on the topic mention the catchphrase "AdS/CFT" or "anti-de Sitter/conformal field theory," followed by a citation to one of the foundational papers. Often, however, the catchphrases are present without explicit citations, resulting in hidden citations. (b) Exemplary topics selected from high energy physics (hep), condensed matter physics (cond), quantum physics (quant), and astrophysics (astro), together with their corresponding catchphrase(s) (lemmatized as word stems) and foundational paper(s) (Microsoft Academic Graph id). Darker arrows denote the algorithm's higher statistical confidence for the respective foundational paper. (c-f) Time evolution of citations and hidden citations for the topics listed in (b). The arrows denote the publication date(s) of the foundational paper(s) for each topic.
  • Figure 2: Factors that drive hidden citations. (a) The temporal change of $p(\text{cite}|\text{mention})$, the probability that a paper mentioning the topic-specific catchphrases will also cite the foundational paper, as a function of time (years since publication). On average, $p(\text{cite}|\text{mention})$ per topic drops by approximately $20\%$ within $20$ years after the publication of the first foundational paper. Error bars represent $95\%$ confidence intervals. (b) Topics with more citations ($c$) tend to have more hidden citations ($h$) (Spearman's rank correlation $\rho\approx 0.381$, null hypothesis $H_0$ rejected). Most topics fall within the $95\%$ single-observation confidence bands with a log-log slope of $0.763 \pm 0.208$, indicating that $h\sim c^{0.763}$. (c) $p(\text{cite}|\text{mention})$ as a function of citations per topic ($\rho\approx 0.016$, $H_0$ not rejected), indicating that the probability of a textual reference becoming a hidden citation is not driven by the number of citations to the topic. (d) $p(\text{cite}|\text{mention})$ as a function of mentions per topic ($\rho\approx -0.611$, $H_0$ rejected). The strong negative correlation indicates that hidden citations are driven by the number of textual mentions of the topic. Most topics fall within the $95\%$ confidence bands with a log-linear slope of $-0.27 \pm 0.04$. (e-h) The pattern holds for four distinct publication venues.
  • Figure 3: Credit redirected. (a) The most cited alternatives for four topics that acquire hidden citations, primarily indicating that credit is often diverted to books, reviews or applications/extensions of the foundational papers. (b) Most alternatives to hidden citations are related to the foundational paper, detectable by tracking the citation path between the alternative and the foundational paper. (c-f) Fraction of hidden citations ranked by their citation hierarchy to the foundational papers. For each topic (except "BOSS"), around $60\%$ of hidden citations (green, top) cited other arXiv papers that explicitly cited the foundational papers. For a randomly sampled reference from the full arXiv, this fraction is negligible (brown, bottom).
  • Figure 4: Foundational papers. (a) Changes in the citation-based ranks of the top-ranked foundational papers after taking hidden citations into account, shown by arrows from the old explicit-citation-based rank to the new explicit-plus-hidden-citation-based rank (green: rank rise; red: rank drop). After accounting for hidden citations, the "cosmological inflation theory" paper (2134251287), ranked #8 based on explicit citation counts, takes the top spot. (b) For foundational papers with similar numbers of explicit citations, the paper with more hidden citations tends to result in higher average author saliency (inset). The proportion of papers in the corpus that can acquire hidden citations increases with (c) the explicit citations but not with (d) the publication year of the papers. Error bars represent $95\%$ confidence intervals. (e) Distribution of foundational papers by the number of authors per foundational paper, shown for all catchphrases (black) and for eponym-related (blue) and experiment-related catchphrases (green).
  • Figure 5: Hidden citations across disciplines and venues. (a) Four topics selected from computer science (cs) and biology (bio) [cf. Fig. 1(b)]. (b-i) Time evolution of citations and hidden citations [cf. Fig. 1(c-f)] for the four topics shown in (a), identified from arXiv (b-e) and Nature (f-i).
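
The quantities in the Figure 2 caption can be illustrated with a short sketch. It assumes that $p(\text{cite}|\text{mention})$ is estimated as the fraction of mentioning papers that also cite a foundational paper, and that the exponent in $h\sim c^{0.763}$ is the ordinary least-squares slope of $\log h$ against $\log c$. The function names and all numbers below are hypothetical; the synthetic points are generated from the reported exponent only to check that the fit recovers it, and are not the paper's data.

```python
import math


def p_cite_given_mention(n_mention_and_cite, n_mention):
    """Estimate p(cite | mention) as a simple fraction of mentioning papers."""
    if n_mention <= 0:
        raise ValueError("need at least one mentioning paper")
    return n_mention_and_cite / n_mention


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Hypothetical topic: 40 papers mention the catchphrase, 30 of them also cite.
n_mention = 40
n_mention_and_cite = 30
p = p_cite_given_mention(n_mention_and_cite, n_mention)
hidden = n_mention - n_mention_and_cite  # mentions without a citation

# Synthetic per-topic counts following h = c**0.763 exactly (not real data).
citations = [10, 100, 1000, 10000]
hidden_counts = [c ** 0.763 for c in citations]
slope = loglog_slope(citations, hidden_counts)
```

A slope below one on the log-log scale means hidden citations grow sublinearly with explicit citations across topics, which is how the caption's $h\sim c^{0.763}$ should be read.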