Google Scholar is manipulatable

Hazem Ibrahim, Fengyuan Liu, Yasir Zaki, Talal Rahwan

TL;DR

The authors compile a dataset of ~1.6 million Google Scholar profiles to examine citation fraud on the platform, provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

Abstract

Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation cartels, it remains unclear whether scientists can purchase citations. Here, we compile a dataset of ~1.6 million profiles on Google Scholar to examine instances of citation fraud on the platform. We survey faculty at highly-ranked universities, and confirm that Google Scholar is widely used when evaluating scientists. Intrigued by a citation-boosting service that we uncovered during our investigation, we contacted the service while undercover as a fictional author, and managed to purchase 50 citations. These findings provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

Paper Structure (20 sections, 4 figures)

Figures (4)

  • Figure 1: Survey responses from faculty of the top-10 ranked universities around the world. A, The percentage of faculty who consider citations when evaluating candidates (blue) and those who do not (red). B, Solid bars indicate, out of those who self-report considering citations when evaluating candidates, the percentage of faculty using each database as the primary source of citation metrics. Hatched bars indicate, out of those who report that their colleagues consider citations when evaluating candidates, the percentage of colleagues using each database. C, Relative to the Natural Sciences, the percentage of faculty from each group of disciplines who consider citations when evaluating candidates. D, Relative to the Natural Sciences, the percentage of faculty who use Google Scholar as the primary source of citation metrics. In (C) and (D), dots denote OLS-estimated coefficients and error bars represent 95% confidence intervals.
  • Figure 2: A comparative analysis of suspicious authors and their matches. In each plot, red lines and red dots denote suspicious authors, while blue ones denote their matches. A, For the 4 years leading up to an author's peak citations, the annual number of citations relative to the peak. B, Discrepancy between Google Scholar and Scopus in terms of the author's citation count in their peak year. In (C)-(H), for any given author, we focus on their 10 "citing papers", i.e., the ones that reference them the most. C, The number of references to a given author in each of their 10 citing papers. D, The total number of references pointing to an author from their 10 citing papers, divided by the number of the author's unique papers being cited therein. E, The citation network between the 10 citing papers ($\star$) and an author's papers ($\circ$) for a suspicious author (red) and their matching author (blue). F, For each value $v$ on the x-axis, how many of the citing papers have $v\%$ of their references pointing to the author in question. G, For each value $v$ on the x-axis, how many of the citing papers have an average of $v$ references per page pointing to the author in question. H, For each value $v$ on the x-axis, how many of the citing papers have $v\%$ of their references both pointing to the author in question and never cited in the citing paper's main text despite being listed in its bibliography.
  • Figure 3: The distribution of $c^2$-index. A, A sample of 900 authors randomly selected from the nine disciplines with the largest number of authors in our dataset (100 per discipline). Triangles highlight the five suspicious authors whose Google Scholar profiles exhibit highly irregular patterns. B, Distribution of all scientists in Microsoft Academic Graph (MAG) who are cited at least 100 times. For a scientist whose $c^2$-index is $n$ ($y$-axis), the $x$-axis shows the percentage of citations a scientist receives from papers citing them at least $n$ times. Inset shows the cumulative distribution of $c^2$-index of all scientists (blue), Nobel laureates (purple), and top-10 most cited scientists in the following three fields on Google Scholar (orange): machine learning, neuroscience, and bio-informatics. Red triangles highlight scientists who were previously reported to have engaged in academic misconduct.
  • Figure 4: The citation matching network of the journal that provided purchased citations. Nodes denote papers published in this journal, and edges link two papers which share at least one reference in their bibliographies. The edge width reflects the number of shared references that appear in both papers. The largest circle contains the 30 largest connected components in the network. The five smaller circles zoom into the clusters containing papers that include a suspiciously high number of citations to certain authors. Details regarding these authors are listed next to each circle.
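
The citation matching network described in the Figure 4 caption can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it treats each paper as a node, links two papers when their bibliographies share at least one reference, and stores the number of shared references as the edge weight (the caption's edge width). The function names `build_matching_network` and `connected_components`, the input layout (a dict mapping paper IDs to reference IDs), and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_matching_network(bibliographies):
    """Return {(paper_a, paper_b): shared_count} for papers sharing >= 1 reference.

    `bibliographies` maps a paper ID to the list of reference IDs it cites
    (a hypothetical input layout chosen for this sketch).
    """
    # Invert the mapping: reference -> set of papers that list it.
    cited_by = defaultdict(set)
    for paper, refs in bibliographies.items():
        for ref in set(refs):  # ignore duplicate entries within one bibliography
            cited_by[ref].add(paper)

    # Every pair of papers listing the same reference gains one unit of weight.
    edges = defaultdict(int)
    for papers in cited_by.values():
        for a, b in combinations(sorted(papers), 2):
            edges[(a, b)] += 1
    return dict(edges)


def connected_components(nodes, edges):
    """Group nodes into connected components, largest first."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy data: four papers and the references they list.
bibliographies = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3", "R4"],
    "P3": ["R4"],
    "P4": ["R9"],
}
edges = build_matching_network(bibliographies)
components = connected_components(list(bibliographies), edges)
```

In this toy input, P1 and P2 share two references and P2 and P3 share one, so those three papers form a single component while P4 stays isolated. Inverting the mapping first avoids comparing every pair of bibliographies directly, which matters when a journal has many papers but each reference is shared by only a few of them.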