Identifying Fabricated Networks within Authorship-for-Sale Enterprises

Simon J. Porter, Leslie D. McIntosh

TL;DR

A model of authorship-for-sale activity yields a characteristic network fingerprint that provides a robust statistical approach to detecting paper-mill networks. Methods to limit the expansion and propagation of these networks are discussed in both technological and social terms.

Abstract

Fabricated papers do not just need text, images, and data; they also require a fabricated or partially fabricated network of authors. Most 'authors' on a fabricated paper have not been associated with the research, but rather are added through a transaction. This lack of deeper connection means that there is a low likelihood that co-authors on fabricated papers will ever appear together on the same paper more than once. This paper constructs a model that encodes some of the key characteristics of this activity in an 'authorship-for-sale' network, with the aim of creating a robust method to detect this type of activity. A characteristic network fingerprint arises from this model that provides a robust statistical approach to the detection of paper-mill networks. The model suggested in this paper detects networks that have a statistically significant overlap with other approaches that principally rely on textual analysis for the detection of fraudulent papers. Researchers connected to networks identified using the methodology outlined in this paper are shown to be connected with 37% of papers identified through the tortured-phrase and clay-feet methods deployed in the Problematic Paper Screener website. Finally, methods to limit the expansion and propagation of these networks are discussed in both technological and social terms.
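
The intuition that purchased co-authors rarely publish together again can be illustrated with a local clustering coefficient on a co-authorship graph. The sketch below is not the paper's model. It is a minimal illustration under the assumption that papers are given as plain lists of author names. The function names `build_coauthor_graph` and `clustering_coefficient` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of people they have shared a paper with."""
    graph = defaultdict(set)
    for authors in papers:
        # Every pair of distinct authors on a paper becomes an undirected edge.
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, author):
    """Fraction of an author's co-author pairs that are themselves co-authors."""
    neighbours = graph.get(author, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # Undefined for fewer than two co-authors; treated as 0 here.
    linked_pairs = sum(
        1 for a, b in combinations(neighbours, 2) if b in graph[a]
    )
    return linked_pairs / (k * (k - 1) / 2)


# Toy data (hypothetical): author X appears with a fresh set of co-authors
# on every paper, while author Y publishes with the same group each time.
papers = [
    ["X", "a", "b"],
    ["X", "c", "d"],
    ["X", "e", "f"],
    ["Y", "p", "q"],
    ["Y", "p", "q"],
]
graph = build_coauthor_graph(papers)
x_score = clustering_coefficient(graph, "X")
y_score = clustering_coefficient(graph, "Y")
```

In this toy example the author whose co-authors never recur scores much lower than the author with a stable group. A low score alone is not evidence of misconduct. The figure captions indicate that the paper combines the clustering coefficient with further criteria such as publication volume and career stage.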

Paper Structure (28 sections, 7 figures, 6 tables)


Figures (7)

  • Figure 1: Histogram of Stage I and Stage II researchers with network shapes that have been repeated fewer than 10 times in 2022. Researchers are distributed across the $x$-axis by their clustering coefficient. The middle green histogram represents the subset of researchers that have produced more than 20 publications in the year. In the bottom red subset, the most frequent collaborator is a Stage I-III researcher, and more than 50% of the network is made up of Stage I-II researchers.
  • Figure 2: Development of the suspicious author cohort as a percentage of all Stage I and II researchers by year. The two areas are overlapping and not cumulative. The area between the top of the light blue region and the $x$-axis represents all suspicious Stage I and II researchers as a percentage of all Stage I and Stage II researchers. The dark blue area shows just those that are part of the largest connected component of the suspicious co-author network defined by our model.
  • Figure 3: Histogram of the graph density of the co-authorship networks of random samples of researchers with clustering coefficients of less than 0.4 and more than 20 publications. Each random sample is the same size as the suspicious author sample.
  • Figure 4: Proportion of papers exhibiting tortured phrases as a percentage of all journal articles produced in a year. The tortured-phrases dataset was extracted from the Problematic Paper Screener. The drop in papers in 2022 might be explained by the implementation of tortured-phrase detection strategies in submission processing workflows, or by a reaction to detection by paper mills.
  • Figure 5: Publications by volume that have an author in the suspicious author set
  • ...and 2 more figures