Is Crowdsourcing a Puppet Show? Detecting a New Type of Fraud in Online Platforms

Shengqian Wang, Israt Jahan Jui, Julie Thorpe

TL;DR

This paper uncovers a significant puppeteer threat on MTurk, showing that a substantial fraction of worker accounts are controlled by single operators who bypass attention checks through multiple puppets. Analyzing two studies (N=558 and N=698) reveals puppet prevalence of about 34.6% and 56.5%, with evidence suggesting human-driven interactions rather than bots in at least one study. The authors propose a layered defense combining classic attention checks, dynamic questioning, implicit learning tests, behavioral analytics, and device fingerprinting, while highlighting economic and ethical considerations and the platform’s role in maintaining data quality. The work warns that prior findings may be biased by puppet noise and urges broader validation across platforms, along with community guidelines to mitigate this emerging form of crowdsourcing fraud.

Abstract

Crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) are important tools for researchers seeking to conduct studies with a broad, global participant base. Despite their popularity and demonstrated utility, we present evidence that suggests the integrity of data collected through Amazon MTurk is being threatened by the presence of puppeteers, apparently human workers controlling multiple puppet accounts that are capable of bypassing standard attention checks. If left undetected, puppeteers and their puppets can undermine the integrity of data collected on these platforms. This paper investigates data from two Amazon MTurk studies, finding that a substantial proportion of accounts (33% to 56.4%) are likely puppets. Our findings highlight the importance of adopting multifaceted strategies to ensure data integrity on crowdsourcing platforms. With the goal of detecting this type of fraud, we discuss a set of potential countermeasures for both puppets and bots with varying degrees of sophistication (e.g., employing AI). The problem of single entities (or puppeteers) manually controlling multiple accounts could exist on other crowdsourcing platforms; as such, their detection may be of broader application. While our findings suggest the need to re-evaluate the quality of crowdsourced data, many previous studies likely remain valid, particularly those with robust experimental designs. However, the presence of puppets may have contributed to false null results in some studies, suggesting that unpublished work may be worth revisiting with effective puppet detection strategies.

Paper Structure

This paper contains 36 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: A screenshot of a Facebook advertisement for rental Amazon MTurk accounts (captured on April 26, 2024).
  • Figure 2: Screenshots of Facebook posts in public groups related to trading Amazon MTurk accounts (captured on April 25, 2024).
  • Figure 3: Simple example of a dynamic multiple-choice question. Text in brackets marks dynamic words inserted on the fly from an online or local database. The dynamic text is not shown in a different font style, so that it does not stand out.
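
The dynamic question described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes a small in-memory word database and a single question template. The names `WORD_DB` and `make_dynamic_question`, the categories, and the question wording are all hypothetical and are not taken from the paper. The slots are filled at request time and the result is plain text, so the inserted words carry no distinguishing markup.

```python
import random

# Hypothetical local word database (an assumption for illustration only).
# Words are chosen so that no word fits more than one category.
WORD_DB = {
    "animal": ["dog", "cat", "horse", "rabbit"],
    "color": ["red", "blue", "green", "yellow"],
    "fruit": ["apple", "pear", "banana", "grape"],
}


def make_dynamic_question(rng):
    """Build one multiple-choice question with slots filled at request time."""
    category = rng.choice(sorted(WORD_DB))
    target = rng.choice(WORD_DB[category])
    # One distractor is drawn from every other category,
    # so exactly one option matches the prompt.
    distractors = [
        rng.choice(words)
        for name, words in sorted(WORD_DB.items())
        if name != category
    ]
    options = distractors + [target]
    rng.shuffle(options)
    return {
        "category": category,
        "prompt": f"Which of the following is a kind of {category}?",
        "options": options,
        "answer": target,
    }


if __name__ == "__main__":
    question = make_dynamic_question(random.Random())
    print(question["prompt"])
    for option in question["options"]:
        # Plain text only: dynamic words get no special styling.
        print(" -", option)
```

Because the category, the correct word, and the option order all vary per request, a memorized or shared answer key is of little use across accounts. A real deployment would presumably draw from a much larger database than this toy one.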