Detecting The Corruption Of Online Questionnaires By Artificial Intelligence

Benjamin Lebrun, Sharon Temtsin, Andrew Vonasch, Christoph Bartneck

TL;DR

The study investigates the threat that AI-generated text poses to data quality in online questionnaires that recruit participants through crowd-sourcing platforms. Using a within-subjects design, it compares human- and AI-authored justifications (generated via ChatGPT and Undetectable.AI) on readability, perceived quality, and an imitation-game task, complemented by automated detectors. Humans identified authorship with about 76% accuracy, while automatic detectors performed near chance, and obfuscation tools further degraded detector reliability. The findings indicate that platform-level safeguards are essential for maintaining data integrity in online research, given the limitations of both human reviewers and current AI-detection systems. The work places the problem in the context of the replication crisis and calls for systemic solutions beyond individual-level checks to mitigate AI-assisted fraud in crowd-sourced studies.

Abstract

Online questionnaires that use crowd-sourcing platforms to recruit participants have become commonplace, due to their ease of use and low costs. Artificial Intelligence (AI) based Large Language Models (LLMs) have made it easy for bad actors to automatically fill in online forms, including generating meaningful text for open-ended tasks. These technological advances threaten the data quality of studies that use online questionnaires. This study tested whether text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems. While humans were able to correctly identify the authorship of text above chance level (76 percent accuracy), their performance was still below what would be required to ensure satisfactory data quality. Researchers currently have to rely on the disinterest of bad actors to successfully use open-ended responses as a useful tool for ensuring data quality. Automatic AI detection systems are currently completely unusable. If AIs become too prevalent in submitting responses, then the costs associated with detecting fraudulent submissions will outweigh the benefits of online questionnaires. Individual attention checks will no longer be a sufficient tool to ensure good data quality. This problem can only be systematically addressed by crowd-sourcing platforms. They cannot rely on automatic AI detection systems, and it is unclear how they can ensure data quality for their paying clients.

Paper Structure (47 sections, 7 equations, 8 figures, 15 tables)

Figures (8)

  • Figure 1: Distribution of response length from source experiment
  • Figure 2: Questionnaire to measure the quality of the text.
  • Figure 3: Questionnaire to identify the author of the text.
  • Figure 4: The perceived quality for both AI-generated and human-generated sentences.
  • Figure 5: Proportion of participants judging that each sentence was written by AI (as opposed to by a human). Each pair of dots represents the pair of sentences created from the same prompt.
  • ...and 3 more figures