Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
Benjamin Lebrun, Sharon Temtsin, Andrew Vonasch, Christoph Bartneck
TL;DR
The study investigates the threat that AI-generated text poses to data quality in online questionnaires that recruit participants through crowd-sourcing platforms. Using a within-subjects design, it compares human- and AI-authored justifications (generated with ChatGPT and Undetectable.AI) on readability, perceived quality, and an imitation-game task, complemented by automated detectors. Humans identified authorship with about 76% accuracy, while automatic detectors performed near chance, and obfuscation tools further degraded detector reliability. Given the limitations of both human reviewers and current AI-detection systems, the findings indicate that platform-level safeguards are essential for maintaining data integrity in online research. The work situates this problem within the replication crisis and calls for systemic solutions beyond individual-level checks to mitigate AI-assisted fraud in crowd-sourced studies.
Abstract
Online questionnaires that use crowd-sourcing platforms to recruit participants have become commonplace, due to their ease of use and low costs. Artificial Intelligence (AI) based Large Language Models (LLMs) have made it easy for bad actors to automatically fill in online forms, including generating meaningful text for open-ended tasks. These technological advances threaten the data quality of studies that use online questionnaires. This study tested whether text generated by an AI for the purpose of an online study can be detected by humans and by automatic AI detection systems. While humans were able to correctly identify the authorship of text above chance level (76 percent accuracy), their performance was still below what would be required to ensure satisfactory data quality. For open-ended responses to remain a useful tool for ensuring data quality, researchers currently have to rely on the disinterest of bad actors. Automatic AI detection systems are currently completely unusable. If AIs become too prevalent in submitting responses, the costs associated with detecting fraudulent submissions will outweigh the benefits of online questionnaires. Individual attention checks will no longer be a sufficient tool to ensure good data quality. This problem can only be systematically addressed by crowd-sourcing platforms. They cannot rely on automatic AI detection systems, and it is unclear how they can ensure data quality for their paying clients.
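
To illustrate why 76 percent accuracy may fall short of satisfactory data quality, the following minimal sketch applies Bayes' rule to a screening scenario. It rests on assumptions that are not taken from the study: the 76 percent figure is treated as applying equally to AI-written and human-written responses, and the share of AI-written submissions (`ai_share`) is a purely hypothetical value. The function names are illustrative only.

```python
def flagged_precision(accuracy: float, ai_share: float) -> float:
    """Share of responses flagged as AI that really are AI-written.

    Assumption (not from the study): `accuracy` is both the rate at which
    AI text is correctly flagged and the rate at which human text is
    correctly accepted.
    """
    true_pos = accuracy * ai_share                   # AI text, flagged
    false_pos = (1.0 - accuracy) * (1.0 - ai_share)  # human text, flagged
    return true_pos / (true_pos + false_pos)


def accepted_contamination(accuracy: float, ai_share: float) -> float:
    """Share of accepted responses that are nevertheless AI-written."""
    false_neg = (1.0 - accuracy) * ai_share          # AI text, accepted
    true_neg = accuracy * (1.0 - ai_share)           # human text, accepted
    return false_neg / (false_neg + true_neg)


if __name__ == "__main__":
    accuracy = 0.76   # human detection accuracy reported in the abstract
    ai_share = 0.10   # hypothetical share of AI-written submissions
    print(f"precision of flags:     {flagged_precision(accuracy, ai_share):.3f}")
    print(f"AI among accepted data: {accepted_contamination(accuracy, ai_share):.3f}")
```

Under these assumed values, only about a quarter of the flagged responses would actually be AI-written, so most rejections would hit honest participants, while some AI-written text would still pass into the accepted data. Different assumptions about the share of AI submissions change the numbers, but not the basic trade-off.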
