Bot or Human? Detecting ChatGPT Imposters with A Single Question

Hong Wang, Xuan Luo, Weizhi Wang, Xifeng Yan

TL;DR

The paper addresses the problem of telling humans apart from LLM-based bots online, in conversational settings. It proposes FLAIR, a single-question framework built on two task families: questions that are easy for humans but hard for bots, and questions that are easy for bots but hard for humans. It investigates a suite of targeted tasks (counting, substitution, random editing, searching, ASCII art reasoning, memorization, and computation) to reveal where human and LLM capabilities diverge. Experiments across multiple datasets and models show that humans excel on the tasks that target LLM weaknesses, while LLMs excel on memorization and computation. Larger models and tool-assisted approaches narrow the gap on some tasks, but ASCII art reasoning remains a challenging frontier. The findings support deploying FLAIR online as a complement or alternative to CAPTCHAs, highlight practical considerations for defending against conversational bots, and point to future directions such as multimodal capabilities and few-shot reasoning challenges.

Abstract

Large language models (LLMs) like GPT-4 have recently demonstrated impressive capabilities in natural language understanding and generation. However, there is a concern that they can be misused for malicious purposes, such as fraud or denial-of-service attacks. Therefore, it is crucial to develop methods for detecting whether the party involved in a conversation is a bot or a human. In this paper, we propose a framework named FLAIR, Finding Large Language Model Authenticity via a Single Inquiry and Response, to detect conversational bots in an online manner. Specifically, we target a single question scenario that can effectively differentiate human users from bots. The questions are divided into two categories: those that are easy for humans but difficult for bots (e.g., counting, substitution, searching, and ASCII art reasoning), and those that are easy for bots but difficult for humans (e.g., memorization and computation). Our approach shows different strengths of these questions in their effectiveness, providing a new way for online service providers to protect themselves against nefarious activities. Our code and question set are available at https://github.com/hongwang600/FLAIR.
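
The two question categories can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the repository linked above). The function names, prompt wording, string lengths, and the simple verdict rule are all assumptions made for illustration. The sketch generates one counting question and one substitution question (easy for humans, hard for bots) and one multiplication question (easy for bots, hard for humans), then checks a free-text reply against the expected answer.

```python
import random
import string


def make_counting_question(rng, length=30, alphabet="abcde"):
    """Human-easy / bot-hard: count one character in a random string."""
    target = rng.choice(alphabet)
    text = "".join(rng.choice(alphabet) for _ in range(length))
    question = f"Please count the number of '{target}' in: {text}"
    return question, text.count(target)


def make_substitution_question(rng, word="peach"):
    """Human-easy / bot-hard: apply a random letter-substitution table."""
    mapping = {c: rng.choice(string.ascii_lowercase) for c in sorted(set(word))}
    rules = ", ".join(f"use {new} to substitute {old}" for old, new in mapping.items())
    question = f"{rules.capitalize()}. What is '{word}' after substitution?"
    answer = "".join(mapping[c] for c in word)
    return question, answer


def make_computation_question(rng, digits=4):
    """Bot-easy / human-hard: multiply two multi-digit integers."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    a, b = rng.randint(low, high), rng.randint(low, high)
    return f"What is {a} * {b}?", a * b


def is_correct(response, expected):
    """Compare a reply with the expected answer, ignoring case and outer whitespace."""
    return response.strip().lower() == str(expected).lower()


def likely_human(category, correct):
    """Assumed verdict rule: humans pass 'human_easy' and fail 'bot_easy' questions."""
    if category == "human_easy":
        return correct
    if category == "bot_easy":
        return not correct
    raise ValueError(f"unknown category: {category}")
```

A service could, for example, call make_counting_question(random.Random()) when a session starts, send the question text to the other party, and pass the reply through is_correct and likely_human. A real deployment would likely also need a time limit and fresh random questions per session, neither of which this sketch covers.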

Paper Structure

This paper contains 26 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Examples of ASCII art reasoning. (a) Select the ASCII art that contains X. (b) Rotate the ASCII art to the appropriate orientation. (c) Select the option that most accurately matches the cropped portion.