BounTCHA: A CAPTCHA Utilizing Boundary Identification in Guided Generative AI-extended Videos

Lehao Lin, Ke Wang, Maha Abdallah, Wei Cai

TL;DR

The paper addresses the rising capability of AI-powered bots to defeat traditional CAPTCHAs by proposing BounTCHA, a boundary-identification CAPTCHA built on guided AI-extended videos. Its data-generation pipeline uses content understanding, last-frame prompts, and AI video extension to create a recognizable boundary; the authors then test human performance and evaluate security against random, database, and multimodal attacks. Key contributions include (i) a practical data-generation and prototype pipeline, (ii) an empirical characterization of human time biases in boundary detection, and (iii) a security analysis showing resilience against several attacker classes. The work argues that human perception of video boundaries, amplified by controlled AI extensions, can separate humans from bots and offer a scalable defense for web services in the AI-enhanced era.

Abstract

In recent years, the rapid development of artificial intelligence (AI), especially multi-modal Large Language Models (MLLMs), has enabled AI systems to understand text, images, videos, and other multimedia data and to execute various tasks based on human-provided prompts. However, AI-powered bots have increasingly been able to bypass most existing CAPTCHA systems, posing significant security threats to web applications. This makes the design of new CAPTCHA mechanisms an urgent priority. We observe that humans are highly sensitive to shifts and abrupt changes in videos, while current AI systems still struggle to comprehend and respond to such situations effectively. Based on this observation, we design and implement BounTCHA, a CAPTCHA mechanism that leverages human perception of boundaries in video transitions and disruptions. Using generative AI's capability to extend original videos with prompts, we introduce unexpected twists and changes and build a pipeline for generating guided short videos for CAPTCHA purposes. We develop a prototype and conduct experiments to collect data on humans' time biases in boundary identification. These data serve as a basis for distinguishing between human users and bots. Additionally, we perform a detailed security analysis of BounTCHA, demonstrating its resilience against various types of attacks. We hope that BounTCHA will act as a robust defense, safeguarding millions of web applications in the AI-driven era.
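
The abstract says that measured human time biases in boundary identification serve as the basis for telling humans from bots. One way to picture that check is the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the `Challenge` fields, the function names, and the acceptance window (`lower_s`, `upper_s`) are all hypothetical placeholders, and the window values are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    """One CAPTCHA video with a known boundary (all fields are hypothetical)."""
    video_id: str
    boundary_s: float  # time of the original/extended boundary, in seconds
    duration_s: float  # total video length, in seconds


def time_bias(challenge: Challenge, selected_s: float) -> float:
    """Signed time bias: the user's selected time minus the true boundary time."""
    if not 0.0 <= selected_s <= challenge.duration_s:
        raise ValueError("selected time lies outside the video")
    return selected_s - challenge.boundary_s


def looks_human(
    challenge: Challenge,
    selected_s: float,
    lower_s: float = -0.5,  # placeholder bound, NOT a value from the paper
    upper_s: float = 1.0,   # placeholder bound, NOT a value from the paper
) -> bool:
    """Accept the answer if its bias falls inside the assumed human window."""
    return lower_s <= time_bias(challenge, selected_s) <= upper_s


if __name__ == "__main__":
    demo = Challenge(video_id="demo", boundary_s=6.0, duration_s=10.0)
    print(looks_human(demo, 6.4))  # bias of about +0.4 s, inside the window
    print(looks_human(demo, 2.0))  # bias of -4.0 s, outside the window
```

In a real deployment the window would presumably be derived from the collected human bias distribution rather than fixed by hand, and the boundary time would stay on the server.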

Paper Structure

This paper contains 28 sections, 12 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Common text-based and image-based CAPTCHA examples, including arithmetic CAPTCHA, reCAPTCHA, puzzle CAPTCHA, What's Up CAPTCHA, and others.
  • Figure 2: Examples of 3D and gamified CAPTCHAs. (a) A text-based 3D CAPTCHA [nguyen2014security]. (b) A CAPTCHA named 3D CAPTCHA [imsamai20103d]. (c) and (d) Gamified CAPTCHAs with 3D-view images, used by OpenAI's ChatGPT.
  • Figure 3: The production pipeline for generating BounTCHA videos.
  • Figure 4: A bar chart comparing the sizes of original videos ($\mu=14106.62,\sigma=4590.93$) and compressed videos ($\mu=257.68,\sigma=55.75$), alongside a line graph depicting video lengths ($\mu=10.00,\sigma=2.42$), with the videos indexed according to their total duration.
  • Figure 5: The time cost of the video preparation pipeline. Block lengths are not drawn to scale with respect to time duration.
  • ...and 10 more figures