Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense
Jiacheng Liu, Yaxin Luo, Jiacheng Cui, Xinyi Shang, Xiaohan Zhao, Zhiqiang Shen
TL;DR
Next-Gen CAPTCHAs address the vulnerability of existing CAPTCHA systems to GUI-enabled agents by exploiting the Cognitive Gap in interactive perception, memory, decision-making, and action. The authors design a procedurally generated, rule-verified suite of 27 CAPTCHA families and an extended POMDP framework to model GUI-agent interaction, along with a scalable data-curation pipeline and a real-web evaluation platform. In live-browser experiments, humans perform near ceiling (~98.8% Pass@1) while high-reasoning GUI agents remain largely unsuccessful, yielding a substantial defender margin and an economic asymmetry, in both cost and latency, that disfavors attackers. The work provides a practical, scalable defense for the agentic web era and motivates accessibility-aware deployment and further study of interactive perception-grounding vulnerabilities.
Abstract
The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh, have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against advanced agents. Unlike static datasets, our benchmark is built upon a robust data generation pipeline, allowing for large-scale and easily scalable evaluations. Notably, for backend-supported types, our system is capable of generating effectively unbounded CAPTCHA instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.
