Oedipus: LLM-enhanced Reasoning CAPTCHA Solver
Gelei Deng, Haoran Ou, Yi Liu, Jie Zhang, Tianwei Zhang, Yang Liu
TL;DR
This paper investigates the vulnerability of reasoning CAPTCHAs to automated solvers and introduces Oedipus, an end-to-end framework that uses a CAPTCHA-specific Domain Specific Language (DSL) to decompose complex CAPTCHA challenges into AI-easy sub-tasks. DSL scripts are translated into natural-language instructions, which a multimodal LLM follows through stepwise Chain-of-Thought (CoT) reasoning to solve challenges without training labels, achieving an average success rate of 63.5% across multiple real-world CAPTCHA types and transferring to newly introduced CAPTCHAs. The study finds that although LLMs understand CAPTCHA tasks and can decompose problems into subtasks, limits in object recognition and multi-step instruction handling significantly hinder autonomous solving, which motivates the DSL-guided approach and informs future CAPTCHA design. The authors discuss practical costs, transferability, and defense strategies, and they emphasize ethical considerations and transparency by releasing partial tools rather than fully automated solvers.
Abstract
CAPTCHAs have become a ubiquitous tool in safeguarding applications from automated bots. Over time, the arms race between CAPTCHA development and evasion techniques has led to increasingly sophisticated and diverse designs. The latest iteration, reasoning CAPTCHAs, exploits tasks that are intuitively simple for humans but challenging for conventional AI technologies, thereby enhancing security measures. Driven by the evolving AI capabilities, particularly the advancements in Large Language Models (LLMs), we investigate the potential of multimodal LLMs to solve modern reasoning CAPTCHAs. Our empirical analysis reveals that, despite their advanced reasoning capabilities, LLMs struggle to solve these CAPTCHAs effectively. In response, we introduce Oedipus, an innovative end-to-end framework for automated reasoning CAPTCHA solving. Central to this framework is a novel strategy that dissects the complex and human-easy-AI-hard tasks into a sequence of simpler and AI-easy steps. This is achieved through the development of a Domain Specific Language (DSL) for CAPTCHAs that guides LLMs in generating actionable sub-steps for each CAPTCHA challenge. The DSL is customized to ensure that each unit operation is a highly solvable subtask revealed in our previous empirical study. These sub-steps are then tackled sequentially using the Chain-of-Thought (CoT) methodology. Our evaluation shows that Oedipus effectively resolves the studied CAPTCHAs, achieving an average success rate of 63.5%. Remarkably, it also shows adaptability to the most recent CAPTCHA designs introduced in late 2023, which are not included in our initial study. This prompts a discussion on future strategies for designing reasoning CAPTCHAs that can effectively counter advanced AI solutions.
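
To make the decomposition idea concrete, the following is a minimal sketch of how a DSL script might be turned into a numbered, stepwise prompt for a multimodal LLM. It is not the authors' DSL: the operation names (locate, compare, select), the templates, and the helper functions are all hypothetical and chosen only for illustration, and no model is actually called.

```python
from dataclasses import dataclass

# Hypothetical unit operations. The real Oedipus DSL vocabulary is not
# given in this summary; these three are assumptions for illustration.
TEMPLATES = {
    "locate": "Locate {target} in the image and note each position.",
    "compare": "Compare the {attribute} of the candidates found so far.",
    "select": "Select the candidate that {criterion}.",
}


@dataclass(frozen=True)
class Step:
    """One DSL unit operation with its arguments."""
    op: str
    args: dict


def to_instructions(script):
    """Translate each DSL step into one numbered natural-language instruction."""
    lines = []
    for i, step in enumerate(script, start=1):
        if step.op not in TEMPLATES:
            raise ValueError(f"unknown operation: {step.op}")
        lines.append(f"Step {i}: " + TEMPLATES[step.op].format(**step.args))
    return lines


def build_prompt(script):
    """Assemble a stepwise prompt asking the model to work through the steps in order."""
    header = (
        "Solve the CAPTCHA by following these steps in order, "
        "reporting the result of each step before moving on."
    )
    return "\n".join([header, *to_instructions(script)])


# Example script for a made-up challenge: "click the largest cube".
script = [
    Step("locate", {"target": "all cubes"}),
    Step("compare", {"attribute": "size"}),
    Step("select", {"criterion": "is the largest"}),
]
prompt = build_prompt(script)
print(prompt)
```

Each step here is deliberately small, mirroring the abstract's point that every unit operation should be a subtask the model can solve reliably on its own. In a full pipeline the resulting prompt would be sent together with the CAPTCHA image to a multimodal model; that call is omitted here.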
