EnSolver: Uncertainty-Aware Ensemble CAPTCHA Solvers with Theoretical Guarantees
Duc C. Hoang, Behzad Ousat, Amin Kharraz, Cuong V. Nguyen
TL;DR
EnSolver introduces uncertainty-aware CAPTCHA solvers that use deep ensembles to detect and skip out-of-distribution CAPTCHAs, mitigating failure-driven lockouts. LEnSolver extends this by capping the number of skips, so that each solving attempt eventually submits an answer instead of skipping indefinitely. The authors derive novel theoretical guarantees via an out-of-distribution error bound (OEB) and provide lower bounds on the right-decision rate of EnSolver and the success rate of LEnSolver. Empirical results on in- and out-of-distribution CAPTCHA data show robust improvements over strong baselines and confirm that the theoretical bounds are relevant in real-world settings.
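The skip rule described above can be illustrated with a minimal sketch. It assumes that each ensemble member maps a CAPTCHA image to a predicted string, and it uses the fraction of members agreeing with the majority prediction as the confidence score. The paper's actual uncertainty measure, threshold choice, and skip accounting may differ. The names `ensemble_predict`, `ensolver_decide`, `lensolver_attempt`, `threshold`, and `max_skips` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Hypothetical interface: an ensemble member maps a CAPTCHA image to a predicted string.
Member = Callable[[object], str]


def ensemble_predict(members: Sequence[Member], image: object) -> Tuple[str, float]:
    """Return the majority-vote text and the fraction of members that agree with it."""
    if not members:
        raise ValueError("ensemble must contain at least one member")
    votes = Counter(member(image) for member in members)
    text, count = votes.most_common(1)[0]
    return text, count / len(members)


def ensolver_decide(members: Sequence[Member], image: object, threshold: float) -> Optional[str]:
    """EnSolver-style rule (assumed form): answer when confident, otherwise skip (None)."""
    text, confidence = ensemble_predict(members, image)
    return text if confidence >= threshold else None


def lensolver_attempt(
    members: Sequence[Member],
    captcha_stream: Iterable[object],
    threshold: float,
    max_skips: int,
) -> Tuple[str, int]:
    """LEnSolver-style attempt (assumed form): skip low-confidence CAPTCHAs, but after
    max_skips skips, answer the next CAPTCHA with the best guess regardless of confidence.
    Returns the submitted text and the number of CAPTCHAs skipped."""
    for skipped, image in enumerate(captcha_stream):
        text, confidence = ensemble_predict(members, image)
        if confidence >= threshold or skipped == max_skips:
            return text, skipped
    raise ValueError("CAPTCHA stream ended before an answer was submitted")


if __name__ == "__main__":
    # Toy setup: an "image" is a list of per-member outputs; member i simply reads entry i.
    members = [lambda img, i=i: img[i] for i in range(5)]
    clear = ["a7Kp", "a7Kp", "a7Kp", "a7Kp", "a7Kq"]  # 4 of 5 members agree
    odd = ["x1", "y2", "x1", "z3", "w4"]              # only 2 of 5 agree
    print(ensolver_decide(members, clear, threshold=0.6))  # a7Kp
    print(ensolver_decide(members, odd, threshold=0.6))    # None (skipped)
    print(lensolver_attempt(members, iter([odd, odd, clear]), 0.6, max_skips=1))  # ('x1', 1)
```

In this sketch, raising `threshold` makes the solver skip more CAPTCHAs it is unsure about, while `max_skips` bounds how long a single attempt can keep requesting fresh CAPTCHAs before it must commit to an answer.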
Abstract
The popularity of text-based CAPTCHAs as a security mechanism to protect websites from automated bots has prompted research on CAPTCHA solvers, with the aim of understanding their failure cases and subsequently making CAPTCHAs more secure. Recently proposed solvers, built on advances in deep learning, can crack even very challenging CAPTCHAs with high accuracy. However, these solvers often perform poorly on out-of-distribution samples that contain visual features different from those in the training set. Furthermore, they cannot detect and avoid such samples, which makes them susceptible to being locked out by defense systems after a certain number of failed attempts. In this paper, we propose EnSolver, a family of CAPTCHA solvers that use deep ensemble uncertainty to detect and skip out-of-distribution CAPTCHAs, making the solvers harder to detect. We prove novel theoretical bounds on the effectiveness of our solvers and demonstrate their use with state-of-the-art CAPTCHA solvers. Our experiments show that the proposed approaches perform well when cracking CAPTCHA datasets that contain both in-distribution and out-of-distribution samples.
