Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification
Mikołaj Wysocki, Henryk Gierszal, Piotr Tyczka, Sophia Karagiorgou, George Pantelis
TL;DR
This work tackles the problem of detecting and classifying CAPTCHA patterns embedded in web pages by benchmarking three YOLO families (YOLOv5, YOLOv8, YOLOv10) across their nano, small, and medium variants on a diverse dataset built from web, dark web, and synthesized pages. It introduces an image-slicing technique to handle oversized inputs and evaluates the models on Precision, Recall, F1, and mAP@50, along with inference speed, to assess real-world utility. Key contributions include a large, heterogeneous dataset (115,651 images) covering four CAPTCHA types, a practical image-slicing method, and the demonstration that the smallest models are the fastest while larger models improve detection quality. The work also shows that retraining with small amounts of new-pattern data can adapt detectors to unseen CAPTCHA types. The findings guide deployment choices for CAPTCHA detectors in web crawlers and underscore the importance of continuous, diverse data collection to maintain robust performance as CAPTCHA schemes evolve.
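For readers unfamiliar with the detection metrics named above, the following is a minimal sketch of their standard definitions, not the authors' evaluation code. The function names `iou` and `detection_scores` are hypothetical, and boxes are assumed to be `(x0, y0, x1, y1)` tuples in pixels. The "@50" in mAP@50 refers to counting a prediction as a true positive when its IoU with a ground-truth box is at least 0.5; mAP@50 is then the mean over classes of the average precision computed under that threshold.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes (assumed format)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def detection_scores(tp, fp, fn):
    """Precision, Recall, and F1 from true-positive, false-positive,
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In this sketch a detector that finds 80 of 100 CAPTCHAs while raising 20 false alarms would be scored with `detection_scores(80, 20, 20)`.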
Abstract
This paper analyzes and compares the YOLOv5, YOLOv8 and YOLOv10 models for detecting CAPTCHAs on webpages, using datasets collected from the web and the darknet as well as synthesized webpage data. The study examines the nano (n), small (s), and medium (m) variants of the YOLO architectures and uses Precision, Recall, F1 score, mAP@50 and inference speed to determine their real-world utility. Additionally, the possibility of efficiently tuning a trained model to detect new CAPTCHA patterns was examined, as this is a crucial part of real-life applications. An image slicing method is proposed to improve detection metrics on oversized input images, which are common in webpage analysis. The nano variants achieved the best results in terms of speed, while the more complex architectures scored better on the other metrics.
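The abstract does not spell out the slicing procedure, so the following is only an illustrative sketch of one common way to tile an oversized page screenshot, not the authors' method. The names `tile_starts`, `slice_boxes`, and `to_page_coords` are hypothetical, and the tile size of 640 pixels and overlap of 64 pixels are assumptions chosen for illustration. Overlapping tiles reduce the chance that a CAPTCHA is cut in half at a tile border.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles that cover [0, length) along one axis."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def slice_boxes(width, height, tile=640, overlap=64):
    """Return (x0, y0, x1, y1) crop boxes covering a width x height image.

    tile and overlap are assumed values, not taken from the paper.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def to_page_coords(box, tile_box):
    """Shift a detection from tile coordinates back to page coordinates."""
    dx, dy = tile_box[0], tile_box[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

Each crop box would be passed to the detector separately, and the resulting detections mapped back with `to_page_coords` before duplicates from overlapping regions are merged, for example with non-maximum suppression.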
