Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification

Mikołaj Wysocki, Henryk Gierszal, Piotr Tyczka, Sophia Karagiorgou, George Pantelis

TL;DR

This work tackles the problem of detecting and classifying CAPTCHA patterns embedded in web pages by benchmarking three YOLO families (YOLOv5, YOLOv8, YOLOv10) across their nano, small, and medium variants on a diverse dataset built from web, dark web, and synthesized pages. It introduces an image-slicing technique to handle oversized inputs and evaluates models using Precision, Recall, F1, and mAP@50, along with inference speed, to assess real-world utility. Key contributions include a large, heterogeneous dataset (115,651 images) with four CAPTCHA types, a practical image-slicing method, and the demonstration that small, fast models excel in speed while larger models improve detection quality; retraining with small amounts of new-pattern data can adapt detectors to unseen CAPTCHA types. The findings guide deployment choices for CAPTCHA detectors in web crawlers and underscore the importance of continuous, diverse data collection to maintain robust performance amid evolving CAPTCHA schemes.

Abstract

This paper analyzes and compares the YOLOv5, YOLOv8 and YOLOv10 models for webpage CAPTCHA detection, using datasets collected from the web and darknet as well as synthesized webpage data. The study examines the nano (n), small (s), and medium (m) variants of the YOLO architectures and uses metrics such as Precision, Recall, F1 score, mAP@50 and inference speed to determine their real-life utility. Additionally, the possibility of efficiently tuning a trained model to detect new CAPTCHA patterns was examined, as this is a crucial part of real-life applications. An image slicing method is proposed as a way to improve detection metrics on oversized input images, which can be a common scenario in webpage analysis. The nano variants achieved the best results in terms of speed, while the more complex architectures scored better on the other metrics.
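
The count-based metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard definitions of Precision, Recall, and F1 from true-positive, false-positive, and false-negative counts; the function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper. For mAP@50, a predicted box is conventionally counted as a true positive when its IoU with a ground-truth box is at least 0.5; the matching step itself is not shown here.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from detection counts.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Empty denominators yield 0.0 instead of raising.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = detection_metrics(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of Precision and Recall, so it only approaches 1 when both are high, which is why it is often reported alongside the two individual scores.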

Paper Structure

This paper contains 13 sections, 1 equation, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The CAPTCHA image is combined with the webpage image to obtain a synthesized image of a webpage protected by a CAPTCHA. Both the original and the synthesized webpage images are then used to train the neural network.
  • Figure 2: The four CAPTCHA image classes: (a) button CloudCaptcha, (b) text WikiCaptcha, (c) puzzle PuzzleAmazon, and (d) image Hcaptcha.
  • Figure 3: Preprocessing process flow diagram.
  • Figure 4: Distribution of classes in the dataset.
  • Figure 5: Rescaling the webpage image CeasefireWeb degrades pixel resolution and makes the text inside the red box unreadable. Slicing the original image prevents the information loss.
  • ...and 4 more figures
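
The slicing idea described in the Figure 5 caption can be sketched as follows. This is only an illustration under stated assumptions: the tile size of 640 pixels, the 64-pixel overlap, and the helper name `slice_boxes` are hypothetical, and the paper's actual slicing scheme is not given in this excerpt. The sketch computes crop boxes that cover an oversized screenshot at native resolution, so each tile can be passed to the detector without downscaling.

```python
def slice_boxes(width, height, tile=640, overlap=64):
    """Return (left, top, right, bottom) crop boxes covering the image.

    tile and overlap are assumed values, not taken from the paper.
    Neighbouring tiles overlap so that a CAPTCHA lying on a tile border
    still appears largely intact in at least one tile.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap

    def starts(length):
        # A dimension that fits into one tile needs a single crop.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# A hypothetical 1280x640 screenshot.
for box in slice_boxes(1280, 640):
    print(box)
```

Detections from each tile would still need to be shifted back by the tile's (left, top) offset and de-duplicated in the overlap regions, for example with non-maximum suppression; that step is omitted here.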