YOLOv6 v3.0: A Full-Scale Reloading

Chuyi Li, Lulu Li, Yifei Geng, Hongliang Jiang, Meng Cheng, Bo Zhang, Zaidan Ke, Xiaoming Xu, Xiangxiang Chu

TL;DR

YOLOv6 v3.0 targets real-time, high-accuracy object detection by revamping both the network architecture and the training strategy. On the architecture side, it adds a neck built on Bi-directional Concatenation (BiC) modules, a SimCSPSPPF block, and deeper backbones and necks for the N6/S6/M6/L6 variants. On the training side, it introduces anchor-aided training (AAT) and self-distillation with decoupled localization distillation (DLD). AAT combines anchor-based and anchor-free supervision during training only, so inference speed is unaffected, while self-distillation provides additional gains for small models. Extensive COCO experiments show higher AP than comparable detectors at similar throughput across model sizes, and 1280-pixel inputs enable state-of-the-art real-time accuracy. Overall, the architectural refinements and training-time strategies yield substantial accuracy improvements suitable for industrial deployment.

Abstract

The YOLO community has been in high spirits since our first two releases! By the advent of Chinese New Year 2023, which sees the Year of the Rabbit, we refurnish YOLOv6 with numerous novel enhancements on the network architecture and the training scheme. This release is identified as YOLOv6 v3.0. For a glimpse of performance, our YOLOv6-N hits 37.5% AP on the COCO dataset at a throughput of 1187 FPS tested with an NVIDIA Tesla T4 GPU. YOLOv6-S strikes 45.0% AP at 484 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOv8-S, YOLOX-S and PPYOLOE-S). YOLOv6-M/L also achieve better accuracy (50.0%/52.8% AP, respectively) than other detectors at a similar inference speed. Additionally, with an extended backbone and neck design, our YOLOv6-L6 achieves state-of-the-art accuracy in real time. Extensive experiments are carefully conducted to validate the effectiveness of each improvement. Our code is made available at https://github.com/meituan/YOLOv6.

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, and 11 tables.

Figures (3)

  • Figure 1: Comparison of state-of-the-art efficient object detectors. Both latency and throughput (at a batch size of 32) are given for handy reference. All models are tested with TensorRT 7.
  • Figure 2: (a) The neck of YOLOv6 (N and S are shown). Note that for M/L, RepBlocks are replaced with CSPStackRep. (b) The structure of a BiC module. (c) A SimCSPSPPF block.
  • Figure 3: The detection head with anchor-based auxiliary branches during training. The auxiliary branches are removed at inference. 'af' and 'ab' are short for 'anchor-free' and 'anchor-based'.
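
The idea behind Figure 3, where auxiliary branches exist only during training, can be illustrated with a minimal sketch. This is not the YOLOv6 implementation: the class `AnchorAidedHead`, the toy branch functions, the placeholder L1 loss, and the `ab_weight` parameter are all hypothetical names chosen for illustration. The sketch only shows the control flow: the anchor-based ('ab') branch adds a loss term while training and is skipped at inference, so the anchor-free ('af') output and its cost are unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Feature = List[float]
Branch = Callable[[Feature], Feature]


@dataclass
class AnchorAidedHead:
    """Toy head (hypothetical): 'af' main branch plus a training-only 'ab' branch."""

    af_branch: Branch
    ab_branch: Branch
    training: bool = True

    def __call__(self, feature: Feature) -> Dict[str, Feature]:
        # The anchor-free branch always runs; it is the only inference output.
        outputs = {"af": self.af_branch(feature)}
        if self.training:
            # The auxiliary anchor-based branch runs only while training.
            outputs["ab"] = self.ab_branch(feature)
        return outputs


def training_loss(outputs: Dict[str, Feature], target: Feature,
                  ab_weight: float = 1.0) -> float:
    """Placeholder L1 loss summed over whichever branches are present."""

    def l1(pred: Feature) -> float:
        return sum(abs(p - t) for p, t in zip(pred, target))

    loss = l1(outputs["af"])
    if "ab" in outputs:
        # Assumed weighting; the real loss formulation is not given here.
        loss += ab_weight * l1(outputs["ab"])
    return loss


# Toy branches standing in for real convolutional sub-networks.
head = AnchorAidedHead(
    af_branch=lambda f: [2.0 * x for x in f],
    ab_branch=lambda f: [x + 1.0 for x in f],
)

train_out = head([1.0, 2.0])   # contains both 'af' and 'ab' predictions
head.training = False
infer_out = head([1.0, 2.0])   # contains only the 'af' prediction
```

Because the auxiliary branch is dropped when `training` is false, the inference path is identical to that of a head trained without it, which matches the caption's statement that the auxiliary branches are removed at inference.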