YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, Xiaolin Wei

TL;DR

This paper introduces YOLOv6, a single-stage object detector tailored for industrial deployment that balances speed and accuracy across model scales. It combines a hardware-friendly network design (EfficientRep/CSPStackRep backbone and neck, Efficient Decoupled Head), TAL-based label assignment, and carefully chosen loss functions, augmented by training tricks such as self-distillation and extended training epochs. A key contribution is the quantization-focused deployment pipeline (RepOptimizer, sensitivity analysis, CW Distill), which enables PTQ/QAT with strong performance, including a quantized YOLOv6-S that achieves 43.3% AP at 869 FPS. The results show that YOLOv6 outperforms peers at similar scales and offers practical deployment advantages, with open-source code provided for reproducibility.

Abstract

For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. Considering the diverse requirements for speed and accuracy in the real environment, we extensively examine the up-to-date object detection advancements either from industry or academia. Specifically, we heavily assimilate ideas from recent network design, training strategies, testing techniques, quantization, and optimization methods. On top of this, we integrate our thoughts and practice to build a suite of deployment-ready networks at various scales to accommodate diversified use cases. With the generous permission of YOLO authors, we name it YOLOv6. We also express our warm welcome to users and contributors for further enhancement. For a glimpse of performance, our YOLOv6-N hits 35.9% AP on the COCO dataset at a throughput of 1234 FPS on an NVIDIA Tesla T4 GPU. YOLOv6-S strikes 43.5% AP at 495 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOX-S, and PPYOLOE-S). Our quantized version of YOLOv6-S even brings a new state-of-the-art 43.3% AP at 869 FPS. Furthermore, YOLOv6-M/L also achieve better accuracy performance (i.e., 49.5%/52.3%) than other detectors with a similar inference speed. We carefully conducted experiments to validate the effectiveness of each component. Our code is made available at https://github.com/meituan/YOLOv6.

Paper Structure (55 sections, 3 equations, 8 figures, 21 tables)

Figures (8)

  • Figure 1: Comparison of state-of-the-art efficient object detectors. Both latency and throughput (at a batch size of 32) are given for reference. All models are tested with TensorRT 7, except the quantized model, which is tested with TensorRT 8.
  • Figure 2: The YOLOv6 framework (N and S are shown). Note that for M/L, RepBlocks are replaced with CSPStackRep.
  • Figure 3: (a) RepBlock is composed of a stack of RepVGG blocks with ReLU activations at training. (b) At inference time, each RepVGG block is converted to RepConv. (c) CSPStackRep Block comprises three 1$\times$1 convolutional layers and a stack of sub-blocks of double RepConvs following the ReLU activations with a residual connection.
  • Figure 4: Improved activation distribution of YOLOv6-S trained with RepOptimizer.
  • Figure 5: Schematic of YOLOv6 channel-wise distillation in QAT.
  • ...and 3 more figures
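
The caption of Figure 3(b) says that a RepVGG block is converted to a single RepConv at inference time. The sketch below illustrates the general idea behind that kind of conversion (structural re-parameterization): a 3x3 branch, a 1x1 branch, and an identity branch are merged into one 3x3 kernel. It is a minimal pure-Python illustration under stated assumptions, not the YOLOv6 implementation: BatchNorm is assumed to be already folded into the weights and biases, stride is 1, input and output channel counts are equal (so the identity branch exists), and all function names (`conv3x3`, `conv1x1`, `multi_branch`, `fuse`) are hypothetical.

```python
import random

def conv3x3(x, w, b):
    """3x3 convolution, stride 1, zero padding 1.
    x: [C_in][H][W], w: [C_out][C_in][3][3], b: [C_out]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    out = []
    for o in range(len(w)):
        plane = [[b[o]] * wd for _ in range(h)]
        for i in range(c_in):
            for r in range(h):
                for c in range(wd):
                    for dr in range(3):
                        for dc in range(3):
                            rr, cc = r + dr - 1, c + dc - 1
                            if 0 <= rr < h and 0 <= cc < wd:
                                plane[r][c] += w[o][i][dr][dc] * x[i][rr][cc]
        out.append(plane)
    return out

def conv1x1(x, w, b):
    """1x1 convolution. w: [C_out][C_in], b: [C_out]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[b[o] + sum(w[o][i] * x[i][r][c] for i in range(c_in))
              for c in range(wd)] for r in range(h)] for o in range(len(w))]

def multi_branch(x, w3, b3, w1, b1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3, y1 = conv3x3(x, w3, b3), conv1x1(x, w1, b1)
    return [[[y3[o][r][c] + y1[o][r][c] + x[o][r][c]
              for c in range(len(x[0][0]))] for r in range(len(x[0]))]
            for o in range(len(x))]

def fuse(w3, b3, w1, b1):
    """Merge the three branches into a single 3x3 kernel and bias."""
    c_out, c_in = len(w3), len(w3[0])
    fw = [[[row[:] for row in w3[o][i]] for i in range(c_in)]
          for o in range(c_out)]
    for o in range(c_out):
        for i in range(c_in):
            fw[o][i][1][1] += w1[o][i]      # 1x1 kernel sits at the 3x3 centre
            if o == i:
                fw[o][i][1][1] += 1.0       # identity is a centred unit kernel
    fb = [b3[o] + b1[o] for o in range(c_out)]
    return fw, fb

random.seed(0)
C, H, W = 2, 4, 4
rnd = lambda: random.uniform(-1.0, 1.0)
x = [[[rnd() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[rnd() for _ in range(3)] for _ in range(3)] for _ in range(C)]
      for _ in range(C)]
w1 = [[rnd() for _ in range(C)] for _ in range(C)]
b3 = [rnd() for _ in range(C)]
b1 = [rnd() for _ in range(C)]

y_train = multi_branch(x, w3, b3, w1, b1)
fw, fb = fuse(w3, b3, w1, b1)
y_fused = conv3x3(x, fw, fb)

# Largest element-wise gap between the two forms (floating-point noise only).
max_diff = max(abs(y_train[o][r][c] - y_fused[o][r][c])
               for o in range(C) for r in range(H) for c in range(W))
print("max abs difference:", max_diff)
```

Because convolution is linear, summing the branch outputs equals convolving once with the summed kernels, so the fused single-branch form gives the same result as the multi-branch form while needing only one convolution per block at inference time.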