Object Detection in 20 Years: A Survey

Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, Jieping Ye

TL;DR

This survey chronicles two decades of object detection, tracing the shift from handcrafted traditional detectors to CNN-based two-stage and one-stage systems, and detailing datasets, metrics, and core techniques. It synthesizes milestone methods, speed-up strategies, and recent advances, emphasizing how multi-scale perception, context, loss design, and NMS have shaped performance and efficiency. The authors highlight practical implications for real-time deployment, edge devices, and cross-domain robustness, while outlining open challenges and promising directions such as end-to-end detection, 3D and video detection, and open-world reasoning. Overall, the paper provides a comprehensive roadmap of the field’s evolution and a guide for future research and application-oriented development.

Abstract

Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Over the past two decades, we have seen a rapid technological evolution of object detection and its profound impact on the entire computer vision field. If we consider today's object detection technique as a revolution driven by deep learning, then back in the 1990s, we would see the ingenious thinking and long-term perspective design of early computer vision. This paper extensively reviews this fast-moving research field in the light of technical evolution, spanning over a quarter-century's time (from the 1990s to 2022). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods.

Paper Structure

This paper contains 43 sections, 5 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: The increasing number of publications in object detection from 1998 to 2021. (Data from Google Scholar advanced search: allintitle: "object detection" OR "detecting objects".)
  • Figure 2: A road map of object detection. Milestone detectors in this figure: VJ Det. [CVPR01-VJ, IJCV04-VJ], HOG Det. [CVPR05-HOG], DPM [CVPR08-DPM, CVPR10-DPM, TPAMI10-DPM], RCNN [CVPR14-RCNN], SPPNet [ECCV14-SPPNet], Fast RCNN [ICCV15-FastRCNN], Faster RCNN [NIPS15-FasterRCNN], YOLO [CVPR16-YOLO, redmon2018yolov3, bochkovskiy2020yolov4], SSD [ECCV16-SSD], FPN [CVPR17-FPN], Retina-Net [ICCV17-Focal], CornerNet [law2018cornernet], CenterNet [zhao2019object], DETR [carion2020end].
  • Figure 3: Accuracy improvement of object detection on VOC07, VOC12 and MS-COCO datasets. Detectors in this figure: DPM-v1 [CVPR08-DPM], DPM-v5 [ECCV14-DPMv5], RCNN [CVPR14-RCNN], SPPNet [ECCV14-SPPNet], Fast RCNN [ICCV15-FastRCNN], Faster RCNN [NIPS15-FasterRCNN], SSD [ECCV16-SSD], FPN [CVPR17-FPN], Retina-Net [ICCV17-Focal], RefineDet [CVPR18-SingleShotRefine], TridentNet [li2019scale], CenterNet [zhou2019objects], FCOS [tian2019fcos], HTC [chen2019hybrid], YOLOv4 [bochkovskiy2020yolov4], Deformable DETR [zhu2020deformable], Swin Transformer [liu2021swin].
  • Figure 4: Some example images and annotations in (a) PASCAL-VOC07, (b) ILSVRC, (c) MS-COCO, and (d) Open Images.
  • Figure 5: Evolution of multi-scale detection techniques in object detection. Detectors in this figure: VJ Det. [CVPR01-VJ], HOG Det. [CVPR05-HOG], DPM [CVPR08-DPM], Exemplar SVM [ICCV11-Exemplar], Overfeat [ICLR14-Overfeat], RCNN [CVPR14-RCNN], SPPNet [ECCV14-SPPNet], Fast RCNN [ICCV15-FastRCNN], Faster RCNN [NIPS15-FasterRCNN], DNN Det. [NIPS13-DNNDetec], YOLO [CVPR16-YOLO], SSD [ECCV16-SSD], Unified Det. [ECCV16-Unified], FPN [CVPR17-FPN], RetinaNet [ICCV17-Focal], RefineDet [CVPR18-SingleShotRefine], Cascade R-CNN [cai2018cascade], Swin Transformer [liu2021swin], FCOS [tian2019fcos], YOLOv4 [bochkovskiy2020yolov4], CornerNet [law2018cornernet], CenterNet [zhou2019objects], Reppoints [yang2019reppoints], DETR [carion2020end].
  • ...and 8 more figures