Object Detection in 20 Years: A Survey
Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, Jieping Ye
TL;DR
This survey chronicles two decades of object detection, tracing the shift from handcrafted traditional detectors to CNN-based two-stage and one-stage systems, and detailing datasets, metrics, and core techniques. It synthesizes milestone methods, speed-up strategies, and recent advances, emphasizing how multi-scale perception, context, loss design, and NMS have shaped performance and efficiency. The authors highlight practical implications for real-time deployment, edge devices, and cross-domain robustness, while outlining open challenges and promising directions such as end-to-end detection, 3D and video detection, and open-world reasoning. Overall, the paper provides a comprehensive roadmap of the field’s evolution and a guide for future research and application-oriented development.
Abstract
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Over the past two decades, we have seen a rapid technological evolution of object detection and its profound impact on the entire computer vision field. If we consider today's object detection techniques as a revolution driven by deep learning, then back in the 1990s, we would see the ingenious thinking and far-sighted design of early computer vision. This paper extensively reviews this fast-moving research field in the light of technical evolution, spanning over a quarter-century's time (from the 1990s to 2022). The paper covers a number of topics, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and recent state-of-the-art detection methods.
