Efficient One-stage Video Object Detection by Exploiting Temporal Consistency
Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson
TL;DR
This work targets efficient one-stage video object detection, exploiting the temporal consistency of video frames to reduce computation. It identifies two core bottlenecks: attention modules whose cost grows quadratically with the large number of queries $N_q$, and the heavy cost of running detection heads on high-resolution, low-level feature maps. To address these, it introduces a Location Prior Network (LPN) that filters out background regions and a Size Prior Network (SPN) that skips unnecessary computation on low-level feature maps for specific frames, achieving faster inference with minimal accuracy loss. Demonstrated on FCOS, CenterNet, and YOLOX and evaluated on ImageNet VID, the approach yields strong speed-accuracy trade-offs and broad compatibility, and the code is publicly available.
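To make the first bottleneck concrete, the sketch below assumes that boxes detected in the previous frame serve as a location prior: only feature-map cells whose centres fall inside (a margin around) those boxes are kept as attention queries. The function names, the box-based prior, and the `margin` parameter are illustrative assumptions; this is not the LPN as implemented in the authors' repository.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def attention_pairs(num_queries: int) -> int:
    """Query-key pairs in full self-attention over num_queries tokens (grows as N_q^2)."""
    return num_queries * num_queries


def prior_locations(feat_h: int, feat_w: int, stride: int,
                    prior_boxes: Sequence[Box], margin: float = 0.0) -> List[Tuple[int, int]]:
    """Hypothetical location prior: keep (row, col) cells whose centres lie
    inside any prior box expanded by `margin` pixels; drop the rest as background."""
    kept = []
    for i in range(feat_h):
        cy = (i + 0.5) * stride  # cell centre in image coordinates
        for j in range(feat_w):
            cx = (j + 0.5) * stride
            if any(x1 - margin <= cx <= x2 + margin and y1 - margin <= cy <= y2 + margin
                   for x1, y1, x2, y2 in prior_boxes):
                kept.append((i, j))
    return kept


# A 512x512 frame at stride 8 gives a 64x64 map, i.e. N_q = 4096 candidate queries.
queries = prior_locations(64, 64, 8, prior_boxes=[(100, 100, 200, 200)])
print(len(queries))                   # 169
print(attention_pairs(64 * 64))       # 16777216
print(attention_pairs(len(queries)))  # 28561
```

Because the cost scales with $N_q^2$, keeping 169 of 4096 locations cuts the number of query-key pairs by a factor of about 587 in this toy setting. A practical prior would also need a motion margin and some way to pick up objects that newly enter the frame.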
Abstract
Recently, one-stage detectors have achieved accuracy competitive with traditional two-stage detectors on image data while running faster. However, most existing video object detection (VOD) methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we first analyse the computational bottlenecks of using one-stage detectors for VOD. Based on this analysis, we present a simple yet efficient framework that addresses these bottlenecks and achieves efficient one-stage VOD by exploiting the temporal consistency of video frames. Specifically, our method consists of a location-prior network that filters out background regions and a size-prior network that skips unnecessary computations on low-level feature maps for specific frames. We test our method on various modern one-stage detectors and conduct extensive experiments on the ImageNet VID dataset. The experimental results demonstrate the effectiveness, efficiency, and compatibility of our method. The code is available at https://github.com/guanxiongsun/vfe.pytorch.
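The idea of skipping low-level feature maps for specific frames can be illustrated with a size prior taken from the previous frame's detections. In the sketch below, the per-level strides and size ranges are hypothetical values loosely modelled on FCOS-style pyramids (P3 to P7), object size is simplified to the longer box side, and the keyframe rule is an assumption; none of this is the authors' SPN. It also counts head locations to show why the finest level dominates the cost.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical pyramid settings: name -> (stride, min_size, max_size), where size
# is the longer box side in pixels that the level is assumed to handle.
LEVELS: Dict[str, Tuple[int, float, float]] = {
    "P3": (8, 0, 64),
    "P4": (16, 64, 128),
    "P5": (32, 128, 256),
    "P6": (64, 256, 512),
    "P7": (128, 512, math.inf),
}


def levels_to_run(prior_boxes: Sequence[Box], keyframe: bool = False) -> List[str]:
    """Choose which pyramid levels the detection head processes for this frame."""
    if keyframe or not prior_boxes:
        # No usable size prior (or a periodic refresh): run every level.
        return list(LEVELS)
    smallest = min(max(x2 - x1, y2 - y1) for x1, y1, x2, y2 in prior_boxes)
    # Skip fine levels whose whole size range lies below the smallest prior object.
    return [name for name, (_, _, max_size) in LEVELS.items() if max_size > smallest]


def head_locations(img_h: int, img_w: int, names: Sequence[str]) -> int:
    """Number of spatial positions the head evaluates on the given levels."""
    total = 0
    for name in names:
        stride = LEVELS[name][0]
        total += math.ceil(img_h / stride) * math.ceil(img_w / stride)
    return total


full = head_locations(512, 512, list(LEVELS))  # 5456 positions on P3..P7
run = levels_to_run([(40, 60, 140, 140)])      # smallest side 100 -> ["P4", "P5", "P6", "P7"]
reduced = head_locations(512, 512, run)        # 1360 positions
print(full, run, reduced)
```

In this example P3 alone accounts for 4096 of the 5456 head positions, so skipping it removes about 75% of the head's spatial work for that frame. Falling back to all levels on keyframes or empty priors is a conservative choice so that newly appearing small objects are not missed indefinitely.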
