ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy

JunKyu Lee; Blesson Varghese; Hans Vandierendonck

TL;DR

ROMA addresses the challenge of maintaining high real-time detection accuracy when video content and compute resources vary. It introduces a run-time accuracy variation model that estimates Relative Average Precision (RAP) between detectors without ground-truth labels, leveraging runtime cues such as object size histograms and detection latency. The method combines offline AP estimation, an AP degradation model across dropped frames via a degradation factor $\beta$, and a RAP-based detector selection mechanism to switch among detectors in real time. Experiments on MOT17Det and MOT20Det with four YOLOv4 variants on an NVIDIA Jetson Nano demonstrate substantial real-time accuracy gains over single detectors and prior runtime techniques, highlighting ROMA's practicality for dynamic, resource-constrained video analytics.
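The trade-off described above can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes, purely for illustration, that a detector's offline AP decays geometrically by the degradation factor $\beta$ at each dropped frame, and that real-time AP is the mean AP over one frame block (the processed frame plus the frames dropped while the detector is busy). The names `Detector`, `block_size`, `estimated_realtime_ap`, and `select_detector` are hypothetical, and the sketch leaves out ROMA's object size histograms and its RAP computation.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    name: str
    offline_ap: float  # AP on frames the detector actually processes (assumed known offline)
    block_size: int    # frames spanned per detection: 1 processed + (block_size - 1) dropped


def estimated_realtime_ap(det: Detector, beta: float) -> float:
    """Mean AP over one frame block under an ASSUMED geometric decay.

    Assumption (not from the paper): the AP at the k-th dropped frame
    is offline_ap * beta**k, because stale detections are reused.
    """
    decay_sum = sum(beta ** k for k in range(det.block_size))
    return det.offline_ap * decay_sum / det.block_size


def select_detector(detectors: list[Detector], beta: float) -> Detector:
    """Pick the detector with the highest estimated real-time AP."""
    return max(detectors, key=lambda d: estimated_realtime_ap(d, beta))


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    fast = Detector("fast", offline_ap=0.4, block_size=2)
    slow = Detector("slow", offline_ap=0.6, block_size=8)
    for beta in (0.95, 0.70):
        best = select_detector([fast, slow], beta)
        print(f"beta={beta:.2f}: choose {best.name}")
```

With these made-up numbers, the slower but more accurate detector wins when content changes slowly ($\beta$ near 1), and the faster detector wins when accuracy degrades quickly across dropped frames. This kind of content-dependent switch is what motivates run-time detector selection.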

Abstract

This paper analyzes the effects of dynamically varying video content and detection latency on the real-time detection accuracy of a detector and, based on the findings, proposes a new run-time accuracy variation model, ROMA. ROMA is designed to select, in real time and without label information, the optimal detector from a set of detectors so as to maximize real-time object detection accuracy. Using four YOLOv4 detectors on an NVIDIA Jetson Nano, ROMA improves real-time accuracy by 4 to 37% in a scenario of dynamically varying video content and detection latency built from the MOT17Det and MOT20Det datasets, compared with individual YOLOv4 detectors and two state-of-the-art runtime techniques.

Paper Structure (15 sections, 21 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Related Work
ROMA: Run-Time Accuracy Variation
Real-Time Accuracy Characteristics
Notations for Frames and Frame Block Sizes
Offline AP Estimation of Each Detector
AP Degradation at Each Dropped Frame
Estimating Relative Average Precision
Implementation of ROMA
Initialization
Running Process
Experimental Evaluation
Real-Time AP Measurements
Decisions by ROMA
Conclusion

Figures (7)

Figure 1: Accuracy Variation with Dynamically Varying Objects' Speeds and Available Compute Resources
Figure 2: Notations for Frames and Frame Block Sizes
Figure 3: Average APs (MOT17Det and MOT20Det)
Figure 4: MOT17-04 (Left) and MOT20-05 (Right) MOTDet
Figure 5: Average APs across All Cases (MOT17Det+MOT20Det)
...and 2 more figures
