You Only Look Once at Anytime (AnytimeYOLO): Analysis and Optimization of Early-Exits for Object-Detection
Daniel Kuhse, Harun Teper, Sebastian Buschjäger, Chien-Yao Wang, Jian-Jia Chen
TL;DR
The paper addresses the need for real-time, interruptible object detection by introducing AnytimeYOLO, a family of YOLO-based architectures with multiple early-exits. It formalizes Anytime Models in terms of interruptibility, bounded return time, and monotonic, measurable quality, and it defines a time-weighted quality metric. It then designs GELAN-t and GELAN-m variants (and their transposed forms) to enable fine-grained early predictions across three feature scales. It further develops optimization techniques, based on a graph formulation solved as a longest-path problem, to select the optimal exit order and an optimal subset of exits, and it discusses soft versus hard anytime execution and deployment hurdles on GPUs. In experiments on MS COCO, the work demonstrates trade-offs between early responsiveness and final accuracy: finer exit granularity improves early-time performance at some cost to later accuracy, and optimal exit scheduling significantly outperforms greedy heuristics. The contributions advance practical, interruptible real-time object detection, with implications for safety-critical systems and resource-constrained deployments, and outline pathways for integration with current inference frameworks.
Abstract
We introduce AnytimeYOLO, a family of variants of the YOLO architecture that enables anytime object detection. Our AnytimeYOLO networks allow for interruptible inference, i.e., they provide a prediction at any point in time, a property desirable for safety-critical real-time applications. We present structured explorations to modify the YOLO architecture, enabling early termination to obtain intermediate results. We focus on providing fine-grained control through high granularity of available termination points. First, we formalize Anytime Models as a special class of prediction models that offer anytime predictions. Then, we discuss a novel transposed variant of the YOLO architecture that enables better early predictions and greater freedom in the order of processing stages. Finally, we propose two optimization algorithms that, given an anytime model, determine the optimal exit execution order and the optimal subset of early-exits to select for deployment in low-resource environments. We evaluate the anytime performance and the trade-offs of design choices, proposing a new anytime quality metric for this purpose. We also discuss key challenges for anytime inference that currently make its deployment costly.
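
To make the idea of an anytime quality metric concrete, the following is a minimal sketch rather than the paper's definition, which is not given in this section. It assumes that quality is a step function of time (the quality of the most recently completed exit, and zero before the first exit finishes) and that the metric is the average of that step function over a fixed time horizon. The names `quality_at`, `time_weighted_quality`, and `horizon` are hypothetical, and the numbers in the usage lines are illustrative, not measured results.

```python
from bisect import bisect_right


def quality_at(t, exit_times, exit_qualities):
    """Quality available if inference is interrupted at time t.

    Assumption: exit_times is sorted ascending, and exit_qualities[i] is
    the quality obtained once the exit finishing at exit_times[i] is done.
    """
    done = bisect_right(exit_times, t)  # number of exits finished by time t
    return exit_qualities[done - 1] if done > 0 else 0.0


def time_weighted_quality(exit_times, exit_qualities, horizon):
    """Average of the step-function quality over the interval [0, horizon]."""
    total = 0.0
    prev_t, prev_q = 0.0, 0.0
    for t, q in zip(exit_times, exit_qualities):
        if t >= horizon:
            break  # exits finishing at or after the horizon contribute nothing
        total += prev_q * (t - prev_t)  # area under the previous plateau
        prev_t, prev_q = t, q
    total += prev_q * (horizon - prev_t)  # last plateau up to the horizon
    return total / horizon


# Illustrative values only: three exits finishing at 2, 5, and 10 time units.
times = [2.0, 5.0, 10.0]
qualities = [0.2, 0.4, 0.5]
score = time_weighted_quality(times, qualities, horizon=10.0)
```

Under these assumptions, a model whose exits finish earlier scores higher even when its final quality is unchanged, which is the kind of trade-off between early responsiveness and final accuracy that the abstract describes.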
