You Only Look Once at Anytime (AnytimeYOLO): Analysis and Optimization of Early-Exits for Object-Detection

Daniel Kuhse, Harun Teper, Sebastian Buschjäger, Chien-Yao Wang, Jian-Jia Chen

TL;DR

The paper addresses the need for real-time, interruptible object detection by introducing AnytimeYOLO, a family of YOLO-based architectures with multiple early-exits. It formalizes Anytime Models in terms of interruptibility, bounded returns, and monotonic, measurable quality, and defines a time-weighted quality metric. It then designs GELAN-t and GELAN-m variants (and their transposed forms) to enable fine-grained early predictions across three feature scales. It further develops optimization techniques that select the optimal exit order and a subset of exits using a graph-based, longest-path framework, and discusses soft versus hard anytime execution and deployment hurdles on GPUs. Through MS COCO experiments, the work demonstrates trade-offs between early responsiveness and final accuracy, showing that finer exit granularity improves early-time performance at some cost to later accuracy, and that optimal exit scheduling significantly outperforms greedy heuristics. The contributions advance practical, interruptible real-time object detection, with implications for safety-critical systems and resource-constrained deployments, while outlining pathways to integrate with current inference frameworks.

Abstract

We introduce AnytimeYOLO, a family of variants of the YOLO architecture that enables anytime object detection. Our AnytimeYOLO networks allow for interruptible inference, i.e., they provide a prediction at any point in time, a property desirable for safety-critical real-time applications. We present structured explorations to modify the YOLO architecture, enabling early termination to obtain intermediate results. We focus on providing fine-grained control through high granularity of available termination points. First, we formalize Anytime Models as a special class of prediction models that offer anytime predictions. Then, we discuss a novel transposed variant of the YOLO architecture that enables better early predictions and greater freedom in the order of processing stages. Finally, we propose two optimization algorithms that, given an anytime model, can be used to determine the optimal exit execution order and the optimal subset of early-exits to select for deployment in low-resource environments. We evaluate the anytime performance and trade-offs of design choices, proposing a new anytime quality metric for this purpose. In particular, we also discuss key challenges for anytime inference that currently make its deployment costly.
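To make the idea of an anytime quality metric concrete, the following is a minimal sketch, not the paper's definition. It assumes each early-exit is described by a hypothetical `(finish_time, quality)` pair, that no prediction (quality 0) is available before the first exit finishes, and that the metric is the plain time-average of the resulting step function over a horizon. The names `quality_at` and `time_averaged_quality` and all numbers are illustrative assumptions.

```python
from bisect import bisect_right

def quality_at(exits, t):
    """Quality available if inference is interrupted at time t.

    exits: list of (finish_time, quality) pairs sorted by finish_time.
    Assumption: before the first exit finishes there is no prediction (0.0).
    """
    times = [finish for finish, _ in exits]
    i = bisect_right(times, t)  # number of exits finished by time t
    return exits[i - 1][1] if i else 0.0

def time_averaged_quality(exits, horizon):
    """Area under the step-shaped quality curve on [0, horizon], divided by horizon.

    Assumption: interruption times are weighted uniformly; the paper's
    metric may weight time differently.
    """
    area, prev_t, prev_q = 0.0, 0.0, 0.0
    for finish, q in exits:
        if finish >= horizon:
            break
        area += prev_q * (finish - prev_t)  # quality held until this exit finishes
        prev_t, prev_q = finish, q
    area += prev_q * (horizon - prev_t)     # last finished exit holds until the horizon
    return area / horizon

# Hypothetical numbers: three exits versus a single final output at the same horizon.
anytime = [(2.0, 0.3), (5.0, 0.5), (10.0, 0.6)]
regular = [(10.0, 0.6)]
print(time_averaged_quality(anytime, 10.0))
print(time_averaged_quality(regular, 10.0))
```

Under these assumptions, a model that only produces output at the end of inference scores zero over the horizon, while a model with early-exits accumulates quality from each intermediate prediction.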

Paper Structure

This paper contains 18 sections, 7 equations, 10 figures, 16 tables.

Figures (10)

  • Figure 1: AnytimeYOLO on the MS COCO dataset (Lin et al., 2014). The model is interrupted at different points in time, providing a prediction. Longer runtime leads to better predictions.
  • Figure 2: Anytime model quality, given observation $x$. The red area denotes the trivial anytime model quality, and the green and blue areas denote the quality of non-trivial anytime models. $T$ denotes the time point at which the prediction quality stabilizes. The regular model produces predictions only after time $T$.
  • Figure 3: Model Architectures: six medium-scale layers, but only four large and five small layers (colored in blue, red, green respectively).
  • Figure 4: Quality of GELAN-m and GELAN-m$^T$ with six trained exits, no pre-training.
  • Figure 5: Quality of GELAN-m$^T$ with three and six trained exits, no pre-training.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2