DyRA: Portable Dynamic Resolution Adjustment Network for Existing Detectors

Daeun Seo, Hoeseok Yang, Hyungshin Kim

TL;DR

DyRA is a dynamic resolution adjustment network that provides an image-specific scale factor for existing detectors. It is co-trained with the detector using two specially designed loss functions, ParetoScaleLoss and BalanceLoss.

Abstract

Achieving consistent accuracy in object detection is challenging because object sizes vary widely. One effective approach is to optimize the input resolution, known as a multi-resolution strategy. Previous approaches to resolution optimization have mostly relied on pre-defined resolutions selected manually, and run-time resolution optimization for existing architectures has received little study. This paper introduces DyRA, a dynamic resolution adjustment network that provides an image-specific scale factor for existing detectors. The network is co-trained with the detector using two specially designed loss functions, ParetoScaleLoss and BalanceLoss. ParetoScaleLoss determines an adaptive scale factor for robustness, while BalanceLoss optimizes the overall scale factors according to the localization performance of the detector. The loss functions are designed to minimize the accuracy drop caused by the conflicting scaling objectives of differently sized objects. The proposed network can improve accuracy across various models, including RetinaNet, Faster-RCNN, FCOS, DINO, and H-Deformable-DETR. The code is available at https://github.com/DaEunFullGrace/DyRA.git.

Paper Structure (15 sections, 15 equations, 11 figures, 6 tables, 1 algorithm)

Figures (11)

  • Figure 1: Motivation for judicious image scaling. Scaling the resolution to align with the network's capacity can enhance the detection accuracy of extreme scales.
  • Figure 2: (a) The overall architecture consists of two networks: DyRA and the detector. First, DyRA estimates the scale factor from the given image; then, the detector classifies and localizes objects in the resized image, as in a general inference process. The proposed network is optimized by two additional loss functions, ParetoScaleLoss and BalanceLoss. (b) ParetoScaleLoss optimizes an image-specific scale factor based on ScaleLoss, which decides a box-specific scale factor. These loss functions adjust the factor based on the relative location within the boundary sizes for up-/down-scaling. BalanceLoss modifies the boundaries based on the detector's localization performance.
  • Figure 3: (a) The $x$-axis denotes the size ratio, and the $y$-axis is the predicted scale factor from DyRA. The network is optimized by two additional loss functions, ParetoScaleLoss and BalanceLoss. (b) Boxes are divided by the average of $\mathcal{B}$ into two groups: blue (to be upscaled) and green (to be downscaled). If the loss value of the blue region is larger than that of the green region, $\mu(\mathcal{B})$ is moved to the right to reduce the loss value of the blue region.
  • Figure 4: Distribution of scale factors and examples of images. The $x$-axis is the square root of the average box size within each image in COCO.
  • Figure 5: Per-class accuracy gain on COCO for various detectors trained with DyRA.
  • ...and 6 more figures
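
As a rough illustration of the two-stage inference flow described in the Figure 2(a) caption (estimate a scale factor, resize the image, then run the detector), the following Python sketch may help. It is not taken from the DyRA code base: the names `predict_scale`, `detector`, `resize_nearest`, and `detect_with_dynamic_resolution` are hypothetical, the scale bounds 0.5 and 2.0 are placeholder values, nearest-neighbour resizing stands in for a real interpolation routine, and mapping the boxes back to the original image is an assumption made for convenience.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]                 # grayscale image as rows of pixels
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def resize_nearest(image: Image, scale: float) -> Image:
    """Nearest-neighbour resize; a stand-in for a real interpolation routine."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [
            image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


def detect_with_dynamic_resolution(
    image: Image,
    predict_scale: Callable[[Image], float],   # hypothetical stand-in for DyRA
    detector: Callable[[Image], List[Box]],    # hypothetical stand-in detector
    min_scale: float = 0.5,                    # placeholder bound (assumption)
    max_scale: float = 2.0,                    # placeholder bound (assumption)
) -> Tuple[float, List[Box]]:
    """Predict a per-image scale factor, resize, then detect."""
    # Stage 1: image-specific scale factor, kept inside the assumed bounds.
    scale = clamp(predict_scale(image), min_scale, max_scale)

    # Stage 2: the detector sees only the resized image.
    resized = resize_nearest(image, scale)
    boxes = detector(resized)

    # Assumption: report boxes in the coordinates of the original image.
    sx = len(resized[0]) / len(image[0])
    sy = len(resized) / len(image)
    mapped = [(x1 / sx, y1 / sy, x2 / sx, y2 / sy) for x1, y1, x2, y2 in boxes]
    return scale, mapped
```

Because the detector is passed in as an ordinary callable, the sketch reflects the "portable" idea in the title: the scale predictor sits in front of an unmodified detector rather than inside it.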