Unified-IoU: For High-Quality Object Detection

Xiangjie Luo, Zhihao Cai, Bo Shao, Yingxun Wang

TL;DR

This paper tackles the limitation of traditional IoU-based bounding box regression losses that inadequately prioritize high-quality predictions. It introduces Unified-IoU (UIoU), a dynamic loss framework built from three components: Focal Box to scale bounding boxes and modulate loss weights, a ratio-based annealing strategy to shift attention from low- to high-quality boxes during training, and a Focal Loss-inspired weighting to emphasize difficult predictions. UIoU integrates with existing IoU losses, enabling straightforward comparisons while improving high-IoU performance on VOC2007 and COCO2017; however, its effectiveness on very dense datasets like CityPersons requires the Focal-inv variant to avoid overemphasizing low-quality boxes. Overall, the approach demonstrates that carefully modulated weight allocation by prediction quality can boost high-precision detection while balancing convergence speed, with practical implications for dense scene detection and future investigations on more challenging datasets.
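The box-scaling idea behind Focal Box, and the annealing of the scaling ratio over training, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names (`scale_box`, `scaled_iou`, `linear_ratio`), the choice to scale both boxes about their own centers, and the linear schedule with endpoints 2.0 and 0.5 are assumptions made only for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def scale_box(box: Box, ratio: float) -> Box:
    """Scale a box about its own center (assumed scaling scheme)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def iou(a: Box, b: Box) -> float:
    """Plain Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scaled_iou(pred: Box, gt: Box, ratio: float) -> float:
    """IoU computed after scaling both boxes by the same ratio."""
    return iou(scale_box(pred, ratio), scale_box(gt, ratio))


def linear_ratio(epoch: int, total_epochs: int,
                 start: float = 2.0, end: float = 0.5) -> float:
    """Hypothetical linear annealing of the ratio; endpoints are illustrative."""
    t = min(max(epoch / max(total_epochs, 1), 0.0), 1.0)
    return start + (end - start) * t


gt = (0.0, 0.0, 10.0, 10.0)
good_pred = (2.0, 0.0, 12.0, 10.0)  # well-aligned prediction
poor_pred = (9.0, 0.0, 19.0, 10.0)  # barely overlapping prediction

for ratio in (0.5, 1.0, 2.0):
    print(ratio, scaled_iou(good_pred, gt, ratio), scaled_iou(poor_pred, gt, ratio))
```

In this toy case, shrinking the boxes (ratio below 1) lowers the IoU of the well-aligned pair and so raises its IoU-based loss, while enlarging them (ratio above 1) gives the poorly aligned pair a non-zero overlap. Annealing the ratio from large to small is one way to move the emphasis from low-quality to high-quality boxes as training progresses.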

Abstract

Object detection is an important part of computer vision, and its effectiveness is directly determined by the regression accuracy of the prediction box. As the key to model training, IoU (Intersection over Union) directly reflects the difference between the current prediction box and the Ground Truth box. Subsequent researchers have continuously added more considerations to IoU, such as center distance and aspect ratio. However, there is an upper limit to what refining the geometric differences alone can achieve, and there is a potential connection between each new consideration and the IoU itself, so directly adding or subtracting the two may lead to the problem of "over-consideration". Based on this, we propose a new IoU loss function, called Unified-IoU (UIoU), which is more concerned with the weight assignment between prediction boxes of different quality. Specifically, the loss function dynamically shifts the model's attention from low-quality prediction boxes to high-quality prediction boxes in a novel way, to enhance the model's detection performance on high-precision or dense datasets and achieve a balance in training speed. Our proposed method achieves better performance on multiple datasets; especially at high IoU thresholds, UIoU shows a more significant improvement than other improved IoU losses. Our code is publicly available at: https://github.com/lxj-drifter/UIOU_files.

Paper Structure (19 sections, 9 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Existing weight allocation strategies: using center distance or IoU value as the weight allocation factor is unreasonable. Left: the three red prediction boxes have the same center distance to the blue GT box, yet they clearly should not receive the same weight. Right: the three red prediction boxes have the same IoU value with the blue GT box but differ greatly from one another, so giving them the same weight is also unreasonable.
  • Figure 2: Scaling of prediction boxes and GT boxes (the blue box represents GT and the red box represents prediction)
  • Figure 3: Change of IoU value under different scaling ratios (the prediction box approaches the GT box from right to left)
  • Figure 4: Variation of mAP50 with training epochs under different scaling ratios
  • Figure 5: Detection results of models trained with different IoU loss functions on the CityPersons dataset