MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression

Siliang Ma; Yong Xu

TL;DR

The paper addresses a limitation of existing bounding box regression losses: they cannot distinguish predictions that share the groundtruth box's aspect ratio but differ from it in width and height. It introduces MPDIoU, an IoU-like similarity metric for axis-aligned boxes based on minimum point distance, and defines the regression loss $\mathcal{L}_{MPDIoU}=1-MPDIoU$, aiming to improve convergence and localization accuracy. In experiments, MPDIoU-based losses applied to state-of-the-art models (e.g., YOLOv7 and YOLACT) on datasets such as PASCAL VOC, MS COCO, IIIT5k, and MTHv2 show faster convergence and higher accuracy across object detection, instance segmentation, and scene text spotting. The work presents MPDIoU as a practical, unified loss for 2D bounding box regression, with potential extension to non-axis-aligned 3D cases in future research.

Abstract

Bounding box regression (BBR), an important step in object localization, has been widely used in object detection and instance segmentation. However, most existing loss functions for bounding box regression cannot be optimized when the predicted box has the same aspect ratio as the groundtruth box but different width and height values. To tackle this issue, we fully explore the geometric features of horizontal rectangles and propose MPDIoU, a novel bounding box similarity metric based on minimum point distance. It covers all of the relevant factors considered in existing loss functions, namely overlapping or non-overlapping area, central point distance, and deviation of width and height, while simplifying the calculation process. On this basis, we propose a bounding box regression loss function based on MPDIoU, called $\mathcal{L}_{MPDIoU}$. Experimental results show that the MPDIoU loss function, applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLOv7) models trained on PASCAL VOC, MS COCO, and IIIT5k, outperforms existing loss functions.
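To make the metric described in the abstract concrete, here is a minimal sketch in plain Python. It assumes the form $MPDIoU = IoU - d_1^2/(w^2+h^2) - d_2^2/(w^2+h^2)$, where $d_1$ and $d_2$ are the distances between the top-left and bottom-right corners of the two boxes and $w$, $h$ are the input image width and height; this page does not reproduce the equation, so treat that form as our reading of the paper. The names `mpdiou`, `mpdiou_loss`, and `Box` are illustrative, not taken from the authors' code.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2): top-left and bottom-right corners.
# "Box" is a hypothetical alias used only in this sketch.
Box = Tuple[float, float, float, float]


def mpdiou(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """IoU penalized by normalized squared corner distances (assumed form)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - gx1) ** 2 + (py1 - gy1) ** 2  # top-left
    d2_sq = (px2 - gx2) ** 2 + (py2 - gy2) ** 2  # bottom-right

    # Normalize by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """Regression loss defined as 1 - MPDIoU."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)


if __name__ == "__main__":
    gt = (40.0, 40.0, 60.0, 60.0)
    larger = (30.0, 30.0, 70.0, 70.0)   # same center and aspect ratio, 2x size
    smaller = (45.0, 45.0, 55.0, 55.0)  # same center and aspect ratio, 0.5x size
    print(mpdiou_loss(larger, gt, 100.0, 100.0))
    print(mpdiou_loss(smaller, gt, 100.0, 100.0))
```

In the usage example, both predictions are concentric with the groundtruth box, share its aspect ratio, and have the same IoU of 0.25, yet the sketch returns different losses (about 0.77 and 0.755) because the corner distances differ. The boxes are made-up values chosen for illustration and are not taken from the paper's figures.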

Paper Structure (15 sections, 1 theorem, 12 equations, 10 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Object Detection and Instance Segmentation
Scene Text Spotting
Loss Function for Bounding Box Regression
Intersection over Union with Minimum Points Distance
MPDIoU as Loss for Bounding Box Regression
Experimental Results
Experimental Settings
Datasets
Evaluation Protocol
Experimental Results of Object Detection
Experimental Results of Character-level Scene Text Spotting
Experimental Results of Instance Segmentation
Conclusion

Key Result

Theorem 3.1

We define one groundtruth bounding box as $\mathcal{B}_{gt}$ and two predicted bounding boxes as $\mathcal{B}_{prd1}$ and $\mathcal{B}_{prd2}$. The width and height of the input image are $w$ and $h$, respectively. Assume the top-left and bottom-right coordinates of $\mathcal{B}_{gt}$, $\mathcal{B}_

Figures (10)

Figure 1: The calculation factors of the existing metrics for bounding box regression including $GIoU$, $DIoU$, $CIoU$ and $EIoU$.
Figure 1: Comparison between the performance of YOLOv7 trained using its own loss ($\mathcal{L}_{CIoU}$) as well as $\mathcal{L}_{GIoU}$, $\mathcal{L}_{DIoU}$, $\mathcal{L}_{EIoU}$ and $\mathcal{L}_{MPDIoU}$ losses. The results are reported on the test set of PASCAL VOC 2007&2012.
Figure 2: Two cases with different bounding box regression results. The green boxes denote the groundtruth bounding boxes and the red boxes denote the predicted bounding boxes. The $\mathcal{L}_{GIoU}$, $\mathcal{L}_{DIoU}$, $\mathcal{L}_{CIoU}$, $\mathcal{L}_{EIoU}$ values of these two cases are exactly the same, but their $\mathcal{L}_{MPDIoU}$
Figure 3: Factors of our proposed $\mathcal{L}_{MPDIoU}$.
Figure 4: Examples of predicted bounding boxes and groundtruth bounding box with the same aspect ratio but different width and height, where $k>1$ and $k\in R$, the green box denotes the groundtruth box, and the red boxes denote the predicted boxes.
...and 5 more figures

Theorems & Definitions (2)

Theorem 3.1
proof
