MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression
Siliang Ma, Yong Xu
TL;DR
The paper addresses the limitations of existing bounding box regression losses that fail to differentiate predictions sharing the same aspect ratio but differing in width and height. It introduces MPDIoU, a minimum points distance–based IoU-like metric for axis-aligned boxes, and defines the regression loss $L_{MPDIoU}=1-MPDIoU$, aiming to improve convergence and localization accuracy. Through extensive experiments, MPDIoU-based losses applied to state-of-the-art models (e.g., YOLOv7 and YOLACT) on datasets such as PASCAL VOC, MS COCO, IIIT5k, and MTHv2 show faster convergence and superior accuracy across object detection, instance segmentation, and scene text spotting. The work presents MPDIoU as a practical, unified loss for 2D bounding box regression with potential extension to non-axis-aligned 3D cases in future research.
Abstract
Bounding box regression (BBR), an important step in object localization, has been widely used in object detection and instance segmentation. However, most existing loss functions for bounding box regression cannot be optimized when the predicted box has the same aspect ratio as the ground-truth box but different width and height values. To tackle this issue, we fully explore the geometric features of horizontal rectangles and propose a novel bounding box similarity metric, MPDIoU, based on minimum point distance. It contains all of the relevant factors considered in existing loss functions, namely overlapping or non-overlapping area, central point distance, and deviation of width and height, while simplifying the calculation process. On this basis, we propose a bounding box regression loss function based on MPDIoU, called $L_{MPDIoU}$. Experimental results show that the MPDIoU loss function, applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLOv7) models trained on PASCAL VOC, MS COCO, and IIIT5k, outperforms existing loss functions.
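To make the metric concrete, here is a minimal sketch of MPDIoU for a single pair of axis-aligned boxes. The abstract does not state the formula, so the sketch assumes the definition as we understand it from the full paper: $MPDIoU = IoU - d_1^2/(w^2+h^2) - d_2^2/(w^2+h^2)$, where $d_1$ and $d_2$ are the distances between the top-left and the bottom-right corners of the two boxes, and $w$, $h$ are the width and height of the input image. The function names, the `(x1, y1, x2, y2)` box layout, and the example image size are illustrative choices, not taken from the authors' code.

```python
def iou(box_a, box_b):
    """Plain IoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, gt, img_w, img_h):
    """MPDIoU sketch: IoU minus the normalized squared corner distances.

    Assumption: the normalizer is img_w**2 + img_h**2, i.e. the squared
    diagonal of the input image.
    """
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - d1_sq / norm - d2_sq / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Regression loss L_MPDIoU = 1 - MPDIoU."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)


# Two predictions concentric with the ground truth and sharing its aspect ratio.
gt = (40.0, 40.0, 60.0, 60.0)      # 20 x 20 ground-truth box
inner = (45.0, 45.0, 55.0, 55.0)   # 10 x 10 prediction inside the ground truth
outer = (30.0, 30.0, 70.0, 70.0)   # 40 x 40 prediction enclosing the ground truth
```

With a hypothetical 100 x 100 image, both predictions have an IoU of 0.25 with the ground truth, the same center, and the same aspect ratio. The corner-distance terms still separate them: `mpdiou(inner, gt, 100, 100)` is 0.245 while `mpdiou(outer, gt, 100, 100)` is 0.23. This is the kind of case the abstract describes. A training implementation would operate on batched tensors rather than Python tuples.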
