Decoupled and Interactive Regression Modeling for High-performance One-stage 3D Object Detection

Weiping Xiao, Yiqiang Wu, Chang Liu, Yu Qin, Xiaomao Li, Liming Xin

TL;DR

The paper addresses the underperformance of center-based one-stage 3D detectors by identifying center attribute regression and IoU prediction as key bottlenecks. It introduces DIRM, which consists of Decoupled Attribute Regression (DAR), for long-range, attribute-decoupled center regression with adaptive sample assignment, and Interactive Quality Prediction (IQP), which stabilizes IoU estimates by leveraging class-agnostic object classification. Extensive experiments on Waymo and ONCE show that DAR and IQP yield significant gains with minimal added latency, achieving state-of-the-art results and strong generalization across backbones and representations. The work demonstrates that plug-and-play DAR and IQP can bring one-stage detectors level with, or beyond, competitive two-stage methods, with practical impact for real-time autonomous driving systems.

Abstract

Inadequate bounding box modeling in regression tasks constrains the performance of one-stage 3D object detection. Our study traces this to two causes: (1) Limited center-offset prediction seriously impairs bounding box localization, since many highest-response positions deviate significantly from object centers. (2) Low-quality samples, which are ignored in regression tasks, significantly degrade bounding box prediction because they produce unreliable quality (IoU) rectification. To tackle these problems, we propose Decoupled and Interactive Regression Modeling (DIRM) for one-stage detection. Specifically, Decoupled Attribute Regression (DAR) facilitates long-range regression modeling for the center attribute through an adaptive multi-sample assignment strategy that deeply decouples bounding box attributes. In addition, to make IoU predictions more reliable for low-quality results, Interactive Quality Prediction (IQP) jointly optimizes quality prediction with the classification task, which is proficient at modeling negative samples. Extensive experiments on the Waymo and ONCE datasets demonstrate that DIRM significantly improves the performance of several state-of-the-art methods with minimal additional inference latency. Notably, DIRM achieves state-of-the-art detection performance on both datasets.
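
To make "quality (IoU) rectification" concrete, the sketch below blends a classification score with a predicted IoU using the geometric form common in IoU-aware one-stage detectors, score = cls^(1 - alpha) * iou^alpha. This is an illustration under assumptions, not the formulation used by DIRM: the function name `rectify_score`, the default `alpha`, and the clamping to [0, 1] are all hypothetical choices made for the example.

```python
def rectify_score(cls_score: float, pred_iou: float, alpha: float = 0.5) -> float:
    """Blend a classification score with a predicted IoU.

    Assumed form (not taken from the paper): cls^(1 - alpha) * iou^alpha.
    alpha = 0 ignores the IoU branch; alpha = 1 ranks boxes by IoU alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Clamp both inputs so an out-of-range network output cannot break the blend.
    cls = min(max(cls_score, 0.0), 1.0)
    iou = min(max(pred_iou, 0.0), 1.0)
    return (cls ** (1.0 - alpha)) * (iou ** alpha)


if __name__ == "__main__":
    # Same classification confidence, different predicted localization quality.
    well_localized = rectify_score(0.9, 0.8)
    poorly_localized = rectify_score(0.9, 0.1)
    print(well_localized > poorly_localized)
```

The blend only helps ranking when the predicted IoU is trustworthy. If the IoU branch overestimates quality for a poorly localized box, the rectified score stays high, which is the failure mode the abstract attributes to low-quality samples.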

Paper Structure

This paper contains 13 sections, 6 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Detection performance and inference time of DIRM, compared with state-of-the-art methods. All methods are trained on 20% of the Waymo training set, and inference latency is measured on a single NVIDIA A100 GPU. The results show that DIRM remarkably outperforms the baseline method while adding little inference latency. DIRM also outperforms the previous state-of-the-art two-stage and transformer-based methods.
  • Figure 2: Schematic representation of incorrect center attribute regression and IoU prediction. $O_{max}$ represents the greatest offset error of the center attribute, and $I_{min}$ is the minimum value of the IoU prediction.
  • Figure 3: The DIRM framework. The network contains two key designs, namely Decoupled Attribute Regression (DAR) and Interactive Quality Prediction (IQP). In the training stage, DAR only performs adaptive multi-sample assignment and supervision on several bounding box attributes, while IQP performs joint optimization and supervision on the class-agnostic object classification and quality prediction tasks. $Conv$ represents a conventional 2D convolutional layer, where the size of the convolution kernel is 3 × 3, and the number of channels remains the same as that of the shared feature map.
  • Figure 4: Decoupled attribute regression. "center", "center z", "lwh", and "$\theta$" denote the center, the center height, the size (length, width, height), and the orientation angle of the bounding box, respectively. $IoU_{center}$ is a good representation of the center sample's quality.
  • Figure 5: Schematic diagram of interactive quality prediction. "Cls", "Attr.", and "Obj." denote the classification branch, bounding box attribute regression branch, and class-agnostic object classification branch, respectively.
  • ...and 1 more figure
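
The center offset error that Figure 2 labels $O_{max}$ can be illustrated with the usual decoding step of a center-based detector, where the center is recovered from a heatmap peak plus a predicted offset. The sketch below is a simplified 2D illustration under assumptions, not the paper's implementation: `decode_center`, `residual_error`, the cell size, and the offset bound `max_offset` are hypothetical names and values chosen for the example.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def decode_center(peak: Tuple[int, int], offset: Vec2,
                  cell_size: float, origin: Vec2) -> Vec2:
    """Map a heatmap peak (col, row) plus a sub-cell offset to metric (x, y)."""
    x = (peak[0] + offset[0]) * cell_size + origin[0]
    y = (peak[1] + offset[1]) * cell_size + origin[1]
    return x, y


def residual_error(true_center: Vec2, peak: Tuple[int, int], cell_size: float,
                   origin: Vec2, max_offset: float) -> float:
    """Center error left over when the offset is limited to +/- max_offset cells."""
    # Offset (in cells) that would move the peak exactly onto the true center.
    needed = tuple((t - o) / cell_size - p
                   for t, o, p in zip(true_center, origin, peak))
    # A short-range regression head cannot express offsets beyond its bound.
    clamped = tuple(min(max(n, -max_offset), max_offset) for n in needed)
    x, y = decode_center(peak, clamped, cell_size, origin)
    return math.hypot(true_center[0] - x, true_center[1] - y)


if __name__ == "__main__":
    # The highest response sits 3 cells away from the true center along x.
    args = ((10.0, 5.0), (23, 10), 0.5, (0.0, 0.0))
    print(residual_error(*args, max_offset=1.0))  # short range: error remains
    print(residual_error(*args, max_offset=4.0))  # long range: error removed
```

With a bound of one cell, the decoded center cannot reach an object whose highest response lies three cells away; a longer regression range removes that residual, which is the motivation the abstract gives for long-range center regression.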