OriCon3D: Effective 3D Object Detection using Orientation and Confidence

Dhyey Manish Rajani, Surya Pratap Singh, Rahul Kashyap Swayampakula

TL;DR

By combining CNN-based orientation and confidence estimates with the geometric constraints inherent in the 2D bounding box, this approach improves the accuracy of 3D object pose determination over baseline methodologies on the KITTI 3D object detection benchmark.

Abstract

In this paper, we propose a method for detecting 3D objects and estimating their spatial positions from a single image. Unlike conventional frameworks that rely solely on center-point and dimension predictions, our approach uses a deep convolutional neural network for weighted orientation regression of 3D objects. These estimates are then combined with geometric constraints obtained from a 2D bounding box to derive a complete 3D bounding box. The network has two key outputs. The first estimates 3D object orientation using a discrete-continuous loss function. The second predicts objectivity-based confidence scores with minimal variance. We also enhance the method by incorporating lightweight residual feature extractors. By combining the derived estimates with the geometric constraints inherent in the 2D bounding box, our approach significantly improves the accuracy of 3D object pose determination, surpassing baseline methodologies. We evaluate the method on the KITTI 3D object detection benchmark, where it demonstrates superior performance.
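
The discrete-continuous idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a MultiBin-style encoding in which the angle range is split into evenly spaced bins, a classification target picks the bin (the discrete part), and a (cos, sin) pair encodes the offset from that bin's center (the continuous part). The function names, the bin layout, and the default bin count are all hypothetical.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def bin_centers(num_bins):
    """Evenly spaced bin centers covering the full circle (assumed layout)."""
    width = 2.0 * math.pi / num_bins
    return [wrap_angle(i * width) for i in range(num_bins)]


def encode(theta_l, num_bins=2):
    """Return (bin index, (cos, sin) of the residual to that bin's center)."""
    centers = bin_centers(num_bins)
    residuals = [wrap_angle(theta_l - c) for c in centers]
    # Discrete part: the bin whose center is closest to the angle.
    best = min(range(num_bins), key=lambda i: abs(residuals[i]))
    # Continuous part: the residual angle, stored as a unit vector.
    return best, (math.cos(residuals[best]), math.sin(residuals[best]))


def decode(bin_scores, bin_cos_sin):
    """Recover the angle from per-bin scores and per-bin (cos, sin) pairs."""
    best = max(range(len(bin_scores)), key=lambda i: bin_scores[i])
    cos_r, sin_r = bin_cos_sin[best]
    centers = bin_centers(len(bin_scores))
    return wrap_angle(centers[best] + math.atan2(sin_r, cos_r))
```

In a training setup of this kind, the bin index would typically feed a classification loss and the (cos, sin) pair a regression loss; how the paper weights the two terms is not stated in this section.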

Paper Structure (10 sections, 4 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: 3D bounding box projection into 2D detection window for enhanced orientation and confidence estimation
  • Figure 2: OriCon3D: Our customized multi-bin multitask learning-based orientation and confidence estimator
  • Figure 3: The local egocentric orientation $(\theta_{l})$, the global allocentric orientation $(\theta)$ of the vehicle, and the ray angle $(\theta_{ray})$ w.r.t. the camera centre are shown. The vehicle's heading is shown by the red arrow, and the green arrow is the centre ray, i.e. the ray between the origin and the vehicle body centre. Therefore, the vehicle orientation is $\theta = \theta_{ray} + \theta_{l}$. The network is trained to regress $\theta_{l}$.
  • Figure 4: Qualitative results. Left column: cyclists and cars; middle column: pedestrians only; right column: cars only
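
The relation in the Figure 3 caption, $\theta = \theta_{ray} + \theta_{l}$, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes a pinhole camera and approximates the ray angle from the horizontal pixel coordinate of the 2D box centre. The names `u_center`, `fx`, and `cx` (box-centre column, focal length in pixels, principal point column) are hypothetical, and sign conventions may differ between datasets.

```python
import math


def ray_angle(u_center, fx, cx):
    """Angle between the optical axis and the ray through the box centre.

    Assumes a pinhole camera; u_center, fx, and cx are in pixels.
    """
    return math.atan2(u_center - cx, fx)


def global_orientation(theta_l, u_center, fx, cx):
    """Combine the regressed local angle with the ray angle.

    Implements theta = theta_ray + theta_l, wrapped to [-pi, pi).
    """
    theta = ray_angle(u_center, fx, cx) + theta_l
    return (theta + math.pi) % (2.0 * math.pi) - math.pi
```

For an object whose box centre lies on the principal point, the ray angle is zero and the global orientation equals the local angle; the correction grows as the object moves toward the image border.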