Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving

Yuxuan Liu

TL;DR

This dissertation introduces structural enhancements to monocular and stereo 3D object detection algorithms. Integrating ground-referenced geometric priors into monocular detection models yields state-of-the-art accuracy in benchmark evaluations for monocular 3D detection.

Abstract

This dissertation is a multifaceted contribution to the advancement of vision-based 3D perception technologies. In the first segment, the thesis introduces structural enhancements to both monocular and stereo 3D object detection algorithms. By integrating ground-referenced geometric priors into monocular detection models, this research achieves unparalleled accuracy in benchmark evaluations for monocular 3D detection. Concurrently, the work refines stereo 3D detection paradigms by incorporating insights and inferential structures gleaned from monocular networks, thereby augmenting the operational efficiency of stereo detection systems. The second segment is devoted to data-driven strategies and their real-world applications in 3D vision detection. A novel training regimen is introduced that amalgamates datasets annotated with either 2D or 3D labels. This approach not only augments the detection models through the utilization of a substantially expanded dataset but also facilitates economical model deployment in real-world scenarios where only 2D annotations are readily available. Lastly, the dissertation presents an innovative pipeline tailored for unsupervised depth estimation in autonomous driving contexts. Extensive empirical analyses affirm the robustness and efficacy of this newly proposed pipeline. Collectively, these contributions lay a robust foundation for the widespread adoption of vision-based 3D perception technologies in autonomous driving applications.

Paper Structure (105 sections, 31 equations, 27 figures, 15 tables, 3 algorithms)

Figures (27)

  • Figure 1: Examples of autonomous driving applications. (a): autonomous Japan taxi tested in Tokyo [Autoware]; (b): autonomous logistic platform fueling the Guangzhou Nansha Port [GuangzhouPort]; (c): autonomous vehicle for contactless delivery services on the HKUST campus [Hercules].
  • Figure 2: An illustrative modular framework for autonomous driving systems [Autoware], drawn from Autoware's design principles. The focus of this thesis primarily lies in the 3D-related tasks inside the vision-based perception module, a critical component within the larger perception challenge.
  • Figure 3: Illustrative example for a LiDAR and a camera.
  • Figure 4: An illustration of vision-based 3D object detection and depth prediction for 3D scene perception from a single image input. The figure is created with models proposed in Chapter 4 and Chapter 5 on completely unseen images.
  • Figure 5: This thesis is systematically divided into two principal segments. The first segment concentrates on refining the architecture and training methodologies of 3D object detection networks. The latter segment is devoted to the development of a comprehensive framework for full-scale unsupervised depth prediction. Collectively, these two segments contribute to a nuanced 3D understanding of both dynamic and static elements within the surrounding environment.
  • ...and 22 more figures