Development of Occupancy Prediction Algorithm for Underground Parking Lots

Shijie Wang

TL;DR

A comprehensive BEV perception framework is designed to improve the accuracy of neural network models in dimly lit, challenging autonomous driving environments.

Abstract

This study addresses the perception challenges that autonomous driving faces in adverse environments such as basements. It begins with data collection in an underground garage: a simulated underground garage model is built in the CARLA simulation environment, and occupancy ground-truth data in SemanticKITTI format is collected in this simulated setting. The study then integrates a Transformer-based Occupancy Network model to perform occupancy grid prediction in this scenario. A comprehensive BEV perception framework is designed to improve the accuracy of neural network models in dimly lit, challenging autonomous driving environments. Finally, experiments validate the perception accuracy of the proposed solution in basement scenarios. The solution is tested on our self-constructed underground garage dataset, SUSTech-COE-ParkingLot, yielding satisfactory results.

Paper Structure (26 sections, 4 equations, 15 figures, 4 tables)

Figures (15)

  • Figure 1: The coordinates to be transformed are augmented with an additional dimension, and the rotation matrix is zero-padded to form a $4 \times 4$ homogeneous matrix. Multiplying the corresponding matrices from right to left projects the coordinates on the right into the coordinate system on the left. Here, $\mathbf{R}$ is the rotation matrix that projects a point from the world coordinate system to the camera coordinate system. This matrix is also the inverse of the pose matrix of the camera relative to the world coordinate system. $\mathbf{T}$ is the corresponding translation vector, representing the offset from the origin of the world coordinate system to the origin of the camera coordinate system.
  • Figure 2: Visualization of the incomplete data. Yellow voxels denote walls, purple voxels denote road, blue voxels denote other stable features, and green and red voxels denote traffic lines on the road.
  • Figure 3: Flowchart of the completion algorithm
  • Figure 4: $P_C$ denotes a point in the camera coordinate system, with the subscript indicating the corresponding frame number. $Tr$ is the matrix that projects a point from the LiDAR coordinate system to the left-front camera coordinate system, and $Tr^{-1}$ is its inverse. $pose_i$ projects a point from the left-front camera coordinate system at frame $i$ to the coordinate system at frame 0, and $pose_t^{-1}$ projects a point from the left-front camera coordinate system at frame 0 to the coordinate system at frame $t$.
  • Figure 5: Visualization of the completed data. Voxel colors carry the same semantics as in Figure 2.
  • ...and 10 more figures
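
The captions of Figures 1 and 4 describe a chain of homogeneous transforms. The sketch below shows one way such a chain could be composed in plain Python. It is not the authors' code: the function names (`make_homogeneous`, `invert_rigid`, `lidar_i_to_lidar_t`) and all numeric values are hypothetical, and the composition order $Tr^{-1} \cdot pose_t^{-1} \cdot pose_i \cdot Tr$ is an assumption inferred from the Figure 4 caption, since the equation itself is not reproduced here.

```python
import math

def make_homogeneous(R, t):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation R and a translation t."""
    return [list(R[0]) + [t[0]],
            list(R[1]) + [t[1]],
            list(R[2]) + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Multiply two 4x4 matrices (A is applied after B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(M):
    """Invert a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    Rt = [[M[j][i] for j in range(3)] for i in range(3)]
    t = [M[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_homogeneous(Rt, t_inv)

def apply(M, p):
    """Augment a 3D point with a trailing 1, transform it, drop the extra dimension."""
    ph = list(p) + [1.0]
    return [sum(M[i][k] * ph[k] for k in range(4)) for i in range(3)]

def rot_z(theta):
    """3x3 rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def lidar_i_to_lidar_t(Tr, pose_i, pose_t):
    """Assumed chain: LiDAR(i) -> camera(i) -> frame 0 -> camera(t) -> LiDAR(t).

    Matrices are multiplied right to left, so Tr acts on the point first.
    """
    return matmul(invert_rigid(Tr),
                  matmul(invert_rigid(pose_t), matmul(pose_i, Tr)))

# Hypothetical calibration and poses, chosen only for illustration.
IDENTITY_R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Tr = make_homogeneous(rot_z(0.3), [0.1, -0.2, 0.5])
pose_i = make_homogeneous(IDENTITY_R, [2.0, 0.0, 0.0])
pose_t = make_homogeneous(rot_z(math.pi / 2), [0.0, 0.0, 0.0])

M = lidar_i_to_lidar_t(Tr, pose_i, pose_t)
point_in_lidar_t = apply(M, [1.0, 0.0, 0.0])
```

Using the closed-form rigid inverse avoids a general matrix inversion, which is valid only because each matrix is assumed to be a pure rotation plus translation. When `pose_i` and `pose_t` are the same, the chain reduces to the identity, which is a convenient sanity check before aggregating points from several frames.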