Efficient Perception, Planning, and Control Algorithm for Vision-Based Automated Vehicles

Der-Hau Lee

TL;DR

This paper addresses resource-constrained, vision-based autonomous driving by integrating MTUNet, a multi-task UNet, with CILQR-based motion planning and a look-ahead vision predictive control (VPC) scheme. The MTUNet outputs lane segmentation, ego heading, road type, and traffic-object detections at real-time rates, while VPC provides curvature-informed steering corrections to reduce latency, all without HD maps. The approach yields a low-latency, map-free control loop where lateral planning runs in ~0.58 ms and perception achieves up to ~40 FPS on 228×228 inputs, outperforming SQP-based baselines on curvy roads in TORCS simulations. The findings demonstrate a practical, efficient framework for vision-based automated vehicles using minimal sensing hardware (monocular camera + inexpensive radars) suitable for real-world deployment on commodity hardware.

Abstract

Autonomous vehicles have limited computational resources and thus require efficient control systems. The cost and size of sensors have limited the development of self-driving cars. To overcome these restrictions, this study proposes an efficient framework for the operation of vision-based automated vehicles; the framework requires only a monocular camera and a few inexpensive radars. The proposed algorithm comprises a multi-task UNet (MTUNet) network for extracting image features, together with constrained iterative linear quadratic regulator (CILQR) and vision predictive control (VPC) modules for rapid motion planning and control. MTUNet is designed to simultaneously solve lane line segmentation, ego-vehicle heading angle regression, road type classification, and traffic object detection tasks at approximately 40 FPS for 228 × 228 pixel RGB input images. The CILQR controllers then use the MTUNet outputs and radar data as inputs to produce driving commands for lateral and longitudinal vehicle guidance within only 1 ms. In particular, the VPC algorithm is included to reduce steering command latency to below actuator latency, preventing performance degradation during tight turns. The VPC algorithm uses road curvature data from MTUNet to estimate the appropriate correction for the current steering angle at a look-ahead point to adjust the turning amount. The inclusion of the VPC algorithm in a VPC-CILQR controller leads to higher performance on curvy roads than the use of CILQR alone. Our experiments demonstrate that the proposed autonomous driving system, which does not require high-definition maps, can be applied in current autonomous vehicles.
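
The idea of turning a road curvature estimate into a steering correction can be illustrated with a minimal sketch. It is not the paper's VPC formulation: it assumes a kinematic bicycle model, in which an arc of curvature $\kappa$ is followed with steering angle $\delta = \arctan(L\kappa)$, and it uses a hypothetical wheelbase $L$ of 2.7 m and a hypothetical sign convention (positive values for left turns). The function names are illustrative only.

```python
import math

# Hypothetical wheelbase in metres; this value is an assumption, not from the paper.
WHEELBASE_M = 2.7


def curvature_to_steering(kappa: float, wheelbase_m: float = WHEELBASE_M) -> float:
    """Steering angle (rad) that follows an arc of curvature kappa (1/m)
    under a kinematic bicycle model: delta = atan(L * kappa)."""
    return math.atan(wheelbase_m * kappa)


def steering_correction(kappa_lookahead: float, current_steering: float,
                        wheelbase_m: float = WHEELBASE_M) -> float:
    """Difference (rad) between the steering angle suited to the curvature
    at the look-ahead point and the current steering angle."""
    return curvature_to_steering(kappa_lookahead, wheelbase_m) - current_steering


if __name__ == "__main__":
    kappa = 0.05       # 1/m, the maximum curvature quoted for Track B
    delta_now = 0.05   # rad, an arbitrary current steering angle
    print(f"correction: {steering_correction(kappa, delta_now):+.4f} rad")
```

In this sketch the correction is zero when the current steering angle already matches the look-ahead curvature and grows as the road ahead tightens, which is the qualitative behavior the abstract attributes to VPC.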

Paper Structure (17 sections, 55 equations, 14 figures, 9 tables)

This paper contains 17 sections, 55 equations, 14 figures, and 9 tables.

Figures (14)

  • Figure 1: Proposed vision-based automated driving framework. The system comprises the following modules: a multi-task DNN for perceiving surroundings, vision predictive control and CILQR controllers for vehicle motion planning and adherence to driving commands (steering, acceleration, and braking), and a PI controller combined with the longitudinal CILQR algorithm for velocity tracking. These modules receive input data from a monocular camera and a few inexpensive radars and work together to operate the automated vehicle. The DNN, vision predictive control, lateral CILQR, and longitudinal CILQR algorithms run every 24.52, 15.56, 0.58, and 0.65 ms, respectively. In our simulation, the end-to-end latency from the camera output to the lateral controller output ($T_{a \to b} \equiv T_{lat}$) is longer than the actuator latency ($T_{c\to d} \equiv T_{act}= 6.66$ ms).
  • Figure 2: Overview of the proposed MTUNet architecture. An input RGB image of size 228 $\times$ 228 is fed into the model, which then performs lane line segmentation, ego-vehicle pose estimation, and traffic object detection simultaneously. The backbone-seg-subnet is a UNet-based network; three variants of UNet (UNet$\_$2$\times$ [Lee21a], UNet$\_$1$\times$ [Nab20], and MResUNet [Nab20]) are compared in this work. The ReLU activation functions in the pose and det subnets are omitted for simplicity.
  • Figure 3: Problematic scenario for the VPC algorithm. (a) An example DNN-output lane-line binary map at a given time in the egocentric view. (b) Aerial view of the fitted lane lines. Here, $o$ is the current position of the ego vehicle and $o_p$ is the look-ahead point; $p_0$ and $p_1$ are the lane points at the same $x$ coordinates as $o$ and $o_p$, respectively. $\kappa$ and $\delta$ are the road curvature and the steering angle of the ego vehicle, respectively. In this paper, a look-ahead distance of $\overline{oo_p} = 10$ m is used, which corresponds to a car speed of approximately 72 km/h [Lee19].
  • Figure 4: Tracks A (left) and B (right) for dynamically evaluating the proposed MTUNet and control models. Tracks A and B (Tracks 7 and 8 in [Lee21a]) are 2843 m and 3919 m long, respectively, with a lane width of 4 m. Their maximum curvatures are approximately 0.03 and 0.05 m$^{-1}$, respectively, which is curvier than a typical road [Fit94]. The self-driving car drove in a counterclockwise direction, and the starting locations are marked by filled green circles. A self-driving vehicle [Li19] could not finish a lap on Track A using the direct perception approach [Che15].
  • Figure 5: Example traffic object and lane-line detection results for the MTUNet$\_$1$\times$ network on CULane (first row), LLAMAS (second row), and TORCS (third row) images.
  • ...and 9 more figures
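
The timing and geometry figures quoted in the captions above can be cross-checked with simple arithmetic. The sketch below is only a convenience calculation on numbers that appear in the captions (a 24.52 ms DNN period, a 10 m look-ahead distance at 72 km/h, and maximum curvatures of 0.03 and 0.05 per metre); the helper names are hypothetical and not taken from the paper.

```python
def rate_hz(period_ms: float) -> float:
    """Update rate in Hz (or frames per second) for a period in milliseconds."""
    return 1000.0 / period_ms


def lookahead_time_s(distance_m: float, speed_kmh: float) -> float:
    """Time in seconds needed to cover distance_m at speed_kmh."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return distance_m / speed_ms


def min_turn_radius_m(max_curvature_per_m: float) -> float:
    """Tightest turn radius in metres for a given maximum curvature (1/m)."""
    return 1.0 / max_curvature_per_m


if __name__ == "__main__":
    print(f"DNN rate:        {rate_hz(24.52):.1f} FPS")
    print(f"Look-ahead time: {lookahead_time_s(10.0, 72.0):.2f} s")
    print(f"Track A radius:  {min_turn_radius_m(0.03):.1f} m")
    print(f"Track B radius:  {min_turn_radius_m(0.05):.1f} m")
```

With these inputs, the DNN period corresponds to roughly 40.8 FPS, in line with the approximately 40 FPS stated in the abstract, and the 10 m look-ahead distance at 72 km/h corresponds to about 0.5 s of travel.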