PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

Aws Khalil, Jaerock Kwon

TL;DR

Perception latency degrades vision-based lateral control in autonomous vehicles. PLM-Net addresses this by coupling a Base Model (BM) with a Timed Action Prediction Model (TAPM) and using real-time latency $\delta$ to linearly interpolate among predicted future actions, enabling robustness to both constant and time-varying delays. In OSCAR-based simulations with a three-lane test track and 115k training samples, PLM-Net substantially improves steering accuracy (MAE, MSE, RMSE) and trajectory similarity compared with latency-affected BM alone, achieving up to 78–95% improvement across constant and time-varying latency scenarios. The approach demonstrates practical latency mitigation for vision-based AV control, with source code available for replication and extension.

Abstract

This study introduces the Perception Latency Mitigation Network (PLM-Net), a novel deep learning approach for addressing perception latency in vision-based Autonomous Vehicle (AV) lateral control systems. Perception latency is the delay between capturing the environment through vision sensors (e.g., cameras) and applying an action (e.g., steering). This issue is understudied in both classical and neural-network-based control methods. Reducing this latency with powerful GPUs and FPGAs is possible but impractical for automotive platforms. PLM-Net comprises the Base Model (BM) and the Timed Action Prediction Model (TAPM). BM represents the original Lane Keeping Assist (LKA) system, while TAPM predicts future actions for different latency values. By integrating these models, PLM-Net mitigates perception latency. The final output is determined through linear interpolation of BM and TAPM outputs based on real-time latency. This design addresses both constant and varying latency, improving driving trajectories and steering control. Experimental results validate the efficacy of PLM-Net across various latency conditions. Source code: https://github.com/AwsKhalil/oscar/tree/devel-plm-net.
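The interpolation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `interpolate_action`, the latency grid, and the action values are hypothetical. It assumes the BM output is treated as the action for zero latency and that latencies outside the grid are clamped rather than extrapolated.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_action(
    delta: float,
    latencies: Sequence[float],
    actions: Sequence[float],
) -> float:
    """Linearly interpolate a steering action for the measured latency `delta`.

    `latencies` must be strictly ascending; `actions[i]` is the action
    predicted for `latencies[i]` (index 0 plays the role of the BM output).
    """
    if len(latencies) != len(actions) or len(latencies) < 2:
        raise ValueError("need matching grids with at least two points")
    # Assumption: clamp to the supported range instead of extrapolating.
    if delta <= latencies[0]:
        return actions[0]
    if delta >= latencies[-1]:
        return actions[-1]
    # Index of the first grid latency strictly greater than delta.
    hi = bisect_right(latencies, delta)
    lo = hi - 1
    weight = (delta - latencies[lo]) / (latencies[hi] - latencies[lo])
    return (1.0 - weight) * actions[lo] + weight * actions[hi]


# Hypothetical values: BM action at 0 s, TAPM actions at 0.1, 0.2, and 0.3 s.
latency_grid = [0.0, 0.1, 0.2, 0.3]
predicted_actions = [0.00, 0.10, 0.30, 0.60]
blended = interpolate_action(0.15, latency_grid, predicted_actions)
```

With these illustrative numbers, a latency of 0.15 s falls halfway between the 0.1 s and 0.2 s predictions, so `blended` is approximately 0.2. Because the weight is recomputed from the current latency at every control step, the same routine covers both constant and time-varying latency.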

Paper Structure (32 sections, 9 equations, 24 figures, 12 tables, 1 algorithm)

Figures (24)

  • Figure 1: Definition of perception latency. When the vehicle state is $x_t$ and the observation is $o_t$, the corresponding action $a_t$ is applied at time $t+\delta$ rather than at $t$ ($o_t \rightarrow a_{t+\delta}$). By the time this action is applied, the vehicle state has changed and a new observation is available. Perception latency has two components: algorithmic latency and actuator latency.
  • Figure 2: Illustration of Perception Latency Effect on Lateral Control During Lane Keeping. The green vehicle exhibits driving behavior with real-time observations, while the red vehicle demonstrates driving behavior with a delayed observation input. Both vehicles maintain a constant low speed. Motion starts at position $A$. Positions $A$ and $B$ depict both vehicles driving straight, staying within their lanes, with minimal steering adjustments ($a_{t_A} \approx a_{t_B} \approx 0.0$). At position $C$, where the first curve is encountered, the green vehicle successfully adjusts its trajectory by turning left to stay in the lane, while the red vehicle continues straight due to the delayed observation input. Subsequently, the red vehicle struggles to recover from its zigzag-shaped trajectory, highlighting the impact of incorrect actions on subsequent observations.
  • Figure 3: Overview of the Perception Latency Mitigation Network (PLM-Net). PLM-Net pairs a Timed Action Prediction Model (TAPM) with the Base Model (BM) to mitigate perception latency. The TAPM predicts future actions from the current visual observation, and its multiple sub-models let the framework adapt to varying latency levels. The final action $\widetilde{a}^{PLM}_t$ is given by the function $f(\widetilde{a}_t,\delta_t)$, which linearly interpolates, according to the real-time latency $\delta_t$, among the predictive action values from the TAPM ($\boldsymbol{\widetilde{a}}^{TAPM}_t$) and the current action value from the BM ($\widetilde{a}^{BM}_t$). See Section \ref{sec:method} for a detailed explanation.
  • Figure 4: PLM-Net Diagram. This diagram illustrates how the two models within the PLM-Net architecture operate to mitigate perception latency. The Base Model (BM), governed by policy $\pi^{BM}_\phi$, processes visual observations $o_t$ and vehicle speed $v_t$ to generate action $\widetilde{a}_t$, as described in Eq. \ref{eq:a_t-bm}. Meanwhile, the Timed Action Prediction Model (TAPM), guided by policy $\pi^{TAPM}_\theta$, generates a predictive action value for each of several latency values. Each action corresponds to a specific future state (e.g., $\widetilde{a}_{t+\delta_1}$ for state $s_{t+\delta_1}$), as detailed in Eq. \ref{eq:a_t-tapm}. The inputs to $\pi^{TAPM}_\theta$ include the output of $\pi^{BM}_\phi$ (i.e., $\widetilde{a}_t$) along with the image feature vector $\boldsymbol{z^o_t}$ and the vehicle velocity vector $\boldsymbol{z^v_t}$. Subsequently, linear interpolation, as described in Algorithm \ref{alg:linear_interpolation_latency}, combines the outputs of $\pi^{BM}_\phi$ and $\pi^{TAPM}_\theta$ based on the real-time perception latency $\delta$ to yield the final action of the PLM-Net.
  • Figure 5: PLM-Net Architecture. (a) The BM network design, inspired by the NVIDIA PilotNet structure \cite{bojarski2016end}, processes visual observation $o_t$ and vehicle speed $v_t$ to predict steering angle $\widetilde{a}^{BM}_t$. It uses five convolutional layers and a multi-layer perceptron network with dropout layers, including fully-connected layers with neuron counts of 512, 100, 50, and 10. (b) The TAPM network design, inspired by ANEC \cite{khalil2023anec} and BCIL \cite{codevilla2018ConditionalImitation}, processes the inputs forwarded by the BM ($\widetilde{a}_t$, $\boldsymbol{z^{o}_t}$, $\boldsymbol{z^{v}_t}$) through fully-connected layers with 100, 500, and 100 neurons, respectively. The outputs of these layers are then concatenated and forwarded to the sub-models, each of which consists of fully-connected layers with neuron counts of 200, 100, and 50 and dropout layers with a rate of 0.3. Each sub-model provides a distinct future action for a particular latency value. The outputs of the sub-models form the TAPM output $\boldsymbol{\widetilde{a}}^{TAPM}_t$, which is a set of predictive action values corresponding to distinct latency values.
  • ...and 19 more figures
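
The timing relation in Figure 1 ($o_t \rightarrow a_{t+\delta}$) can be sketched with a small delay buffer. This is an illustrative toy, not code from the paper or from OSCAR: the function `run_with_latency`, the proportional `steer` policy, and the curvature values are all hypothetical. It assumes $\delta$ is a whole number of control steps, and it is open-loop, so it does not model how a late action changes later observations (the effect shown in Figure 2).

```python
from collections import deque
from typing import Callable, List


def run_with_latency(
    observations: List[float],
    policy: Callable[[float], float],
    delay_steps: int,
    idle_action: float = 0.0,
) -> List[float]:
    """Return the action actually applied at each step when an action
    computed from o_t only takes effect `delay_steps` steps later."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    # Actions computed but not yet applied, pre-filled with the idle action.
    pending = deque([idle_action] * delay_steps)
    applied = []
    for obs in observations:
        pending.append(policy(obs))        # a_t computed from o_t
        applied.append(pending.popleft())  # action reaching the actuator now
    return applied


def steer(curvature: float) -> float:
    """Hypothetical proportional policy: steer in proportion to curvature."""
    return 0.5 * curvature


# Hypothetical road: straight, straight, curve, curve, straight.
curvature = [0.0, 0.0, 1.0, 1.0, 0.0]
no_delay = run_with_latency(curvature, steer, delay_steps=0)
delayed = run_with_latency(curvature, steer, delay_steps=2)
```

With a two-step delay, the steering response to the curve at step 2 is not applied until step 4, by which point the road in this toy example is straight again.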