Pre-trained Transformer-Enabled Strategies with Human-Guided Fine-Tuning for End-to-end Navigation of Autonomous Vehicles

Dong Hu, Chao Huang, Jingda Wu, Hongbo Gao

TL;DR

This work addresses data efficiency and generalization challenges in end-to-end autonomous driving by integrating a Transformer-based actor into a behavior-cloning pre-training stage, followed by reinforcement learning fine-tuning guided by human supervision (RLHG). The PTA-RLHG framework combines global context modeling with human demonstrations, interventions, and reward feedback to accelerate learning and improve safety in two CARLA-based scenarios (continuous overtaking and unprotected left-turn). Empirical results demonstrate faster convergence, robust performance, higher test success, and reduced collisions compared with multiple baselines, with ablations confirming the value of attention mechanisms and human guidance. The approach offers a practical pathway to safer, more reliable end-to-end AD, while highlighting avenues for future work in multimodal fusion and Transformer-based pre-training.

Abstract

Autonomous driving (AD) technology, leveraging artificial intelligence, strives for vehicle automation. End-to-end strategies, which simplify traditional driving systems by integrating perception, decision-making, and control, offer new avenues for advanced driving functionalities. Despite their potential, they still face challenges in data efficiency, training complexity, and generalization. This study addresses these issues with a novel end-to-end AD training model that enhances system adaptability and intelligence. The model incorporates a Transformer module into the policy network, which first undergoes behavior cloning (BC) pre-training to initialize its parameters. Subsequently, fine-tuning through reinforcement learning with human guidance (RLHG) adapts the model to specific driving environments, aiming to surpass the performance limits of imitation learning (IL). During fine-tuning, a human interacts with the model through supervision, intervention, demonstration, and reward feedback, guiding it toward more efficient and safer driving behaviors. Simulation results demonstrate that this framework accelerates learning, achieves precise control, and significantly enhances safety and reliability. Compared to other advanced baseline methods, the proposed approach excels in challenging AD tasks. The introduction of the Transformer module and human-guided fine-tuning provides valuable insights and methods for research and applications in the AD field.
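The two-stage idea described in the abstract (BC pre-training, then fine-tuning in which a human can intervene) can be illustrated with a deliberately small sketch. This is not the authors' implementation: a one-dimensional linear policy stands in for the Transformer actor, plain gradient descent on a mean-squared error stands in for BC pre-training, and the names `bc_pretrain`, `guided_step`, and the buffer layout are hypothetical choices made only for illustration. The RL update itself is omitted.

```python
def bc_pretrain(demos, lr=0.1, epochs=500):
    """Behavior cloning on (state, action) pairs.

    Assumption: a linear policy a = w * s + b replaces the paper's
    Transformer actor; the loss is the mean squared error to the
    demonstrated actions, minimized by batch gradient descent.
    """
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in demos) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def guided_step(policy, state, human_action=None):
    """One interaction step with optional human intervention.

    If the human supplies an action, it overrides the agent's action.
    The returned flag marks the transition as human-guided so a later
    (hypothetical) RL update could treat it differently.
    """
    agent_action = policy(state)
    if human_action is not None:
        return human_action, True
    return agent_action, False


# Stage 1: pre-train on synthetic demonstrations drawn from a = 0.5 * s - 0.1.
demos = [(s, 0.5 * s - 0.1) for s in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = bc_pretrain(demos)


def policy(state):
    return w * state + b


# Stage 2: collect transitions; the human intervenes on the second step only.
buffer = []
for state, human_action in [(0.2, None), (0.8, -1.0)]:
    action, intervened = guided_step(policy, state, human_action)
    buffer.append((state, action, intervened))
```

Keeping the intervention flag alongside each transition is one simple way to separate agent-generated from human-generated experience; how the paper weights or uses such samples is not specified in this section.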


Paper Structure

This paper contains 33 sections, 24 equations, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The architecture of the proposed end-to-end strategy: (A) pre-training of the PTA network through imitation learning; (B) RLHG for online fine-tuning; (C) the interaction process between RL and the environment.
  • Figure 2: The driving scenarios in CARLA: (a) Continuous overtaking scenario on a highway; (b) Forward camera image of the continuous overtaking scenario; (c) Unprotected left-turn scenario in a city; (d) Forward camera image of the left-turn scenario.
  • Figure 3: The training process of our framework and the baseline methods in the overtaking scenario: (a) Average reward; (b) Average driving distance; (c) Boxplot of rewards for the last 100 episodes; (d) Barplot of driving distances for the last 100 episodes.
  • Figure 4: The training process of our framework and the baseline methods in the left-turn scenario: (a) Average reward; (b) Average driving distance; (c) Boxplot of rewards for the last 100 episodes; (d) Barplot of driving distances for the last 100 episodes.
  • Figure 5: The training process of our framework and the ablation baselines in the two scenarios: (a) Average reward for the overtaking scenario; (b) Average driving distance for the overtaking scenario; (c) Average reward for the left-turn scenario; (d) Average driving distance for the left-turn scenario.
  • ...and 2 more figures