Pre-trained Transformer-Enabled Strategies with Human-Guided Fine-Tuning for End-to-end Navigation of Autonomous Vehicles
Dong Hu, Chao Huang, Jingda Wu, Hongbo Gao
TL;DR
This work addresses data-efficiency and generalization challenges in end-to-end autonomous driving (AD) by integrating a Transformer-based actor into a behavior-cloning pre-training stage, followed by reinforcement learning fine-tuning guided by human supervision (RLHG). The PTA-RLHG framework combines global context modeling with human demonstrations, interventions, and reward feedback to accelerate learning and improve safety in two CARLA-based scenarios (continuous overtaking and an unprotected left turn). Empirical results show faster convergence, robust performance, higher test success rates, and fewer collisions than multiple baselines, and ablations confirm the value of both the attention mechanism and human guidance. The approach offers a practical pathway toward safer, more reliable end-to-end AD, while pointing to future work on multimodal fusion and Transformer-based pre-training.
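The "global context modeling" of a Transformer-based actor comes from self-attention, where every input token is weighted against every other token. The sketch below is a minimal, plain-Python illustration of single-head scaled dot-product self-attention feeding a small action head. It assumes the observation has already been encoded into a few feature tokens; the names `actor` and `w_out`, the mean pooling, and the two-dimensional (steering, throttle) output are hypothetical choices for illustration, not the authors' architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def actor(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix, w_out: Matrix) -> List[float]:
    """Hypothetical attention-based actor: attend over tokens, mean-pool, bound with tanh."""
    context, _ = self_attention(tokens, wq, wk, wv)
    pooled = [sum(col) / len(context) for col in zip(*context)]
    raw = matmul([pooled], w_out)[0]  # hypothetical 2-D output: steering, throttle
    return [math.tanh(r) for r in raw]


# Usage with three hypothetical 2-D feature tokens and identity projections.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[0.5, -0.2], [0.3, 0.4]]
action = actor(tokens, identity, identity, identity, w_out)
print(action)  # two values in (-1, 1)
```

Mean pooling is only one readout option; a learned query or summary token is a common alternative in Transformer policies.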
Abstract
Autonomous driving (AD) technology leverages artificial intelligence to automate vehicle operation. End-to-end strategies, which simplify traditional driving systems by integrating perception, decision-making, and control into a single model, open new avenues for advanced driving functionalities. Despite their potential, they still face challenges in data efficiency, training complexity, and generalization. This study addresses these issues with a novel end-to-end AD training model that improves system adaptability and intelligence. The model incorporates a Transformer module into the policy network, which is first pre-trained with behavior cloning (BC) to provide initial update gradients. Fine-tuning through reinforcement learning with human guidance (RLHG) then adapts the model to specific driving environments, with the aim of surpassing the performance limits of imitation learning (IL). During fine-tuning, humans guide the model toward more efficient and safer driving behaviors through supervision, intervention, demonstration, and reward feedback. Simulation results show that this framework accelerates learning, achieves precise control, and significantly improves safety and reliability. Compared with other advanced baseline methods, the proposed approach performs better on challenging AD tasks. Together, the Transformer module and human-guided fine-tuning provide useful insights and methods for AD research and applications.
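The abstract does not specify how human interventions enter the fine-tuning loop. One common pattern in human-in-the-loop RL, assumed here rather than taken from the paper, is to let the human's action override the agent's action whenever the human takes over, store the executed action together with an intervention flag, and subtract a fixed penalty from the reward at takeover steps. The sketch also includes the mean-squared-error loss typically used for BC pre-training. `guided_step`, `ReplayBuffer`, `Transition`, and `intervention_penalty` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Transition:
    state: Sequence[float]
    action: Sequence[float]  # action actually executed (the human's if they took over)
    reward: float
    intervened: bool         # True when the human overrode the agent


@dataclass
class ReplayBuffer:
    items: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.items.append(t)

    def intervention_ratio(self) -> float:
        """Fraction of stored steps in which the human intervened."""
        if not self.items:
            return 0.0
        return sum(t.intervened for t in self.items) / len(self.items)


def bc_loss(pred: Sequence[float], demo: Sequence[float]) -> float:
    """Mean-squared error between policy and demonstrated actions (BC pre-training)."""
    return sum((p - d) ** 2 for p, d in zip(pred, demo)) / len(pred)


def guided_step(state: Sequence[float], agent_action: Sequence[float],
                human_action: Optional[Sequence[float]], env_reward: float,
                buffer: ReplayBuffer,
                intervention_penalty: float = 1.0) -> Tuple[Sequence[float], float]:
    """One human-guided interaction step (hypothetical scheme, not the paper's).

    A human action, when present, overrides the agent's action; the reward at
    that step is reduced by a fixed penalty so that states triggering takeovers
    become less attractive to the learned policy.
    """
    intervened = human_action is not None
    executed = human_action if intervened else agent_action
    reward = env_reward - (intervention_penalty if intervened else 0.0)
    buffer.add(Transition(state, executed, reward, intervened))
    return executed, reward


# Usage: one autonomous step followed by one human takeover.
buf = ReplayBuffer()
a1, r1 = guided_step([0.0], [0.2, 0.5], None, 1.0, buf)        # agent drives
a2, r2 = guided_step([0.1], [0.9, 0.5], [0.1, 0.3], 1.0, buf)  # human overrides
print(a2, r2, buf.intervention_ratio())  # [0.1, 0.3] 0.0 0.5
```

In a full RLHG setup, the stored human actions could also feed a BC-style term during fine-tuning; how the paper weights these signals is not stated in the abstract.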
