CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving
Bhargava Uppuluri, Anjel Patel, Neil Mehta, Sridhar Kamath, Pratyush Chakraborty
TL;DR
This work tackles safety and generalization challenges in autonomous driving by combining deep reinforcement learning with curriculum learning. The proposed CuRLA framework uses Proximal Policy Optimization (PPO) and a Variational Autoencoder (VAE) within the CARLA simulator, and introduces a two-fold curriculum along with a collision penalty to promote safe, efficient behavior. The authors show that curriculum learning accelerates training and yields higher average speeds than the baselines, with the two-fold curriculum outperforming one-fold variants. The improved reward shaping and staged learning contribute to more adaptable and reliable driving agents, with potential impact on real-world autonomous systems. Future directions include transformer-based perception and multi-objective reinforcement learning to further enhance performance and safety.
Abstract
In autonomous driving, traditional Computer Vision (CV) agents often struggle in unfamiliar situations due to biases in the training data. Deep Reinforcement Learning (DRL) agents address this by learning from experience and maximizing rewards, which helps them adapt to dynamic environments. However, ensuring that they generalize remains challenging, especially when the training environments are static. Additionally, DRL models lack transparency, making it difficult to guarantee safety in all scenarios, particularly those not seen during training. To tackle these issues, we propose a method that combines DRL with Curriculum Learning for autonomous driving. Our approach uses a Proximal Policy Optimization (PPO) agent and a Variational Autoencoder (VAE) to learn safe driving in the CARLA simulator. The agent is trained using two-fold curriculum learning, which progressively increases environment difficulty and incorporates a collision penalty in the reward function to promote safety. This method improves the agent's adaptability and reliability in complex environments, and offers insight into the nuances of balancing multiple reward components, derived from different feedback signals, in a single scalar reward function.
Keywords: Computer Vision, Deep Reinforcement Learning, Variational Autoencoder, Proximal Policy Optimization, Curriculum Learning, Autonomous Driving.
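
As a rough illustration of the two-fold curriculum idea described in the abstract, the sketch below pairs a staged increase in environment difficulty with a collision penalty folded into a single scalar reward. It is not the authors' implementation: the `Stage` fields, the episode thresholds, the vehicle counts, and the penalty magnitude are all hypothetical values chosen for illustration, and `base_reward` stands in for whatever driving reward the agent already receives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (all values are illustrative assumptions)."""
    start_episode: int        # first episode at which this stage is active
    num_vehicles: int         # traffic density, used here as a difficulty proxy
    collision_penalty: float  # amount subtracted from the reward on collision


# Hypothetical two-stage schedule: easy first, then harder with a safety penalty.
CURRICULUM = [
    Stage(start_episode=0, num_vehicles=0, collision_penalty=0.0),
    Stage(start_episode=500, num_vehicles=20, collision_penalty=10.0),
]


def stage_for(episode: int) -> Stage:
    """Return the latest stage whose start_episode has been reached."""
    active = CURRICULUM[0]
    for stage in CURRICULUM:
        if episode >= stage.start_episode:
            active = stage
    return active


def scalar_reward(base_reward: float, collided: bool, stage: Stage) -> float:
    """Combine the driving reward and the collision signal into one scalar."""
    penalty = stage.collision_penalty if collided else 0.0
    return base_reward - penalty
```

In a training loop, one would look up `stage_for(episode)` at the start of each episode, configure the simulator with that stage's difficulty, and pass the same stage to `scalar_reward` at every step. Keeping both folds in one `Stage` record makes it easy to compare a one-fold variant (changing only difficulty or only the penalty) against the two-fold schedule.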
