Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning

Nihal Acharya Adde; Hanno Gottschalk; Andreas Ebert

TL;DR

The paper addresses hyperparameter optimization for reinforcement learning–based autonomous driving in a high-fidelity simulation. It combines Latin Hypercube Sampling (LHS) for initialization, a Gaussian Process (GP) surrogate, and Efficient Global Optimization (EGO) with the Expected Improvement (EI) criterion, extended to parallel $q$EI, to maximize the cumulative reward of a PPO-based driving agent in a Unity3D simulator. Results show a gain of about 4% over both the manually tuned and the initial LHS configurations; the surrogate model’s $R^2$ improves from $0.48$ to $0.69$, and the best hyperparameter set achieves a peak reward of $1193$. A sensitivity analysis identifies the learning rate as the most influential parameter. The work demonstrates the viability of GP-based Bayesian optimization for RL in autonomous driving and outlines future directions such as multi-objective optimization and co-development of architectures.

Abstract

This paper focuses on hyperparameter optimization for autonomous driving strategies based on Reinforcement Learning (RL). We provide a detailed description of training the RL agent in a simulation environment. We then employ the Efficient Global Optimization (EGO) algorithm, which uses Gaussian Process fitting, for hyperparameter optimization in RL. Before this optimization phase, Gaussian Process interpolation is applied to fit the surrogate model, for which the initial set of hyperparameter configurations is generated using Latin Hypercube Sampling. Parallelization techniques are employed to accelerate the evaluation. The optimization procedure identifies a set of hyperparameters that improves overall driving performance by 4% compared with both the existing manually tuned parameters and the hyperparameters found during initialization with Latin Hypercube Sampling. After the optimization, we analyze the results thoroughly and conduct a sensitivity analysis to assess the robustness and generalization capabilities of the learned autonomous driving strategies. The findings contribute to the advancement of Gaussian Process based Bayesian optimization of hyperparameters for RL-based autonomous driving and provide insights for the development of efficient and reliable autonomous driving systems.
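The two building blocks named in the abstract, a Latin Hypercube initial design and an Expected Improvement acquisition value, can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the authors' implementation: the function names, the example bounds (a learning-rate range and a discount-factor range), and the seed are hypothetical, and the surrogate's predictive mean and standard deviation are passed in directly rather than computed from a fitted Gaussian Process. The parallel $q$EI extension is not shown.

```python
import math
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of this dimension.
        column = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


def expected_improvement(mean, std, best):
    """Closed-form EI for maximization, assuming the surrogate predicts
    a Gaussian with the given mean and standard deviation."""
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


# Hypothetical search space: (learning rate, discount factor).
example_bounds = [(1e-5, 1e-3), (0.90, 0.999)]
initial_design = latin_hypercube(8, example_bounds, seed=42)
```

In an EGO loop, each point of `initial_design` would be evaluated by a full RL training run, a GP would be fitted to the resulting rewards, and the next configuration would be the one that maximizes `expected_improvement` under the GP's predictions.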

Paper Structure (25 sections, 5 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Training Strategy Using RL
Learning in Simulation
RL Algorithm and Network Architecture
Selection of Hyperparameters
Description of the chosen Hyperparameters
Input and Output Specification
Rewards
Model-Based Hyperparameter Optimization
Search Space Exploration - Latin Hypercube Sampling
Gaussian Process
EGO Optimization
Experiment
Setting Up the Black-box Function
...and 10 more sections

Figures (6)

Figure 1: Framework of reinforcement learning
Figure 2: Unity3D simulator training environment. Top left: Drone view of road with agent. Top right: Camera recordings. Bottom: Driving behavior based on trajectory points.
Figure 3: PPO neural network architecture
Figure 4: Optimizing hyperparameters through iterative refinement: Latin Hypercube Sampling initiates the search, while Efficient Global Optimizer maximizes cumulative rewards.
Figure 5: Reward convergence plots. (a) Cumulative rewards achieved during the optimization process. In the EGO phase, rewards are refined iteratively by tuning hyperparameters based on knowledge gained from previously evaluated data. (b) Reward convergence for all RL iterations. Here, the average progression of rewards during RL training is plotted for both the initial data generation phase and the EGO phase.
...and 1 more figures
