Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control
Jonaid Shianifar, Michael Schukat, Karl Mason
TL;DR
The paper addresses hyperparameter optimization for deep reinforcement learning (DRL) in the control of a robotic arm with seven degrees of freedom. It applies the Tree-structured Parzen Estimator (TPE) to tune the hyperparameters of SAC and PPO, showing substantial gains in success rate and learning efficiency. With models trained for 50K episodes, TPE raises the success rate by 10.48 percentage points for SAC and by 34.28 percentage points for PPO. PPO reaches a reward within 95% of the maximum 76% faster (roughly 40K fewer episodes), and SAC reaches the same threshold about 80% faster. The work demonstrates the practical impact of advanced hyperparameter optimization on DRL for complex robotic tasks and suggests extending TPE optimization to additional algorithms and tasks.
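The convergence-speed figures compare how many training episodes each configuration needs before its reward reaches 95% of the maximum. The sketch below shows one way such a metric could be computed. The paper's exact definition is not given here, so measuring the threshold relative to the curve's own reward range, the optional trailing moving average, and the helper names `episodes_to_threshold` and `relative_speedup` are illustrative assumptions; the reward curves in the usage block are synthetic, not results from the paper.

```python
import numpy as np


def episodes_to_threshold(rewards, frac=0.95, window=1):
    """Return the 1-based episode at which the reward curve first reaches
    `frac` of its range, i.e. min + frac * (max - min).

    Assumptions (not taken from the paper): the threshold is relative to the
    curve's own reward range, and optional smoothing is a trailing moving
    average over `window` episodes.
    """
    r = np.asarray(rewards, dtype=float)
    offset = 0
    if window > 1:
        # 'valid' mode: smoothed[i] averages episodes i .. i + window - 1,
        # so it is attributed to the last episode in that window.
        r = np.convolve(r, np.ones(window) / window, mode="valid")
        offset = window - 1
    threshold = r.min() + frac * (r.max() - r.min())
    first = int(np.argmax(r >= threshold))  # index of the first True entry
    return first + offset + 1


def relative_speedup(baseline_episodes, tuned_episodes):
    """Fractional reduction in episodes needed; 0.76 reads as '76% faster'."""
    return (baseline_episodes - tuned_episodes) / baseline_episodes


if __name__ == "__main__":
    # Hypothetical, synthetic reward curves (NOT data from the paper).
    t = np.arange(1, 50_001)
    baseline = 1 - np.exp(-t / 15_000)
    tuned = 1 - np.exp(-t / 4_000)
    b = episodes_to_threshold(baseline, window=100)
    k = episodes_to_threshold(tuned, window=100)
    print(f"baseline: {b}, tuned: {k}, {relative_speedup(b, k):.0%} faster")
```

Normalizing by the reward range rather than by the raw maximum keeps the threshold meaningful when rewards are negative, which is common for distance-based reaching rewards; when the minimum reward is zero, the two definitions coincide.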
Abstract
In this paper, we explore the optimization of hyperparameters for the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms using the Tree-structured Parzen Estimator (TPE) in the context of robotic arm control with seven Degrees of Freedom (DOF). Our results demonstrate a significant enhancement in algorithm performance: when models are trained for 50K episodes, TPE improves the success rate of SAC by 10.48 percentage points and that of PPO by 34.28 percentage points. Furthermore, TPE enables PPO to converge to a reward within 95% of the maximum reward 76% faster than without TPE, which translates to about 40K fewer training episodes required for optimal performance. For SAC, TPE makes convergence to the same threshold 80% faster. This study underscores the impact of advanced hyperparameter optimization on the efficiency and success of deep reinforcement learning algorithms in complex robotic tasks.
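To make the optimizer concrete, here is a minimal one-dimensional sketch of the TPE idea: split past trials into a "good" group (the best `gamma` share by loss) and the rest, fit a Parzen (Gaussian kernel) density to each, l(x) for good and g(x) for bad, and propose the candidate that maximizes l(x)/g(x). This is a simplified illustration, not the authors' implementation. It uses a fixed kernel bandwidth and tunes a single continuous parameter, and the objective (a quadratic in a hypothetical log10 learning rate standing in for the negative success rate of an RL run) and all function names are assumptions. Libraries such as Optuna ship a full TPE sampler (`optuna.samplers.TPESampler`) that handles multiple, categorical, and conditional hyperparameters.

```python
import numpy as np


def parzen_logpdf(x, centers, bw):
    """Log-density at x of an equal-weight Gaussian mixture centred on `centers`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    z = (x - centers) / bw
    log_k = -0.5 * z**2 - np.log(bw * np.sqrt(2 * np.pi))
    # Numerically stable log(mean(exp(log_k))) along the mixture axis.
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()


def tpe_minimize(objective, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=64, bw_frac=0.1, seed=0):
    """Minimal 1-D Tree-structured Parzen Estimator with a fixed bandwidth."""
    if n_init < 2:
        raise ValueError("need at least two initial trials to form both groups")
    rng = np.random.default_rng(seed)
    xs = [float(v) for v in rng.uniform(low, high, n_init)]  # random warm-up
    ys = [objective(x) for x in xs]
    bw = bw_frac * (high - low)
    for _ in range(n_iter):
        x_arr, y_arr = np.array(xs), np.array(ys)
        n_good = max(1, int(np.ceil(gamma * len(xs))))
        order = np.argsort(y_arr)            # ascending loss
        good = x_arr[order[:n_good]]         # points used to fit l(x)
        bad = x_arr[order[n_good:]]          # points used to fit g(x)
        # Draw candidates from l(x): jitter randomly chosen good points.
        cand = rng.choice(good, n_candidates) + rng.normal(0.0, bw, n_candidates)
        cand = np.clip(cand, low, high)
        # Maximising l(x) / g(x) is TPE's proxy for expected improvement.
        score = parzen_logpdf(cand, good, bw) - parzen_logpdf(cand, bad, bw)
        x_next = float(cand[np.argmax(score)])
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]


if __name__ == "__main__":
    # Hypothetical objective: loss as a function of log10(learning rate),
    # standing in for the negative success rate after an RL training run.
    def toy_loss(log_lr):
        return (log_lr + 3.5) ** 2

    best_x, best_y = tpe_minimize(toy_loss, low=-6.0, high=-1.0)
    print(f"best log10(lr) = {best_x:.3f}, loss = {best_y:.4f}")
```

In a real SAC or PPO search each objective call would be a full training run, so the number of trials dominates the cost; TPE's appeal is that it concentrates those expensive trials in regions that have already produced good results.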
