Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control

Jonaid Shianifar, Michael Schukat, Karl Mason

TL;DR

The paper addresses hyperparameter optimization for deep reinforcement learning (DRL) in the control of a robotic arm with seven degrees of freedom. It applies the Tree-structured Parzen Estimator (TPE) to tune the hyperparameters of SAC and PPO, showing substantial gains in success rate and learning efficiency. Specifically, for models trained for 50K episodes, TPE improves the success rate by 10.48 percentage points for SAC and by 34.28 percentage points for PPO. With TPE, PPO converges to a reward within 95% of the maximum 76% faster (roughly 40K fewer episodes), and SAC does so 80% faster. The work demonstrates the practical impact of advanced hyperparameter optimization on DRL for complex robotic tasks and suggests directions for extending TPE optimization to additional algorithms and tasks.

Abstract

In this paper, we explore the optimization of hyperparameters for the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms using the Tree-structured Parzen Estimator (TPE) in the context of robotic arm control with seven Degrees of Freedom (DOF). Our results demonstrate a significant enhancement in algorithm performance: for models trained for 50K episodes, TPE improves the success rate of SAC by 10.48 percentage points and that of PPO by 34.28 percentage points. Furthermore, TPE enables PPO to converge to a reward within 95% of the maximum reward 76% faster than without TPE, which translates to about 40K fewer training episodes. For SAC, the corresponding convergence is 80% faster than without TPE. This study underscores the impact of advanced hyperparameter optimization on the efficiency and success of deep reinforcement learning algorithms in complex robotic tasks.
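
The convergence figures quoted in the abstract can be read as "number of episodes until the reward first comes within 95% of its maximum", compared between the tuned and untuned runs. The following is a minimal sketch of that reading, not the authors' evaluation code. The function names and both reward curves are hypothetical, and the sketch assumes rewards on a positive scale so that 95% of the maximum is a meaningful threshold.

```python
def episodes_to_threshold(rewards, fraction=0.95):
    """Return the 1-based episode at which the reward first reaches
    fraction * max(rewards). Assumes rewards are on a positive scale."""
    target = fraction * max(rewards)
    for episode, reward in enumerate(rewards, start=1):
        if reward >= target:
            return episode
    raise ValueError("threshold never reached (are the rewards positive?)")


def relative_speedup(baseline_episodes, tuned_episodes):
    """Fraction of training episodes saved by the tuned run."""
    return (baseline_episodes - tuned_episodes) / baseline_episodes


# Hypothetical reward curves, invented purely for illustration:
# both saturate at 100, but the tuned run climbs four times as fast.
baseline_curve = [min(i, 100) for i in range(1, 201)]
tuned_curve = [min(4 * i, 100) for i in range(1, 201)]

baseline_episodes = episodes_to_threshold(baseline_curve)
tuned_episodes = episodes_to_threshold(tuned_curve)
speedup = relative_speedup(baseline_episodes, tuned_episodes)
print(f"baseline: {baseline_episodes}, tuned: {tuned_episodes}, "
      f"speedup: {speedup:.0%}")
```

Under this reading, a statement such as "76% faster" is the value of `relative_speedup`, and the "fewer episodes" figure is the difference between the two episode counts.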

Paper Structure (17 sections, 3 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: In DRL, the agent selects actions based on its policy, interacts with the environment, and receives rewards to adjust its policy for future actions.
  • Figure 2: The iterative process of hyperparameter optimization and the DRL model evaluation.
  • Figure 3: (a) Franka Emika Panda robot arm with labeled joint positions, (b) panda_gym simulation environment.
  • Figure 4: Parallel coordinate plot (PCP) for PPO (left) and SAC (right).
  • Figure 5: Hyperparameter importance plot for PPO (left) and SAC (right).
  • ...and 1 more figures
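
The iterative loop named in the Figure 2 caption (propose hyperparameters, evaluate the model, update the optimizer) can be illustrated with a toy one-dimensional TPE. This is a simplified sketch under stated assumptions, not the paper's setup: `stand_in_loss` is a hypothetical quadratic that replaces an actual DRL training run, the kernel bandwidth is fixed, and all function and parameter names are invented for illustration.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Gaussian Parzen-window density estimate at x from the given centers."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
               for c in centers) / norm


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Toy 1-D TPE loop; returns (best_x, best_loss) after n_trials."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed kernel width (a simplification)
    history = []  # (x, loss) pairs observed so far
    for trial in range(n_trials):
        if trial < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split past trials into the best gamma fraction and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [x_i for x_i, _ in ranked[:n_good]]
            bad = [x_i for x_i, _ in ranked[n_good:]]
            # Sample candidates from the "good" density, clipped to bounds.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Keep the candidate with the largest good/bad density ratio.
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))
    return min(history, key=lambda pair: pair[1])


def stand_in_loss(x):
    """Hypothetical objective standing in for 'train and evaluate a model'."""
    return (x - 2.0) ** 2


best_x, best_loss = tpe_minimize(stand_in_loss, -5.0, 5.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

In a real study each call to the objective would train a SAC or PPO agent with the proposed hyperparameters and return a score such as the negative success rate, which is why sample-efficient proposal strategies like TPE matter.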