Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL

Osama Ahmad, Zawar Hussain, Hammad Naeem

TL;DR

The paper tackles trajectory planning for a 7-DOF robotic arm in dynamic, unknown environments with moving obstacles, where the arm must complete a block-pick task within a fixed time. It uses a Deep Deterministic Policy Gradient (DDPG) actor-critic framework with experience replay and Polyak-updated target networks, and compares sparse and dense reward formulations. The study reports that sparse rewards give faster convergence and a higher success rate in obstacle-rich scenarios, while dense rewards can hinder training as the scene grows more complex, which makes reward design a practical choice for real-time DRL-based planning. The results come from a MuJoCo/Gymnasium-based simulation of a 7-DOF Fetch arm. They inform reward shaping for industrial robotics and point to possible future integration with Graph Neural Networks or MPC to improve adaptability in dynamic environments.
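The Polyak-updated target networks mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python lists stand in for network weights, and the function name `polyak_update` and the values of `tau` are assumptions chosen for illustration.

```python
def polyak_update(target, online, tau=0.005):
    """Move target parameters a small step toward the online parameters.

    target, online: equal-length lists of floats (stand-ins for network weights).
    tau: interpolation factor in (0, 1]; the default here is an assumed value.
    """
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    # theta_target <- (1 - tau) * theta_target + tau * theta_online
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


# Hypothetical weights, used only to show the effect of one update.
target_weights = [0.0, 1.0]
online_weights = [1.0, 3.0]
updated = polyak_update(target_weights, online_weights, tau=0.1)
```

With a small `tau`, the target network changes slowly, which keeps the critic's bootstrapped targets from shifting abruptly between updates.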

Abstract

This study implements a reinforcement learning algorithm for the trajectory planning of manipulators. A 7-DOF robotic arm must pick up a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle hinders the pick. The robot's objective is to avoid the obstacle and pick the block within a fixed time limit. In this work, we apply the deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency with dense and sparse rewards.
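To make the dense versus sparse comparison concrete, the sketch below shows one common pair of goal-based reward functions, in the style used by the Gymnasium-Robotics Fetch tasks. The paper's exact reward definitions are not given in this excerpt, so the function names, the 0.05 distance threshold, and the example positions are assumptions.

```python
import math


def sparse_reward(achieved, goal, threshold=0.05):
    """Return 0.0 when the achieved position is within threshold of the goal, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= threshold else -1.0


def dense_reward(achieved, goal):
    """Return the negative Euclidean distance to the goal (closer is better)."""
    return -math.dist(achieved, goal)


# Hypothetical 3-D positions in metres, for illustration only.
goal = (1.0, 0.5, 0.4)
near = (1.0, 0.5, 0.42)
far = (1.3, 0.9, 0.4)

near_sparse = sparse_reward(near, goal)
far_sparse = sparse_reward(far, goal)
far_dense = dense_reward(far, goal)
```

The sparse signal only distinguishes success from failure, whereas the dense signal changes at every step with the distance to the goal.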

Paper Structure

This paper contains 13 sections, 9 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Basic Diagram of Reinforcement Learning Scheme
  • Figure 2: Deep Deterministic Framework for Trajectory Planning for Robotic Arm
  • Figure 3: Robotic environment in OpenAI Gym: (a) no obstacle, (b) with obstacle
  • Figure 4: Success rate with no obstacle: (a) sparse reward, (b) dense reward
  • Figure 5: Actor loss when an obstacle is moving: (a) sparse reward, (b) dense reward
  • ...and 2 more figures