Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL
Osama Ahmad, Zawar Hussain, Hammad Naeem
TL;DR
The paper tackles trajectory planning for a 7-DOF robotic arm in dynamic, unknown environments with moving obstacles, where the arm must complete a block-pick task within a fixed time. It employs a Deep Deterministic Policy Gradient (DDPG) actor-critic framework with experience replay and Polyak-updated target networks, and compares sparse and dense reward formulations for shaping learning. The study reports that sparse rewards give faster convergence and higher success in obstacle-rich scenarios, while dense rewards can hinder training as complexity grows, which makes reward design a practical choice for real-time DRL-based planning. The results, obtained in a MuJoCo/Gymnasium-based simulation of a 7-DOF Fetch arm, have implications for industrial robotics: they inform reward shaping and point to possible future integration with Graph Neural Networks or MPC to improve adaptability in dynamic environments.
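The Polyak-updated target networks mentioned above can be illustrated with a minimal sketch. It assumes the common DDPG convention target ← τ·online + (1 − τ)·target; the function name `polyak_update`, the flat list-of-floats parameter representation, and the value of `tau` are illustrative assumptions, not details taken from the paper.

```python
def polyak_update(target, online, tau=0.005):
    """Return softly updated target parameters.

    target, online: equal-length lists of floats (hypothetical flat
    parameter vectors; a real network would hold tensors per layer).
    tau: interpolation factor in (0, 1]; the value here is an assumption.
    """
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    # Move each target parameter a small step toward its online counterpart.
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


# Usage: the target drifts slowly toward the online weights.
new_target = polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

A small `tau` keeps the target networks changing slowly, which is the usual motivation for this update in actor-critic training.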
Abstract
This study applies a reinforcement learning algorithm to trajectory planning for robotic manipulators. A 7-DOF robotic arm must pick up a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle gets in the way of the pick. The robot's objective is to avoid the obstacle and pick the block within a fixed time limit. In this work, we apply the deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency under dense and sparse rewards.
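To make the dense versus sparse comparison concrete, the sketch below follows the convention commonly used in Fetch-style goal-conditioned environments: a sparse reward of 0 when the goal is reached and -1 otherwise, and a dense reward equal to the negative distance to the goal. The function names, the 0.05 distance threshold, and the absence of any obstacle-related term are assumptions for illustration; the paper's exact reward definitions may differ.

```python
import math


def goal_distance(achieved, goal):
    """Euclidean distance between two 3D positions given as tuples."""
    return math.dist(achieved, goal)


def sparse_reward(achieved, goal, threshold=0.05):
    """0.0 if within `threshold` of the goal, otherwise -1.0.

    The threshold value is an assumed default, not taken from the paper.
    """
    return 0.0 if goal_distance(achieved, goal) <= threshold else -1.0


def dense_reward(achieved, goal):
    """Negative distance to the goal: closer positions score higher."""
    return -goal_distance(achieved, goal)


# Usage: the same state scored under both formulations.
block, target = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
r_sparse = sparse_reward(block, target)   # far from goal
r_dense = dense_reward(block, target)     # distance is 5.0
```

The sparse signal only says whether the task succeeded, while the dense signal gives feedback at every step; which one trains better is the question the study examines.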
