Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning
Dominik Urbaniak, Alejandro Agostini, Pol Ramon, Jan Rosell, Raúl Suárez, Michael Suppa
TL;DR
This work tackles fast, collision-free 3D trajectory generation for robotic manipulation among changing obstacles. A single artificial demonstration is encoded as a Dynamic Movement Primitive (DMP) and refined offline using policy improvement with path integrals (PI²). A neural network (NN) then maps task parameters, derived automatically from a point cloud, to DMP parameters, enabling near-optimal online trajectories for unseen obstacle configurations. The main contributions are automatic point-cloud-driven task parameterization for up to three continuous variables, offline PI² data generation to train the NN, and the ability to generate multi-modal avoidance solutions, validated against RRT-Connect and linear baselines in simulation and on real hardware. The approach generates trajectories online in about 0.2 s, with offline training ranging from 2 minutes to several hours, providing a practical, data-efficient pipeline for robust robotic manipulation in dynamic environments.
Abstract
Learning-based motion planning can quickly generate near-optimal trajectories. However, it often requires either large training datasets or costly collection of human demonstrations. This work proposes an alternative approach that quickly generates smooth, near-optimal, collision-free 3D Cartesian trajectories from a single artificial demonstration. The demonstration is encoded as a Dynamic Movement Primitive (DMP) and iteratively reshaped using policy-based reinforcement learning to create a diverse trajectory dataset for varying obstacle configurations. This dataset is used to train a neural network that takes as input the task parameters describing the obstacle dimensions and location, derived automatically from a point cloud, and outputs the DMP parameters that generate the trajectory. The approach is validated in simulation and real-robot experiments, outperforming an RRT-Connect baseline in terms of computation time, execution time, and trajectory length, while supporting multi-modal trajectory generation for different obstacle geometries and end-effector dimensions. Videos and the implementation code are available at https://github.com/DominikUrbaniak/obst-avoid-dmp-pi2.
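To make the first step of the pipeline concrete, the following is a minimal sketch of encoding one demonstration as a one-dimensional discrete DMP and replaying it, possibly toward a new goal. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_dmp` and `rollout`, the gains, the number of basis functions, the least-squares weight fit, and the minimum-jerk demonstration are all hypothetical choices, and the sketch covers neither the reinforcement-learning refinement nor the neural network. It also assumes the demonstration's start and goal differ, since the forcing term is scaled by their difference.

```python
import numpy as np

def fit_dmp(y_demo, dt, n_basis=20, alpha_z=25.0, alpha_x=4.0):
    """Fit forcing-term weights of a 1-D discrete DMP to one demonstration.

    All default values are illustrative assumptions.
    """
    y_demo = np.asarray(y_demo, dtype=float)
    beta_z = alpha_z / 4.0                      # critically damped spring
    n = len(y_demo)
    tau = (n - 1) * dt                          # movement duration
    y0, g = y_demo[0], y_demo[-1]               # requires g != y0
    yd = np.gradient(y_demo, dt)                # numerical velocity
    ydd = np.gradient(yd, dt)                   # numerical acceleration
    x = np.exp(-alpha_x * np.arange(n) * dt / tau)   # canonical phase
    # Forcing term the demonstration requires from the spring-damper system.
    f_target = tau**2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)
    # Gaussian basis functions spaced evenly in time along the phase.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    gaps = np.diff(centers)
    widths = 1.0 / np.append(gaps, gaps[-1]) ** 2
    psi = np.exp(-widths * (x[:, None] - centers) ** 2)
    features = psi / psi.sum(axis=1, keepdims=True) * (x * (g - y0))[:, None]
    w, *_ = np.linalg.lstsq(features, f_target, rcond=None)
    return dict(w=w, centers=centers, widths=widths, tau=tau, y0=y0, g=g,
                alpha_z=alpha_z, beta_z=beta_z, alpha_x=alpha_x)

def rollout(dmp, dt, g=None):
    """Integrate the DMP with Euler steps; optionally move to a new goal g."""
    g = dmp["g"] if g is None else g
    tau, y0 = dmp["tau"], dmp["y0"]
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(round(tau / dt)) + 1):
        traj.append(y)
        psi = np.exp(-dmp["widths"] * (x - dmp["centers"]) ** 2)
        f = psi @ dmp["w"] / (psi.sum() + 1e-10) * x * (g - y0)
        zd = (dmp["alpha_z"] * (dmp["beta_z"] * (g - y) - z) + f) / tau
        z += zd * dt
        y += z / tau * dt
        x += -dmp["alpha_x"] * x / tau * dt
    return np.array(traj)

# Usage: a minimum-jerk motion from 0 to 1 stands in for the demonstration.
dt = 0.01
s = np.linspace(0.0, 1.0, 101)
demo = 10 * s**3 - 15 * s**4 + 6 * s**5
dmp = fit_dmp(demo, dt)
traj = rollout(dmp, dt)                 # reproduce the demonstration
traj_new_goal = rollout(dmp, dt, g=2.0) # same shape, scaled to a new goal
```

In a pipeline like the one described above, the weight vector `w` (one per Cartesian dimension) would be the kind of DMP parameter that the reinforcement-learning step reshapes and that the neural network later predicts; that mapping is an interpretation of the abstract rather than a detail it states.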
