Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning

Dominik Urbaniak, Alejandro Agostini, Pol Ramon, Jan Rosell, Raúl Suárez, Michael Suppa

TL;DR

This work tackles fast, collision-free 3D trajectory generation for robotic manipulation as obstacle configurations change. A single artificial demonstration is encoded as a Dynamic Movement Primitive (DMP) and refined offline with policy improvement using path integrals (PI²). A neural network (NN) then maps task parameters, derived automatically from a point cloud, to DMP parameters, enabling near-optimal online trajectories for unseen obstacle configurations. The main contributions are automatic point-cloud-driven task parameterization for up to three continuous variables, offline PI² data generation to train the NN, and the ability to generate multi-modal avoidance solutions, validated against RRT-Connect and Linear baselines in simulation and on real hardware. Online trajectory generation takes about 0.2 s, and offline training ranges from 2 minutes to several hours, providing a practical, data-efficient pipeline for robotic manipulation in dynamic environments.

Abstract

Learning-based motion planning can quickly generate near-optimal trajectories. However, it often requires either large training datasets or costly collection of human demonstrations. This work proposes an alternative approach that quickly generates smooth, near-optimal collision-free 3D Cartesian trajectories from a single artificial demonstration. The demonstration is encoded as a Dynamic Movement Primitive (DMP) and iteratively reshaped using policy-based reinforcement learning to create a diverse trajectory dataset for varying obstacle configurations. This dataset is used to train a neural network that takes as inputs the task parameters describing the obstacle dimensions and location, derived automatically from a point cloud, and outputs the DMP parameters that generate the trajectory. The approach is validated in simulation and real-robot experiments, outperforming an RRT-Connect baseline in terms of computation and execution time, as well as trajectory length, while supporting multi-modal trajectory generation for different obstacle geometries and end-effector dimensions. Videos and the implementation code are available at https://github.com/DominikUrbaniak/obst-avoid-dmp-pi2.
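
To make the role of the "DMP parameters" concrete, the following is a minimal sketch of how a one-dimensional discrete DMP turns a weight vector into a trajectory. It assumes the standard spring-damper formulation with a phase-driven forcing term. The function name `dmp_rollout`, the gains, the basis-function layout, and the Euler integration are illustrative assumptions and are not taken from the authors' implementation. A 3D Cartesian trajectory would typically use one such system per axis with a shared phase variable.

```python
import numpy as np

def dmp_rollout(weights, y0, goal, duration=1.0, dt=0.001,
                alpha_z=25.0, alpha_x=4.0):
    """Integrate a 1D discrete DMP and return the position trajectory.

    All names and default gains are assumptions chosen for illustration.
    """
    weights = np.asarray(weights, dtype=float)
    n_basis = weights.size
    beta_z = alpha_z / 4.0  # critically damped spring-damper

    # Gaussian basis functions: centers evenly spaced in time,
    # expressed in the phase variable x, which decays from 1 towards 0.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centers / alpha_x  # heuristic width choice

    n_steps = int(round(duration / dt))
    y, z, x = float(y0), 0.0, 1.0
    trajectory = np.empty(n_steps + 1)
    trajectory[0] = y

    for k in range(n_steps):
        # Forcing term: normalized weighted sum of basis activations,
        # scaled by the phase so that it fades out over time.
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)

        # Transformation system (spring-damper plus forcing) and phase system.
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / duration
        dy = z / duration
        dx = -alpha_x * x / duration

        # Explicit Euler step.
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory[k + 1] = y

    return trajectory

# Zero weights give a plain point-to-point motion; nonzero weights reshape it.
straight = dmp_rollout(np.zeros(10), y0=0.0, goal=1.0)
reshaped = dmp_rollout(np.full(10, 200.0), y0=0.0, goal=1.0)
```

In this sketch, the weight vector plays the role of the parameters that a trained network would output for a given obstacle configuration: changing the weights bends the path while the spring-damper term keeps pulling the motion towards the goal.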

Paper Structure

This paper contains 18 sections, 15 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: System overview including steps 1) to 4) shown as colored blocks, and four example models and applications.
  • Figure 2: PI² cost function effects: Continuous $S_{shape}$ cost function, a) for one task parameter, and b-c) for three task parameters. $S_{scope}$ constrains the trajectory to the left in b) and within a gap in c). Also in c), task parameter $s_2$ is defined by a constant ratio $s_1/s_2$.
  • Figure 3: 2D avoidance example using the 3P-2D model with different obstacles (blue): a) near-optimal, smooth trajectories for random start and goal initialization, b) bi-modal solutions, c) considering four different obstacle dimensions.
  • Figure 4: Pick-and-Drop experiment: a-b) task parameter derivation with 3P-2D, and c) experiment setup and comparison to 1P-2D and baselines.
  • Figure 5: Comparison of our 1P-2D and 3P-2D models to two baselines: Linear and RRT. a) Computation time for detection and planning and execution time. b) Detailed contributions to computation time for detection and planning. c) Variations of the trajectory lengths.
  • ...and 2 more figures