From RGB images to Dynamic Movement Primitives for planar tasks

Antonis Sidiropoulos, Zoe Doulgeri

TL;DR

Traditional Dynamic Movement Primitives (DMPs) struggle to handle multiple motion patterns and scene-aware generalization. The paper addresses this limitation by learning DMP parameters directly from RGB images. It introduces Resnet-DMP, a framework trained on human demonstrations that uses a ResNet backbone to map RGB input to the DMP weights and the initial/target positions, with the reparameterized form $\bm{y}_{s} = \bm{W}'\bm{\phi}(s)$ used to ease training. The approach infers motion patterns for planar tasks end to end, which reduces reliance on targets supplied by a separate perception step and supports generalization to unseen scenes. Experiments on two planar tasks (stem unveiling in a vineyard and grasping in clutter) report better generalization and higher success rates than a state-of-the-art image-to-DMP method. The work lays groundwork for RGB-based task planning in manipulation, and the authors suggest extending it to Cartesian tasks with RGB-D input in future work.

Abstract

Dynamic Movement Primitives (DMP) have been extensively applied in various robotic tasks thanks to their generalization and robustness properties. However, the successful execution of a given task may require different motion patterns that take into account not only the initial and target positions but also features relating to the overall structure and layout of the scene. To make DMP applicable to a wider range of tasks and further automate their use, we design a framework combining deep residual networks with DMP that can encapsulate different motion patterns of a planar task, provided through human demonstrations on the RGB image plane. From new raw RGB visual input, we can then automatically infer the appropriate DMP parameters, i.e., the weights that determine the motion pattern and the initial/target positions. We compare our method against another state-of-the-art (SoA) method for inferring DMP from images and carry out experimental validations in two different planar tasks.
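
To make the reparameterized form $\bm{y}_{s} = \bm{W}'\bm{\phi}(s)$ from the TL;DR concrete, the following is a minimal sketch of how a planar trajectory could be evaluated once the weights are known. It is an illustration under stated assumptions, not the authors' implementation: the normalized Gaussian basis, the kernel count, the width heuristic, and the names `basis` and `evaluate` are all hypothetical, and the weights are made-up numbers rather than network outputs.

```python
import math

def basis(s, n_kernels=10, width_scale=1.5):
    """Normalized Gaussian basis phi(s) for a phase s in [0, 1].

    Assumed form: kernels equally spaced on [0, 1], with a width tied
    to the kernel spacing. The paper's actual kernels may differ.
    """
    spacing = 1.0 / (n_kernels - 1)
    centers = [i * spacing for i in range(n_kernels)]
    h = 1.0 / (width_scale * spacing) ** 2  # inverse squared kernel width
    psi = [math.exp(-h * (s - c) ** 2) for c in centers]
    total = sum(psi)
    return [p / total for p in psi]  # entries sum to 1

def evaluate(W, s):
    """Return y_s = W' phi(s); W holds one row of kernel weights per image axis."""
    phi = basis(s, n_kernels=len(W[0]))
    return [sum(w * p for w, p in zip(row, phi)) for row in W]

# Hypothetical weights for a 2-D (image-plane) trajectory with 5 kernels.
W = [
    [0.0, 0.25, 0.5, 0.75, 1.0],  # x coordinate
    [0.0, 0.40, 0.6, 0.40, 0.0],  # y coordinate
]

# Sample the trajectory at 21 equally spaced phase values.
trajectory = [evaluate(W, k / 20) for k in range(21)]
```

Because the basis is normalized, each trajectory point is a weighted average of the rows of weights, so in this sketch the weights can be read as rough via-points in image coordinates.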

Paper Structure (14 sections, 2 equations, 17 figures, 1 table, 1 algorithm)

Figures (17)

  • Figure 1: Resnet-DMP architecture. Magenta arrows apply only during training.
  • Figure 2: Stem unveiling task setup.
  • Figure 3: Demonstration procedure. Top: scene side view. Bottom: in-hand camera view. The unveiling trajectory is drawn on the RGB image (shown in red in the bottom-left image), then projected onto the vine plane and executed by the robot. The unveiled stem is shown in magenta in the bottom-right image.
  • Figure 4: Data augmentation. The unveiling trajectory is shown in red. The magenta arrow shows the augmentation of each image.
  • Figure 5: Resnet-DMP vs VIMEDNet training loss. For each model, the network weights from the epoch with the lowest dev-set loss are kept (indicated by the dashed magenta line).
  • ...and 12 more figures