From RGB images to Dynamic Movement Primitives for planar tasks
Antonis Sidiropoulos, Zoe Doulgeri
TL;DR
Traditional Dynamic Movement Primitives (DMPs) struggle to handle multiple motion patterns and to generalize in a scene-aware way. The paper addresses this limitation by learning DMP parameters directly from RGB images. It introduces Resnet-DMP, a framework that uses a ResNet backbone to map visual demonstrations to the DMP weights and the initial/target positions, with the reparameterized form $\bm{y}_{s} = \bm{W}'\bm{\phi}(s)$ used to facilitate training. The approach infers motion patterns for planar tasks end to end, which reduces reliance on explicit perception-derived targets and supports generalization to unseen scenes. Experiments on two planar tasks (stem unveiling in a vineyard and grasping an object in clutter) show better generalization and higher success rates than a state-of-the-art image-to-DMP method. The work lays groundwork for RGB-based task planning in manipulation, and the authors suggest extending it to Cartesian tasks with RGB-D input in future work.
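To make the expression $\bm{y}_{s} = \bm{W}'\bm{\phi}(s)$ concrete, the following is a minimal sketch of evaluating a planar trajectory that is linear in its weights. It is not the authors' implementation: the normalized Gaussian kernels, the phase range $s \in [0, 1]$, the kernel width, and the helper names `gaussian_basis` and `trajectory_point` are all assumptions made for illustration, and the weights here are hand-picked rather than predicted by a network.

```python
import math


def gaussian_basis(s, n_kernels=10, overlap=1.5):
    """Return normalized Gaussian kernel activations phi(s) for phase s in [0, 1].

    Assumed form: equally spaced centers and a shared width. The paper may
    use a different kernel layout.
    """
    centers = [i / (n_kernels - 1) for i in range(n_kernels)]
    width = overlap / (n_kernels - 1)  # shared standard deviation
    psi = [math.exp(-0.5 * ((s - c) / width) ** 2) for c in centers]
    total = sum(psi)
    return [p / total for p in psi]  # activations sum to 1


def trajectory_point(W, s):
    """Evaluate y_s = W' phi(s), with one row of weights per planar axis."""
    phi = gaussian_basis(s, n_kernels=len(W[0]))
    return [sum(w * p for w, p in zip(row, phi)) for row in W]


# Hypothetical weights standing in for a network prediction.
n = 10
W = [
    [i / (n - 1) for i in range(n)],                      # x ramps from 0 to 1
    [math.sin(math.pi * i / (n - 1)) for i in range(n)],  # y follows an arc
]

# Sample the planar path at 21 evenly spaced phase values.
path = [trajectory_point(W, k / 20) for k in range(21)]
```

Because the output is linear in the weights, a different weight matrix yields a different motion pattern over the same kernels, which is what makes the weights a natural regression target for an image-conditioned network.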
Abstract
Dynamic Movement Primitives (DMP) have been extensively applied in various robotic tasks thanks to their generalization and robustness properties. However, the successful execution of a given task may require different motion patterns that take into account not only the initial and target positions but also features of the overall structure and layout of the scene. To make DMP applicable to a wider range of tasks and to further automate their use, we design a framework that combines deep residual networks with DMP and can encapsulate different motion patterns of a planar task, provided through human demonstrations on the RGB image plane. From new raw RGB visual input, we can then automatically infer the appropriate DMP parameters, i.e. the weights that determine the motion pattern and the initial/target positions. We compare our method against another state-of-the-art method for inferring DMP from images and carry out experimental validations on two different planar tasks.
