
Deep Probabilistic Movement Primitives with a Bayesian Aggregator

Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto

TL;DR

The paper addresses learning robot motion from demonstrations under flexible conditioning and modulation. It introduces Deep Probabilistic Movement Primitives (DeepProMPs) with a Bayesian Context Aggregator that unify via-point conditioning, context conditioning, blending, time modulation, and rhythmic movements within a single deep framework trained by variational inference on an ELBO objective. Compared with ProMPs and CNMP-based baselines, DeepProMPs handle multimodal uncertainty and high-dimensional inputs (e.g., images) better and support deployment-time optimization to satisfy via-points. The approach advances neural motor primitives by providing a complete probabilistic mechanism that retains the tractable operations of classical movement primitives while leveraging deep representations, which makes it relevant to robust, adaptable robotic manipulation.
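The TL;DR does not spell out how the Bayesian Context Aggregator combines context observations. The sketch below assumes the factorized-Gaussian form commonly used for Bayesian context aggregation in neural processes: each context element is treated as a noisy observation of a latent variable z, and the observations are merged by precision weighting. This is an assumption, not the paper's confirmed formulation; the inputs `r`, `r_var` (per-context latent observations and variances) and the prior `mu0`, `var0` are hypothetical and would come from neural networks in a full model.

```python
import numpy as np

def bayesian_aggregate(r, r_var, mu0, var0):
    """Combine per-context latent observations into a Gaussian posterior over z.

    ASSUMPTION: factorized-Gaussian Bayesian aggregation; r and r_var have shape
    (n_context, latent_dim) and would be produced by a (hypothetical) encoder.
    mu0, var0: prior mean and variance of z, shape (latent_dim,).
    """
    mu0 = np.asarray(mu0, dtype=float)
    var0 = np.asarray(var0, dtype=float)
    r = np.asarray(r, dtype=float).reshape(-1, mu0.size)
    r_var = np.asarray(r_var, dtype=float).reshape(-1, mu0.size)
    # Posterior precision = prior precision + sum of observation precisions.
    post_var = 1.0 / (1.0 / var0 + np.sum(1.0 / r_var, axis=0))
    # Precision-weighted correction of the prior mean by every observation.
    post_mu = mu0 + post_var * np.sum((r - mu0) / r_var, axis=0)
    return post_mu, post_var

# One context observation with the same variance as the prior halves the variance
# and moves the mean halfway toward the observation.
mu, var = bayesian_aggregate([[2.0, 4.0]], [[1.0, 1.0]], np.zeros(2), np.ones(2))
```

Because the update is a sum over context elements, the result does not depend on their order, and adding elements from a second context only increases the posterior precision. This is one plausible reading of why such an aggregator supports sounder context conditioning and blending; the paper's exact formulation may differ.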

Abstract

Movement primitives are trainable parametric models that reproduce robotic movements from a limited set of demonstrations. Earlier work proposed simple linear models that exhibited high sample efficiency and generalization power by supporting temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to pass through particular via-points), and context conditioning (generating movements based on an observed variable, e.g., the position of an object). Other work has proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep motor primitive model capable of all of these operations has been proposed, which limits the potential applications of neural motor primitives. This paper proposes a deep movement primitive architecture that encodes all of the operations above and uses a Bayesian context aggregator that enables sounder context conditioning and blending. Our results demonstrate that our approach scales to reproduce complex motions from a wider variety of inputs than the baselines while retaining the operations that linear movement primitives provide.
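For reference, two of the linear ProMP operations listed in the abstract have closed forms. The sketch below assumes a single degree of freedom, normalized Gaussian basis functions over a phase variable, and a Gaussian distribution over the basis weights; the function names and the `width` and `obs_var` parameters are illustrative choices, not taken from the paper. Time modulation rescales the phase, and via-point conditioning is standard Gaussian conditioning of the weight distribution.

```python
import numpy as np

def phase_from_time(t, duration):
    """Time modulation: map time to a phase in [0, 1]; a shorter duration replays the movement faster."""
    return np.clip(np.asarray(t, dtype=float) / duration, 0.0, 1.0)

def rbf_features(phase, n_basis, width=0.02):
    """Normalized Gaussian basis functions at each phase value, shape (T, n_basis)."""
    phase = np.atleast_1d(np.asarray(phase, dtype=float))
    centers = np.linspace(0.0, 1.0, n_basis)
    feats = np.exp(-((phase[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return feats / feats.sum(axis=1, keepdims=True)

def trajectory_distribution(mu_w, cov_w, phase):
    """Per-step mean and variance of y = phi(phase)^T w with w ~ N(mu_w, cov_w)."""
    Phi = rbf_features(phase, mu_w.size)
    mean = Phi @ mu_w
    var = np.einsum("tk,kl,tl->t", Phi, cov_w, Phi)
    return mean, var

def condition_on_via_point(mu_w, cov_w, phase_star, y_star, obs_var=1e-4):
    """Condition the weight distribution on observing y(phase_star) = y_star with noise obs_var."""
    phi = rbf_features(phase_star, mu_w.size)[0]
    s = obs_var + phi @ cov_w @ phi          # predictive variance at the via-point
    gain = cov_w @ phi / s                   # Kalman-style gain, shape (n_basis,)
    new_mu = mu_w + gain * (y_star - phi @ mu_w)
    new_cov = cov_w - np.outer(gain, phi @ cov_w)
    return new_mu, new_cov

# Usage: condition a broad prior on one via-point, then replay it over 2 seconds.
mu_w, cov_w = np.zeros(10), np.eye(10)
new_mu, new_cov = condition_on_via_point(mu_w, cov_w, phase_star=0.5, y_star=1.0)
mean, var = trajectory_distribution(new_mu, new_cov, phase_from_time(np.linspace(0.0, 2.0, 50), 2.0))
```

Each additional via-point shrinks the predictive variance near it, the same qualitative effect that Figure 3(b) reports for DeepProMPs.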

Paper Structure

This paper contains 11 sections, 11 equations, 7 figures.

Figures (7)

  • Figure 1: Barrett WAM preparing to perform rhythmic motions encoded with our deep probabilistic movement primitives to shake a Mojito.
  • Figure 2: Results of the deployment-time via-point error minimization on one of the joints from our real-robot data. In green is the predicted distribution from our model; in blue is the distribution refined with gradient descent. Benefits are more pronounced with fewer via-points, while more via-points improve the overall quality of the prediction. The bottom-right plot shows the distribution of the training data.
  • Figure 3: DeepProMPs used in different situations. (a) Distribution of trajectories conditioned on $3$ via-points. (b) With more via-points, the variance of the distribution decreases. (c) With inconsistent via-points, the model chooses to violate one of them, staying close to the given dataset and avoiding unseen behavior. Note the model's ability to generate a bimodal distribution. (d) Generalization can be enhanced using blending.
  • Figure 4: (a), (b), (c): Close box, pour water, and reach from RLBench. (d) A top view of our testing setup: the robot should grab the object and place it in the designated square. Positions are encoded with 2D context variables.
  • Figure 5: Radar plots comparing reconstruction performance across different motor primitive models. Smaller circles indicate better performance across input types. We consider conditioning on images, on low-dimensional context variables, on the combination of both, and on the full trajectory as via-points, as well as the average across these four input types. Values are the mean squared error on a log scale.
  • ...and 2 more figures
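Figure 2 describes deployment-time via-point error minimization with gradient descent. The sketch below is one minimal reading under loudly stated assumptions: a hypothetical linear decoder (`decoder_W`, `decoder_b`) stands in for the neural decoder so the gradient can be written by hand, each sampled latent is refined independently, and the quadratic penalty `reg` (our addition, not from the paper) keeps refined samples near the model's original prediction.

```python
import numpy as np

def refine_latents(z, decoder_W, decoder_b, via_idx, via_y, lr=0.1, steps=200, reg=1e-2):
    """Gradient descent on latent samples so decoded trajectories meet the via-points.

    HYPOTHETICAL decoder: traj = z @ decoder_W.T + decoder_b, a linear stand-in for
    a neural decoder. z: (n_samples, latent_dim); via_idx: time indices; via_y: targets.
    """
    z_init = np.asarray(z, dtype=float).copy()
    z = z_init.copy()
    W_via = decoder_W[via_idx]   # decoder rows for the via-point time steps
    b_via = decoder_b[via_idx]
    n_via = len(via_idx)
    for _ in range(steps):
        err = z @ W_via.T + b_via - via_y   # (n_samples, n_via) via-point residuals
        # Gradient of the mean squared via-point error plus the pull toward z_init.
        grad = 2.0 * err @ W_via / n_via + 2.0 * reg * (z - z_init)
        z = z - lr * grad
    return z
```

With a neural decoder, the hand-written gradient would be replaced by automatic differentiation; the loop structure stays the same. Refining every sample shifts the whole predicted distribution toward the via-points, which is the kind of green-to-blue change Figure 2 depicts.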