Deep Probabilistic Movement Primitives with a Bayesian Aggregator
Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto
TL;DR
The paper addresses learning robot motion from demonstrations under flexible conditioning and modulation. It introduces Deep Probabilistic Movement Primitives (DeepProMPs) with a Bayesian Context Aggregator to unify via-point conditioning, context conditioning, blending, time modulation, and rhythmic movements within a single deep framework, trained via variational inference with an ELBO objective. Compared to ProMPs and CNMP-based baselines, DeepProMPs demonstrate improved handling of multimodal uncertainty and high-dimensional inputs (e.g., images) and enable deployment-time optimization to satisfy via-points. The approach advances neural motor primitives by providing a complete probabilistic mechanism that retains the tractable operations of classical MPs while leveraging deep representations, which is relevant to robust, adaptable robotic manipulation.
Abstract
Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Other works have proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep motor primitive model has been proposed that is capable of all of these operations, which limits the potential applications of neural motor primitives. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows more sound context conditioning and blending. Our results demonstrate that our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining the operations that linear movement primitives provide.
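To make via-point conditioning concrete, the following is a minimal sketch of how it works in a classical linear probabilistic movement primitive, the kind of model the abstract contrasts with deep variants. It is not the architecture proposed in this paper. It assumes a one-dimensional trajectory y(t) = phi(t)^T w with Gaussian weights w ~ N(mu_w, sigma_w); the function names, the basis width, and the observation variance are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_features(t, n_basis=10, width=0.02):
    """Normalized Gaussian basis functions at a scalar phase t in [0, 1].

    n_basis and width are illustrative assumptions, not values from the paper.
    """
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * (t - centers) ** 2 / width)
    return phi / phi.sum()

def condition_on_via_point(mu_w, sigma_w, t_star, y_star, obs_var=1e-6):
    """Gaussian conditioning of the weight distribution on y(t_star) = y_star."""
    phi = rbf_features(t_star, n_basis=mu_w.shape[0])
    # Predictive variance at the via-point plus the assumed observation noise.
    s = obs_var + phi @ sigma_w @ phi
    # Gain vector, analogous to a Kalman gain for a scalar observation.
    gain = sigma_w @ phi / s
    mu_new = mu_w + gain * (y_star - phi @ mu_w)
    sigma_new = sigma_w - np.outer(gain, phi @ sigma_w)
    return mu_new, sigma_new

# Toy prior over weights (an assumption; normally fitted from demonstrations).
mu_w = np.zeros(10)
sigma_w = np.eye(10)

# Constrain the movement to pass through y = 1.0 at the middle of the motion.
mu_c, sigma_c = condition_on_via_point(mu_w, sigma_w, t_star=0.5, y_star=1.0)

phi_mid = rbf_features(0.5)
mean_at_via = phi_mid @ mu_c            # conditioned mean at the via-point
var_at_via = phi_mid @ sigma_c @ phi_mid  # conditioned variance at the via-point
```

After conditioning, the mean trajectory passes close to the requested via-point and the variance there collapses toward the assumed observation noise, while the rest of the trajectory is adjusted through the weight covariance. This closed-form update is what makes the operation tractable in linear models.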
