MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation

Jan Ole von Hartz, Lukas Schweizer, Joschka Boedecker, Abhinav Valada

Abstract

Generative robot policies such as Flow Matching offer flexible, multi-modal policy learning but are sample-inefficient. Although object-centric policies improve sample efficiency, they do not resolve this limitation. In this work, we propose Multi-Stream Generative Policy (MSG), an inference-time composition framework that trains multiple object-centric policies and combines them at inference to improve generalization and sample efficiency. MSG is model-agnostic and inference-only, and hence widely applicable to various generative policies and training paradigms. We perform extensive experiments both in simulation and on a real robot, demonstrating that our approach learns high-quality generative policies from as few as five demonstrations, resulting in a 95% reduction in demonstrations, and improves policy performance by 89% compared to single-stream approaches. Furthermore, we present comprehensive ablation studies on various composition strategies and provide practical recommendations for deployment. Finally, MSG enables zero-shot object instance transfer. We make our code publicly available at https://msg.cs.uni-freiburg.de.

Paper Structure

This paper contains 21 sections, 11 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Multi-Stream Generative Policy (MSG) learns high-quality policies from as few as five demonstrations. MSG learns multiple object-centric models that are composed at inference time, enabling sample-efficient generalization.
  • Figure 2: Multi-stream model for OpenMicrowave, plotting the three position dimensions over time. The local model in the initial end-effector frame has low variance at the beginning of the trajectory and becomes less informative over time. In contrast, the microwave's local model gets more informative as the end-effector approaches the microwave. Once the end-effector grasps the handle, the end-effector frame is dropped entirely as it no longer adds information. The combined model has high precision throughout the trajectory.
  • Figure 3: We compare composing the fully integrated predictions of the local generative models with iteratively composing their predicted vector fields.
  • Figure 4: Stream composition strategies. The green circles indicate the one-sigma interval of the zero-mean Gaussian prior. For unimodal targets, the ensemble approximates the product distribution well. For multimodal targets, it fails because it lacks a mechanism to guide both streams to the same mode. Flow Composition encourages convergence to a common mode, reinforced by MCMC. MCMC can underestimate the target variance or overshoot if the prior is unlikely under the target; a suitable prior ensures converging flows.
  • Figure 5: The RLBench tasks OpenDrawer, OpenMicrowave, ToiletSeatUp, TurnTap, StackWine, PlaceCups, PhoneOnBase, InsertOntoSquarePeg, and StackBlocks, and the real-world tasks PickAndPlace, PourDrink, SweepBlocks, OpenDrawer, and StoreInDrawer.
  • ...and 4 more figures
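
The captions of Figures 2 to 4 contrast two ways of combining streams: fusing the fully integrated predictions of each local model (the ensemble) and composing the predicted vector fields at every integration step (Flow Composition). The following is a minimal one-dimensional sketch of that distinction, not the authors' implementation. The straight-line `toy_velocity` field, the precision-based `weights`, the forward-Euler integrator, and all numeric values are assumptions chosen purely for illustration.

```python
import numpy as np


def product_of_gaussians(means, variances):
    """Fuse independent 1-D Gaussian estimates by precision weighting."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    fused_var = 1.0 / precisions.sum()
    fused_mean = fused_var * (precisions * means).sum()
    return fused_mean, fused_var


def toy_velocity(x, t, target):
    """Hypothetical straight-line flow that reaches `target` at t = 1."""
    return (target - x) / (1.0 - t)


def integrate(velocity_fn, x0, n_steps=20):
    """Forward-Euler integration of dx/dt = velocity_fn(x, t) on [0, 1]."""
    x, dt = float(x0), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


# Two illustrative streams: per-stream target mean and variance (assumed values).
means = [0.0, 1.0]
variances = [0.04, 0.01]
precisions = 1.0 / np.array(variances)
weights = precisions / precisions.sum()  # assumed weighting rule
x0 = 0.3  # shared start sample, e.g. drawn from the prior

# Ensemble: integrate every stream to the end, then fuse the predictions.
ensemble = sum(
    w * integrate(lambda x, t, m=m: toy_velocity(x, t, m), x0)
    for w, m in zip(weights, means)
)

# Flow Composition: fuse the vector fields at every step, integrate once.
composed = integrate(
    lambda x, t: sum(w * toy_velocity(x, t, m) for w, m in zip(weights, means)),
    x0,
)

fused_mean, fused_var = product_of_gaussians(means, variances)
print(ensemble, composed, fused_mean, fused_var)
```

In this unimodal toy setting both strategies land on the precision-weighted mean of the product distribution, and the lower-variance stream dominates the result, which mirrors the behaviour described for Figure 2. The toy does not reproduce the multimodal failure of the ensemble described for Figure 4, which would require streams with several modes each.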