Stability-Guided Exploration for Diverse Motion Generation

Eckart Cobo-Briesewitz, Tilman Burghoff, Denis Shcherba, Armand Jordana, Marc Toussaint

TL;DR

This work proposes a novel method capable of finding diverse long-horizon manipulations through black-box simulation by combining an RRT-style search with sampling-based MPC, together with a novel sampling scheme that guides the exploration toward stable configurations.

Abstract

Scaling up datasets is highly effective in improving the performance of deep learning models, including in the field of robot learning. However, data collection still proves to be a bottleneck. Approaches relying on collecting human demonstrations are labor-intensive and inherently limited: they tend to be narrow, task-specific, and fail to adequately explore the full space of feasible states. Synthetic data generation could remedy this, but current techniques mostly rely on local trajectory optimization and fail to find diverse solutions. In this work, we propose a novel method capable of finding diverse long-horizon manipulations through black-box simulation. We achieve this by combining an RRT-style search with sampling-based MPC, together with a novel sampling scheme that guides the exploration toward stable configurations. Specifically, we sample from a manifold of stable states while growing a search tree directly through simulation, without restricting the planner to purely stable motions. We demonstrate the method's ability to discover diverse manipulation strategies, including pushing, grasping, pivoting, throwing, and tool use, across different robot morphologies, without task-specific guidance.
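
The search loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the simulator (a damped 1D point mass), the "stable manifold" (any position at rest), the distance metric, and every function and parameter name below are assumptions made purely for illustration. The sketch samples a target from the stable set, picks the nearest tree nodes, rolls out random actions through the black-box simulator (a one-step random-shooting stand-in for sampling-based MPC), and keeps the best few children, which need not be stable themselves.

```python
import numpy as np

def simulate(state, action, dt=0.1, steps=5):
    """Hypothetical black-box simulator: damped 1D point mass [position, velocity]."""
    x, v = state
    for _ in range(steps):
        v = 0.9 * v + action * dt   # damping plus commanded acceleration
        x = x + v * dt
    return np.array([x, v])

def sample_stable_state(rng):
    """Toy stand-in for the stable manifold: any position with zero velocity."""
    return np.array([rng.uniform(-1.0, 1.0), 0.0])

def grow_tree(start, n_expansions=50, k_nearest=4, n_samples=16, n_best=2, seed=0):
    """Grow a search tree toward sampled stable targets (illustrative only)."""
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, dtype=float)]
    parents = [-1]                  # the root has no parent
    for _ in range(n_expansions):
        target = sample_stable_state(rng)
        # Select the (at most) k tree nodes closest to the stable target.
        dists = np.linalg.norm(np.stack(nodes) - target, axis=1)
        nearest = np.argsort(dists)[:k_nearest]
        for idx in nearest:
            # Random shooting: simulate sampled actions, keep the n best.
            actions = rng.uniform(-1.0, 1.0, size=n_samples)
            children = np.stack([simulate(nodes[idx], a) for a in actions])
            costs = np.linalg.norm(children - target, axis=1)
            for j in np.argsort(costs)[:n_best]:
                # Children are kept even when they are not at rest.
                nodes.append(children[j])
                parents.append(int(idx))
    return np.stack(nodes), np.array(parents)

nodes, parents = grow_tree([0.0, 0.0])
```

Following `parents` back from any node to the root recovers the state sequence that reaches it. In this toy version the targets only bias where the tree grows; intermediate nodes are free to carry non-zero velocity.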

Paper Structure (18 sections, 2 equations, 5 figures, 2 tables, 1 algorithm)


Figures (5)

  • Figure 1: Schematic of the paths found by our method. For each start node (red), we grow a tree guided by the manifold of stable states, but not constrained by it.
  • Figure 2: The four environments used in our experiments. In the first two scenes, the blue balls represent robots with a translational joint in 3D space. The difficulty of (a) lies in the high likelihood of ending up in local minima: once the object (orange) falls off the ramp, it cannot be recovered. Environment (b) tests how well the algorithm finds diverse paths when two robot contact points are available and the rotation of the object is a highly relevant part of the state. Environment (c) tests the method's ability to find paths that involve tool use. Environment (d) tests the method in a high-dimensional space where two robots can cooperate to manipulate an object.
  • Figure 3: Examples of trajectories generated by StaGE. The first trajectory shows diverse manipulations in the PandaHook environment, while the second one demonstrates tool use (pulling the cube with the hook). The third trajectory shows a transfer of the cube from the left panda to the right, by throwing and catching the cube multiple times.
  • Figure 4: Adjacency matrices for trajectories found between a set of 26 stable states in the SpheresRamp environment. Blue indicates that no connection was found between two states; the count value on the right gives the number of diverse paths found for each pair of stable states.
  • Figure 5: Ablation study comparing different values for the $n$-best actions and $k$-nearest neighbors parameters on the SpheresCube environment for a fixed budget of $2{,}500$ expansions. We fixed the values of both parameters to $16$ in the rest of our experiments.
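
A count matrix like the one described for Figure 4 can be assembled from a list of discovered paths. The sketch below is only an assumption about the bookkeeping: the function name, the `(start, goal)` index-pair representation, and the example data are all hypothetical and do not come from the paper.

```python
import numpy as np

def path_count_matrix(n_states, found_paths):
    """Count distinct paths per (start, goal) pair of stable-state indices."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for start, goal in found_paths:
        counts[start, goal] += 1
    return counts

# Made-up example: three stable states, three discovered paths.
paths = [(0, 1), (0, 1), (2, 0)]
counts = path_count_matrix(3, paths)
connected = counts > 0   # False entries correspond to "no connection found"
```

The matrix is generally not symmetric, since a path from one stable state to another does not imply that the reverse path was found.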