One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs

Jianren Wang, Kangni Liu, Dingkun Guo, Xian Zhou, Christopher G Atkeson

TL;DR

This paper tackles one-shot video imitation for dynamic and deformable object manipulation by introducing Parameterized Symbolic Abstraction Graphs (PSAGs), in which objects are nodes and their relationships are edges parameterized by geometric and non-geometric attributes. Non-geometric attributes such as forces are grounded through simulation, which enables learning from a single video demonstration and generalization to novel objects. The method constructs a digital twin using MLS-MPM and gradient-based trajectory optimization, then transfers to real robots via a hybrid motion-force controller. Experiments on five manipulation tasks (Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza) show one-shot generalization and advantages over baselines, pointing toward reduced demonstration data requirements and less reliance on dense sensing during real-world manipulation.

Abstract

Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter challenges in capturing invisible physical attributes, such as forces. In this paper, we propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG), where nodes represent objects and edges denote relationships between objects. We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes. The augmented PSAG is then applied in real robot experiments. Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza. We demonstrate successful generalization to novel objects with distinct visual and physical properties.
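The graph structure described in the abstract can be pictured with a small data-structure sketch. This is an illustration only, not the authors' implementation: the class names (`PSAG`, `RelationEdge`), the relation label `"contact"`, the attribute keys, and the numeric values are all hypothetical. It shows objects as nodes, relationships as edges carrying geometric attributes that are visible in video, and non-geometric attributes (such as force) that stay unset until some grounding step, which the paper performs through simulation, fills them in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RelationEdge:
    """Relationship between two objects (hypothetical layout, not from the paper)."""
    source: str
    target: str
    relation: str  # symbolic label, e.g. "contact" (assumed name)
    # Attributes recoverable from video, e.g. a distance between objects.
    geometric: Dict[str, float] = field(default_factory=dict)
    # Visually imperceptible attributes, e.g. force; None until grounded.
    non_geometric: Dict[str, Optional[float]] = field(default_factory=dict)

    def is_grounded(self) -> bool:
        """True once every non-geometric attribute has a value."""
        return all(v is not None for v in self.non_geometric.values())


@dataclass
class PSAG:
    """Objects as nodes, relationships as parameterized edges."""
    nodes: List[str] = field(default_factory=list)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_object(self, name: str) -> None:
        if name not in self.nodes:
            self.nodes.append(name)

    def relate(self, source: str, target: str, relation: str,
               geometric: Dict[str, float],
               non_geometric_names: List[str]) -> RelationEdge:
        """Add an edge whose non-geometric attributes start out unknown."""
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown object: {name}")
        edge = RelationEdge(source, target, relation, dict(geometric),
                            {n: None for n in non_geometric_names})
        self.edges.append(edge)
        return edge

    def ungrounded_edges(self) -> List[RelationEdge]:
        """Edges that still need a grounding step (simulation in the paper)."""
        return [e for e in self.edges if not e.is_grounded()]


# Usage with made-up values: a knife touching an avocado.
graph = PSAG()
graph.add_object("knife")
graph.add_object("avocado")
edge = graph.relate("knife", "avocado", "contact",
                    {"distance_m": 0.0}, ["normal_force_n"])
pending_before = len(graph.ungrounded_edges())

# Stand-in for the grounding step; 5.0 is a placeholder, not a reported value.
edge.non_geometric["normal_force_n"] = 5.0
pending_after = len(graph.ungrounded_edges())
print(pending_before, pending_after)
```

Keeping the non-geometric attributes as explicit `None` placeholders makes it easy to see which edges still lack the information that video alone cannot provide.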

Paper Structure (19 sections, 2 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Overview of our pipeline for learning from videos. (a) Building Parameterized Symbolic Abstraction Graphs (PSAG): PSAGs are generated by instance segmentation, consistent video depth estimation, and object relationship calculation. (b) Learning to Simulate: Constructing digital twin to ground geometric constraints via trajectory optimization. (c) PSAG-to-Real Transfer using hybrid motion-force control.
  • Figure 2: Experiment Settings: (a) Robot arm with an avocado holder and another arm with a knife and force sensor. (b) Robot arm and cups for the pouring task. (c) Rolling pin mounted on the robot arm for rolling dough. (d) Slicer affixed to the robot arm for slicing pizza. (e) Knife mounted on the robot arm for cutting vegetables. (f) Multi-camera system. (g) Cups, yogurt, Coke, and water for the pouring task. (h, i) Dough, play sand, and play dough for the rolling dough and slicing pizza experiments. (j) Cucumber, tomato, and banana for the cutting vegetables task.
  • Figure 3: For each task, we present the video demonstration (top) and the robot trajectories (bottom). Our proposed method allows the robot to perform challenging tasks such as cutting an avocado, cutting vegetables, pouring liquid, rolling dough, and slicing pizza from a single demonstration.