EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation

Yuan Xu, Jiabing Yang, Xiaofeng Wang, Yixiang Chen, Zheng Zhu, Bowen Fang, Guan Huang, Xinze Chen, Yun Ye, Qiang Zhang, Peiyan Li, Xiangnan Wu, Kai Wang, Bing Zhan, Shuo Lu, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang

TL;DR

The paper tackles the brittleness of imitation policies to egocentric viewpoint shifts by introducing EgoDemoGen, a framework that generates paired demonstrations for novel viewpoints through action retargeting and EgoViewTransfer-based observation synthesis. Action retargeting ensures kinematically feasible motions in the new egocentric frame, while EgoViewTransfer, trained with a self-supervised double reprojection strategy, produces coherent, aligned egocentric videos. Empirical results in RoboTwin2.0 and real-world dual-arm setups show substantial improvements in policy success for both standard and novel viewpoints. Performance keeps improving as more EgoDemoGen data are incorporated, with diminishing returns beyond a 1:1 mixing ratio. The method offers a practical route toward viewpoint-robust manipulation in real-world robotics by enabling diverse, paired demonstrations without requiring ground-truth novel-view data.

Abstract

Imitation-learning-based policies perform well in robotic manipulation, but they often degrade under *egocentric viewpoint shifts* when trained from a single egocentric viewpoint. To address this issue, we present **EgoDemoGen**, a framework that generates *paired* novel egocentric demonstrations by retargeting actions in the novel egocentric frame and synthesizing the corresponding egocentric observation videos with the proposed generative video repair model **EgoViewTransfer**, which is conditioned on a novel-viewpoint reprojected scene video and a robot-only video rendered from the retargeted joint actions. EgoViewTransfer is finetuned from a pretrained video generation model using a self-supervised double reprojection strategy. We evaluate EgoDemoGen both in simulation (RoboTwin2.0) and on a real-world robot. After training on a mixture of EgoDemoGen-generated novel egocentric demonstrations and original standard egocentric demonstrations, the policy success rate improves by an **absolute** **+17.0%** for the standard egocentric viewpoint and by **+17.7%** for novel egocentric viewpoints in simulation. On the real-world robot, the **absolute** improvements are **+18.3%** and **+25.8%**, respectively. Moreover, performance continues to improve as the proportion of EgoDemoGen-generated demonstrations increases, with diminishing returns. These results demonstrate that EgoDemoGen provides a practical route to egocentric viewpoint-robust robotic manipulation.
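
The data mixture mentioned in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function name `mix_demonstrations`, the `standard:generated` ordering of the ratio (so that 1:0 means standard demonstrations only), and the choice to keep every standard demonstration while subsampling the generated ones are all assumptions made for illustration.

```python
import random


def mix_demonstrations(standard, generated, ratio=(1, 1), seed=0):
    """Combine standard and generated demos at an approximate standard:generated ratio.

    Hypothetical helper: keeps every standard demonstration and samples
    generated ones without replacement until the requested ratio is reached
    (or the generated pool is exhausted).
    """
    s, g = ratio
    if s <= 0 or g < 0:
        raise ValueError("ratio must be (positive, non-negative)")
    # Number of generated demos needed for the requested ratio, capped by availability.
    n_generated = min(len(generated), round(len(standard) * g / s))
    rng = random.Random(seed)
    mixed = list(standard) + rng.sample(list(generated), n_generated)
    rng.shuffle(mixed)  # interleave both sources for training
    return mixed


standard_demos = [("standard", i) for i in range(10)]
generated_demos = [("generated", i) for i in range(30)]
baseline = mix_demonstrations(standard_demos, generated_demos, ratio=(1, 0))
balanced = mix_demonstrations(standard_demos, generated_demos, ratio=(1, 1))
```

Under these assumptions, `baseline` contains only the 10 standard demonstrations, while `balanced` contains 10 standard and 10 generated ones.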

Paper Structure

This paper contains 44 sections, 6 equations, 12 figures, 5 tables, 3 algorithms.

Figures (12)

  • Figure 1: Illustration of viewpoint transformations. (a) Shifts in the egocentric viewpoint, including backward translation and clockwise/counterclockwise rotations. (b) Unlike in a third-person view, the robot base link and the egocentric camera are mechanically coupled in the egocentric view. A novel egocentric view therefore requires action retargeting and observation synthesis consistent with the retargeted robotic arm state.
  • Figure 2: Overview of EgoDemoGen. (1) Egocentric View Transform: a novel egocentric view is specified by the robot base motion $(\Delta x,\ \Delta y,\ \Delta \theta)$. (2) Action Retargeting: the original joint actions $Q$ are retargeted into the novel robot base frame to yield kinematically feasible joint actions $\tilde{Q}$. (3) Novel Egocentric Observations: starting from the original observation video $V$, we mask the robot, reproject the scene to the novel viewpoint, perform hole filling, and apply EgoViewTransfer to synthesize coherent observations $\tilde{V}$. (4) Novel Demonstrations & Policy Training: we obtain aligned pairs $(\tilde{V},\ \tilde{Q})$ for training egocentric viewpoint-robust policies.
  • Figure 3: EgoViewTransfer. (a) Double reprojection. It simulates the artifacts and occlusions caused by a viewpoint change. The double-reprojected videos are aligned with the original videos to form input/label pairs for training. (b) Architecture of EgoViewTransfer. The model takes a degraded scene video and a robot video as conditions and generates egocentric observation videos consistent with both inputs.
  • Figure 4: Simulation and Real-World Tasks with Egocentric View Shift.
  • Figure 5: Success Rates under varying data mixture ratios for the standard view and novel egocentric view. The dashed lines indicate the 1:0 baselines, and for real-world results the novel curve is the mean over Counterclockwise/Clockwise rotations.
  • ...and 7 more figures
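
The egocentric view transform in Figure 2 specifies a novel view by a planar base motion $(\Delta x,\ \Delta y,\ \Delta \theta)$. As a minimal sketch of what re-expressing a target in the novel base frame could look like, the snippet below transforms a planar end-effector pose from the original base frame into the moved one. The function name `to_novel_base_frame` and the convention that the base motion is expressed in the original base frame are assumptions. How the paper converts such targets back into kinematically feasible joint actions $\tilde{Q}$ is not described in this excerpt and is not modeled here.

```python
import math


def to_novel_base_frame(point_xy, yaw, dx, dy, dtheta):
    """Re-express a planar pose from the original base frame in the novel base frame.

    Assumption: the novel base is the original base translated by (dx, dy)
    and rotated by dtheta (radians, counterclockwise positive), with the
    motion expressed in the original base frame.
    """
    px, py = point_xy
    # Undo the base translation, then rotate by -dtheta.
    sx, sy = px - dx, py - dy
    c, s = math.cos(dtheta), math.sin(dtheta)
    nx = c * sx + s * sy
    ny = -s * sx + c * sy
    # Wrap the new heading into [-pi, pi).
    new_yaw = (yaw - dtheta + math.pi) % (2 * math.pi) - math.pi
    return (nx, ny), new_yaw


# Base moves 0.1 m backward: a target 0.5 m ahead is now 0.6 m ahead.
backward_xy, backward_yaw = to_novel_base_frame((0.5, 0.0), 0.0, -0.1, 0.0, 0.0)

# Base rotates 90 degrees counterclockwise: a target straight ahead
# ends up on the right-hand side (negative y) of the novel base frame.
rotated_xy, rotated_yaw = to_novel_base_frame((1.0, 0.0), 0.0, 0.0, 0.0, math.pi / 2)
```

Because the camera is rigidly attached to the base in the egocentric setting, the same base motion also determines the novel camera pose, which is why actions and observations have to be regenerated together.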