Egocentric Visual Self-Modeling for Autonomous Robot Dynamics Prediction and Adaptation

Yuhang Hu; Boyuan Chen; Hod Lipson

TL;DR

This work demonstrates for the first time how a task-agnostic dynamic self-model can be learned using only a single first-person-view camera in a self-supervised manner, without any prior knowledge of robot morphology, kinematics, or task.

Abstract

The ability of robots to model their own dynamics is key to autonomous planning and learning, as well as for autonomous damage detection and recovery. Traditionally, dynamic models are pre-programmed or learned from external observations. Here, we demonstrate for the first time how a task-agnostic dynamic self-model can be learned using only a single first-person-view camera in a self-supervised manner, without any prior knowledge of robot morphology, kinematics, or task. Through experiments on a 12-DoF robot, we demonstrate the capabilities of the model in basic locomotion tasks using visual input. Notably, the robot can autonomously detect anomalies, such as damaged components, and adapt its behavior, showcasing resilience in dynamic environments. Furthermore, the model's generalizability was validated across robots with different configurations, emphasizing its potential as a universal tool for diverse robotic systems. The egocentric visual self-model proposed in our work paves the way for more autonomous, adaptable, and resilient robotic systems.

Paper Structure (14 sections, 7 figures, 3 tables)

This paper contains 14 sections, 7 figures, 3 tables.

Introduction
Related works
Methodology
Data Representation
Data Collection and Augmentation
Architecture and usage of the model
Training Egocentric Visual Self-model
Experiments
Real-world Experiments and Robustness in Unseen Environments
Evaluating the Contribution of the Visual Encoder
Generalizability and Transfer Learning
Applicability to a Humanoid Robot
Autonomous anomaly identification and adaptation
Conclusion

Figures (7)

Figure 1: Egocentric Visual Self-Model. a) Training pipeline. Motor babbling generates random actions $A_{t}$ executed by the robot. The onboard camera captures sequential images processed through a visual encoder. Proposed future actions $A_{t+1}$ are encoded and concatenated with visual features to predict the next robot state $\hat{S}_{t+1}$. b) Deployment. For real-time control, multiple proposed actions $A_{0...i}$ are input to the trained self-model along with current visual data. Based on these predictions, rewards $R_{0...i}$ are computed for each predicted state $S_{0...i}$. The robot then selects the action $A_{m}$ associated with the highest reward $R_{m}$ for the next move.
Figure 2: Basic locomotion tasks in the real-world environment. We deployed an egocentric visual self-model on a legged robot. Relying only on image sequences from the front camera and its action commands, the robot achieves (A) moving forward, (B) turning right, (C) turning left, and (D) moving backward. The model is trained in simulation to anticipate one hundred possible future states from the latest visual information and the motor commands the robot is about to execute.
Figure 3: Evaluation of the egocentric visual self-model in the real world against four baselines (n=3).
Figure 4: Assessing egocentric visual self-models on unseen terrain. (A, B, C, D and E) We performed the same experiments on carpets with pieces of slippery paper and on terrain with checkerboard textures. (C) The robot is moving forward, and its first-person views are shown in E. (D) The robot is turning right, and its first-person views are shown in F. (G) The robot moves forward with an egocentric visual self-model on unseen terrain.
Figure 5: New robot configurations and quantitative evaluations in various terrains and locomotion tasks. Robot 0 is modified to create three new configurations (robot 1, robot 2, and robot 3) by altering the orientations of the leg-body connections. Because the leg motors are connected in series, these changes alter the kinematics significantly, providing a diverse set of test cases for the generalizability of our visual encoder. The lower plots show prediction errors for our method (OM), a non-visual method (NV), and the initial model (IM). Our method consistently outperforms the other methods, exhibiting the lowest errors across all terrains and tasks for the three new robots.
...and 2 more figures
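
The deployment loop described in the Figure 1b caption (propose several actions, predict the resulting state for each, score the predictions, and execute the best-scoring action) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration only: `sample_actions`, `select_action`, `toy_self_model`, and `forward_reward` are not from the paper, the toy model is a hand-written stand-in for the trained network, and the reward definition is an assumption. The candidate count of 100 and the 12 degrees of freedom follow the Figure 2 caption and the abstract.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]
State = List[float]


def sample_actions(n: int, dof: int, rng: random.Random) -> List[Action]:
    """Draw n candidate joint commands, each uniform in [-1, 1].

    The normalised command range is an assumption made for this sketch.
    """
    return [[rng.uniform(-1.0, 1.0) for _ in range(dof)] for _ in range(n)]


def select_action(
    self_model: Callable[[Sequence[float], Action], State],
    reward_fn: Callable[[State], float],
    visual_features: Sequence[float],
    candidates: List[Action],
) -> Tuple[int, Action, float]:
    """Return (index, action, reward) of the candidate with the highest reward.

    Each candidate is passed through the self-model together with the
    current visual features; the predicted state is then scored.
    """
    if not candidates:
        raise ValueError("at least one candidate action is required")
    best_idx, best_reward = 0, float("-inf")
    for i, action in enumerate(candidates):
        predicted_state = self_model(visual_features, action)
        reward = reward_fn(predicted_state)
        if reward > best_reward:
            best_idx, best_reward = i, reward
    return best_idx, candidates[best_idx], best_reward


def toy_self_model(visual_features: Sequence[float], action: Action) -> State:
    """Hypothetical stand-in for the trained network.

    Maps an action to a fake planar displacement (dx, dy) and ignores the
    visual features; a real self-model would use both inputs.
    """
    half = len(action) // 2
    return [sum(action[:half]), sum(action[half:])]


def forward_reward(state: State) -> float:
    """Assumed reward for 'move forward': progress in x, penalise drift in y."""
    return state[0] - abs(state[1])


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_actions(n=100, dof=12, rng=rng)
    features = [0.0] * 8  # placeholder for encoded camera images
    idx, action, reward = select_action(
        toy_self_model, forward_reward, features, candidates
    )
    print(f"selected candidate {idx} with predicted reward {reward:.3f}")
```

Under these assumptions, switching tasks (for example from moving forward to turning) would only require a different `reward_fn`; the self-model itself stays unchanged, which is one way to read the caption's description of a task-agnostic model.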
