Robust Visual Embodiment: How Robots Discover Their Bodies in Real Environments

Salim Rezvani, Ammar Jaleel Mahmood, Robin Chhabra

TL;DR

This work investigates the susceptibility of visual self-modeling to realistic image degradations and introduces a task-aware denoising framework coupled with semantic segmentation to preserve robot morphology. The approach integrates Wiener filtering for blur, median filtering for salt-and-pepper noise, and Non-Local Means denoising with IFT-SVM refinement, all applied within a segmentation-driven pipeline that feeds a Self-Modeling Engine based on the FFKSM paradigm. Across simulated and real-world (3D-printed) hardware, the method delivers near-baseline morphology reconstruction under noisy and cluttered conditions, while prior pipelines deteriorate significantly. The results highlight the necessity of background-aware denoising and robust robot isolation for deploying self-aware robots in unpredictable environments, enabling reliable morpho-dynamics understanding and damage recovery in the wild.
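
As a concrete illustration of one stage named above, the following is a minimal sketch of median filtering applied to salt-and-pepper noise. It is not the authors' implementation: the 3x3 window, the border handling, the function name `median_filter_3x3`, and the toy 5x5 image are all assumptions chosen for illustration, and the Wiener, Non-Local Means, and IFT-SVM stages are not shown.

```python
from statistics import median


def median_filter_3x3(image):
    """Return a 3x3 median-filtered copy of `image` (a list of rows of ints).

    Border pixels use only the neighbours that fall inside the image
    (an assumption; other border policies are equally valid).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the in-bounds pixels of the 3x3 window centred on (r, c).
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            out[r][c] = int(median(window))
    return out


# Hypothetical 5x5 grayscale patch: uniform gray (100) corrupted by one
# "salt" impulse (255) and one "pepper" impulse (0).
clean = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[1][1] = 255
noisy[3][2] = 0

restored = median_filter_3x3(noisy)
print(restored == clean)
```

The median suits impulse noise because an isolated outlier never reaches the middle of the sorted window, so it is discarded rather than averaged into its neighbours, and edges, which carry the structural cues a morphology model relies on, are blurred less than with a mean filter.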

Abstract

Robots with internal visual self-models promise unprecedented adaptability, yet existing autonomous modeling pipelines remain fragile under realistic sensing conditions such as noisy imagery and cluttered backgrounds. This paper presents the first systematic study quantifying how visual degradations (blur, salt-and-pepper noise, and Gaussian noise) affect robotic self-modeling. Through both simulation and physical experiments, we demonstrate their impact on morphology prediction, trajectory planning, and damage recovery in state-of-the-art pipelines. To overcome these challenges, we introduce a task-aware denoising framework that couples classical restoration with morphology-preserving constraints, ensuring retention of structural cues critical for self-modeling. In addition, we integrate semantic segmentation to robustly isolate robots from cluttered and colorful scenes. Extensive experiments show that our approach restores near-baseline performance across simulated and physical platforms, while existing pipelines degrade significantly. These contributions advance the robustness of visual self-modeling and establish practical foundations for deploying self-aware robots in unpredictable real-world environments.
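
The three degradation types studied can be simulated on a clean image in a few lines. The sketch below is one common way to do so and is only an assumption about how such inputs might be produced: the function names, the noise fraction `amount`, the standard deviation `sigma`, and the use of a 3x3 box blur as a stand-in for blur are all hypothetical, since the abstract does not state the authors' degradation parameters.

```python
import random


def add_salt_and_pepper(image, amount, rng):
    """Set roughly a fraction `amount` of pixels to 0 or 255, chosen with equal probability."""
    out = [row[:] for row in image]
    for r in range(len(out)):
        for c in range(len(out[0])):
            if rng.random() < amount:
                out[r][c] = rng.choice((0, 255))
    return out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, clipped to [0, 255]."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def box_blur_3x3(image):
    """Replace each pixel with the integer mean of its in-bounds 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            out[r][c] = sum(window) // len(window)
    return out


rng = random.Random(0)  # fixed seed so the degraded images are reproducible
image = [[128] * 8 for _ in range(8)]  # hypothetical uniform gray test image

salt_pepper = add_salt_and_pepper(image, amount=0.1, rng=rng)
gaussian = add_gaussian_noise(image, sigma=20.0, rng=rng)
blurred = box_blur_3x3(image)
```

Feeding each degraded image through the same self-modeling pipeline as the clean one, and comparing the resulting morphology predictions, is the kind of controlled comparison the study describes.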

Paper Structure

This paper contains 30 sections, 19 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Self-modeling pipeline overview
  • Figure 2: Mechanical design of the printed robot highlighting its structural layout
  • Figure 3: Experimental imaging setup with the robot positioned in front of the camera
  • Figure 4: Semantic segmentation comparison on a colorful leaf background
  • Figure 5: Semantic segmentation comparison with two black pigeons in the background
  • ...and 6 more figures