$\mathbf{M^3A}$ Policy: Mutable Material Manipulation Augmentation Policy through Photometric Re-rendering

Jiayi Li, Yuxuan Hu, Haoran Geng, Xiangyu Chen, Chuhao Zhou, Ziteng Cui, Jianfei Yang

TL;DR

Mutable Material Manipulation Augmentation (M$^3$A) addresses material generalization in robotic manipulation by photometrically re-rendering a single real-world demonstration into multiple material variants. It uses Grounded-SAM2 for object masks, MiDaS for depth, CLIP+IP-Adapter for material exemplars, and a Stable Diffusion-based inpainting step to produce realistic material appearances while preserving motion trajectories, so large-scale multi-material demonstrations can be generated without new data collection. The authors introduce the M$^3$ benchmark on RoboVerse to evaluate cross-material generalization and zero-shot material transfer, and report a 58.03% improvement in average success rate across three real-world tasks, along with robust zero-shot performance on unseen materials. The work offers a practical, scalable route toward material-agnostic robotic learning by decoupling appearance from manipulation behavior and training diffusion-based imitation learning policies on the augmented data.

Abstract

Material generalization is essential for real-world robotic manipulation, where robots must interact with objects exhibiting diverse visual and physical properties. This challenge is particularly pronounced for objects made of glass, metal, or other materials whose transparent or reflective surfaces introduce severe out-of-distribution variations. Existing approaches either render materials in simulation and rely on sim-to-real transfer, which is hindered by substantial visual domain gaps, or depend on collecting extensive real-world demonstrations, which is costly, time-consuming, and still insufficient to cover the range of materials. To overcome these limitations, we turn to computational photography and introduce Mutable Material Manipulation Augmentation (M$^3$A), a unified framework that leverages the physical characteristics of materials as captured by light transport for photometric re-rendering. The core idea is simple yet powerful: given a single real-world demonstration, we photometrically re-render the scene to generate a diverse set of highly realistic demonstrations with different material properties. This augmentation effectively decouples task-specific manipulation skills from surface appearance, enabling policies to generalize across materials without additional data collection. To systematically evaluate this capability, we construct the first comprehensive multi-material manipulation benchmark spanning both simulation and real-world environments. Extensive experiments show that the M$^3$A policy significantly enhances cross-material generalization, improving the average success rate across three real-world tasks by 58.03%, and demonstrating robust performance on previously unseen materials.
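
The core idea above (change only the object's appearance, keep the recorded motion) can be illustrated with a minimal sketch. This is not the authors' implementation: `Demo`, `composite`, `augment_demo`, and `flat_tint` are hypothetical names, and the `rerender` callback is a placeholder for the diffusion-based re-rendering step, which is assumed to return a full frame in the target material. The object masks are assumed to be precomputed, one per frame.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Demo:
    frames: np.ndarray   # (T, H, W, 3) uint8 RGB observations
    actions: np.ndarray  # (T, action_dim) recorded robot actions


def composite(frame: np.ndarray, rerendered: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Take object pixels from `rerendered` and background pixels from `frame`."""
    m = mask.astype(bool)[..., None]  # (H, W, 1), broadcasts over the RGB channels
    return np.where(m, rerendered, frame)


def augment_demo(
    demo: Demo,
    masks: np.ndarray,  # (T, H, W) boolean object masks (assumed precomputed)
    rerender: Callable[[np.ndarray, np.ndarray, str], np.ndarray],
    materials: List[str],
) -> List[Demo]:
    """Produce one appearance variant per material; actions are copied unchanged."""
    variants = []
    for material in materials:
        new_frames = np.stack([
            composite(frame, rerender(frame, mask, material), mask)
            for frame, mask in zip(demo.frames, masks)
        ])
        variants.append(Demo(frames=new_frames, actions=demo.actions.copy()))
    return variants


def flat_tint(frame: np.ndarray, mask: np.ndarray, material: str) -> np.ndarray:
    """Toy stand-in for a re-renderer: fills the frame with one color per material."""
    colors = {"wood": (139, 90, 43), "metal": (170, 170, 180)}  # illustrative values
    out = np.empty_like(frame)
    out[...] = colors[material]
    return out
```

Calling `augment_demo(demo, masks, flat_tint, ["wood", "metal"])` turns one demonstration into two, each with the same action sequence. In a real pipeline the toy `flat_tint` would be swapped for a generative re-renderer; the compositing and bookkeeping would stay the same under these assumptions.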

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The framework of M$^3$A policy. The framework consists of three stages: (1) demonstration collection, where visuomotor trajectories (videos and action sequences) are collected from simulation or real-world environments; (2) M$^3$A, which re-composes or replaces the material appearance of manipulated objects to introduce realistic visual diversity; and (3) imitation learning, where policies are trained on the augmented demonstrations to achieve improved generalization across materials and environments.
  • Figure 2: Material transfer results produced by M$^3$A in both simulation and the real world. The top row shows the original camera observations, while the bottom row presents the corresponding material-transferred outputs. The four examples illustrate: (1) red plastic to wood, (2) dark gray plastic to metal, (3) white plastic to glass, and (4) white plastic to gemstone.
  • Figure 3: Real-world experiment settings. The FR3 manipulates cubes made of eleven different materials to complete three tasks: (1) Picking, (2) Picking & placing, (3) Long-horizon picking & placing.
  • Figure 4: Success rate of simulation tasks under varying DP training epochs.