Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies
Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen
TL;DR
The paper proposes Perception Stitching (PeS), a modular approach for zero-shot transfer of visuomotor policies across different visual configurations by reusing perception encoders. It introduces latent-space alignment via relative representations anchored to exemplar images and enforces disentanglement to stabilize cross-encoder transfer, achieving strong zero-shot performance in both simulated and real-world robot manipulation tasks. Key contributions include a practical two-encoder policy decomposition, anchor-based latent alignment, and comprehensive analyses (latent-space visualizations and Grad-CAM) that help explain why perceptual modularity improves transfer. The work enables plug-and-play reuse of perception modules, reducing data collection requirements for new camera setups and facilitating robust real-world deployment of visuomotor policies across diverse sensing configurations.
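The anchor-based alignment mentioned above can be illustrated with a minimal sketch. It assumes the common form of relative representations, in which a latent vector is re-expressed as its cosine similarities to the embeddings of a fixed set of anchor images; the summary does not spell out the paper's exact formulation, so the function names, the toy 2-D latents, and the rotation-plus-scaling "second encoder" are all hypothetical and for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def relative_representation(z, anchor_embeddings):
    """Re-express latent z as its similarities to the anchor embeddings."""
    return [cosine(z, a) for a in anchor_embeddings]

def rotate_and_scale(v, angle, scale):
    """Hypothetical stand-in for a second encoder whose latent space is a
    rotated and uniformly rescaled copy of the first encoder's space."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = v
    return [scale * (c * x - s * y), scale * (s * x + c * y)]

# Toy 2-D latents from "encoder A": three anchor images and one observation.
anchors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_a = [0.6, 0.8]

# "Encoder B" embeds the same images in a rotated, rescaled latent space.
anchors_b = [rotate_and_scale(a, 0.7, 2.5) for a in anchors_a]
z_b = rotate_and_scale(z_a, 0.7, 2.5)

rel_a = relative_representation(z_a, anchors_a)
rel_b = relative_representation(z_b, anchors_b)

# The absolute latents differ, but the relative representations agree
# up to floating-point error.
print(rel_a)
print(rel_b)
```

In this toy setting the two encoders disagree on absolute coordinates yet produce the same anchor-relative vector, because cosine similarity is unchanged by rotation and uniform scaling. That invariance is the intuition for why a downstream policy trained on relative features could accept a different encoder; real encoders will only approximately satisfy it.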
Abstract
Vision-based imitation learning has shown promising capabilities for endowing robots with various motion skills given visual observations. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching, which enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our key idea is to enforce modularity of visual encoders by aligning the latent visual features among different visuomotor policies. Our method disentangles the perceptual knowledge from the downstream motion skills and allows the reuse of the visual encoders by directly stitching them to a policy network trained under partially different visual conditions. We evaluate our method in various simulated and real-world manipulation tasks. While baseline methods failed in all attempts, our method achieved zero-shot success in real-world visuomotor tasks. Our quantitative and qualitative analyses of the learned features of the policy network provide further insight into the high performance of our proposed method.
