Point Cloud Models Improve Visual Robustness in Robotic Learners
Skand Peri, Iain Lee, Chanho Kim, Li Fuxin, Tucker Hermans, Stefan Lee
TL;DR
The paper studies the robustness of vision-based robotic control to visual perturbations and shows that XYZ-RGB point-cloud inputs yield greater robustness and faster learning than RGB-D inputs. It introduces the Point Cloud World Model (PCWM), a model-based RL framework that operates on partial point clouds, pairing a recurrent state-space model (RSSM) latent world model with a Dreamer-like policy, and improves sample efficiency and robustness across manipulation tasks. Key findings include substantial robustness gains under changes in camera viewpoint, field of view (FoV), and lighting, as well as faster adaptation when finetuning in perturbed environments. Overall, the results indicate that explicit 3D geometric representations can meaningfully improve performance and transfer for robotic learners.
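The comparison between XYZ-RGB and RGB-D inputs rests on how the same camera observation is represented. The following is a minimal sketch, not taken from the PCWM code, showing how an RGB-D frame can be back-projected into an XYZ-RGB point cloud. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and, optionally, known camera-to-world extrinsics; the function name `rgbd_to_xyzrgb` is hypothetical.

```python
import numpy as np


def rgbd_to_xyzrgb(depth, rgb, fx, fy, cx, cy, cam_to_world=None):
    """Back-project an RGB-D frame into an (N, 6) XYZ-RGB point cloud.

    Hypothetical helper for illustration; assumes a pinhole camera model.
    depth: (H, W) metric depth along the camera z-axis; 0 marks invalid pixels.
    rgb:   (H, W, 3) colors, e.g. in [0, 1].
    cam_to_world: optional (4, 4) extrinsic matrix (camera frame -> world frame).
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # drop pixels with missing depth
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    if cam_to_world is not None:
        # Apply the rigid transform in homogeneous coordinates.
        xyz_h = np.concatenate([xyz, np.ones((xyz.shape[0], 1))], axis=1)
        xyz = (xyz_h @ cam_to_world.T)[:, :3]
    # Attach per-point color to obtain XYZ-RGB.
    return np.concatenate([xyz, rgb[valid]], axis=1)
```

Expressing points in a fixed world frame via `cam_to_world` suggests one plausible intuition for why point clouds can tolerate camera-pose changes better than raw pixels: with known extrinsics, the same surface maps to the same world coordinates regardless of where the camera sits.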
Abstract
Visual control policies can suffer significant performance degradation when visual conditions such as lighting or camera position differ from those seen during training, often declining sharply even under minor changes. In this work, we examine the robustness of RGB-D and point-cloud-based visual control policies to a suite of such visual changes. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point-cloud-based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find that our proposed PCWM significantly outperforms prior work in sample efficiency during training. Taken together, these results suggest that reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners. Project Webpage: https://pvskand.github.io/projects/PCWM
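The TL;DR describes PCWM as an RSSM latent world model operating on partial point clouds. The untrained NumPy sketch below illustrates that structure under several stated assumptions: the point-cloud encoder is PointNet-style (a shared per-point MLP followed by max pooling), a plain tanh recurrent cell stands in for the GRU commonly used in an RSSM, and all sizes, weights, and names (`D_EMB`, `D_H`, `encode_points`, `rssm_step`) are hypothetical. It is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(shape):
    # Small random weights; a trained model would learn these.
    return rng.normal(0.0, 0.1, size=shape)


# Hypothetical sizes: point features (XYZ-RGB), embedding, deterministic
# state, stochastic latent, and action dimensions.
D_PT, D_EMB, D_H, D_Z, D_A = 6, 32, 64, 16, 4
W1, W2 = init((D_PT, D_EMB)), init((D_EMB, D_EMB))
W_h = init((D_H + D_Z + D_A, D_H))
W_prior = init((D_H, 2 * D_Z))
W_post = init((D_H + D_EMB, 2 * D_Z))


def encode_points(points):
    """Assumed PointNet-style encoder: shared per-point MLP, then max pooling.

    Max pooling over the point axis makes the embedding independent of point order.
    """
    feats = np.maximum(points @ W1, 0.0)  # (N, D_EMB), ReLU
    feats = np.maximum(feats @ W2, 0.0)   # (N, D_EMB), ReLU
    return feats.max(axis=0)              # (D_EMB,)


def gaussian(params):
    # Split a parameter vector into a mean and a positive standard deviation.
    mean, raw_std = np.split(params, 2)
    std = np.log1p(np.exp(raw_std)) + 0.1  # softplus plus a floor
    return mean, std


def rssm_step(h, z, a, points):
    """One simplified RSSM step (tanh cell instead of a GRU, by assumption)."""
    h_next = np.tanh(np.concatenate([h, z, a]) @ W_h)   # deterministic path
    prior = gaussian(h_next @ W_prior)                   # p(z_t | h_t)
    emb = encode_points(points)                          # observation embedding e_t
    post = gaussian(np.concatenate([h_next, emb]) @ W_post)  # q(z_t | h_t, e_t)
    # Reparameterized sample from the posterior.
    z_next = post[0] + post[1] * rng.standard_normal(D_Z)
    return h_next, z_next, prior, post
```

Because a point cloud is an unordered set, the max-pooled embedding returns the same vector for any reordering of the input points; during training, an RSSM typically regularizes the posterior toward the prior, while the prior alone drives imagined rollouts for a Dreamer-like policy.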
