Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

Haoyi Zhu; Yating Wang; Di Huang; Weicai Ye; Wanli Ouyang; Tong He

TL;DR

Point cloud observations often yield better policy performance and significantly stronger generalization than RGB and RGB-D observations across various geometric and visual conditions, suggesting that the 3D point cloud is a valuable observation modality for intricate robotic tasks.

Abstract

In robot learning, the observation space is crucial because different modalities have distinct characteristics, and the choice of modality can become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend persists both when training from scratch and when using pre-training. Furthermore, our findings demonstrate that point cloud observations often yield better policy performance and significantly stronger generalization capabilities across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. We also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides valuable insights and guidance for designing more generalizable and robust robotic models. Code is available at https://github.com/HaoyiZhu/PointCloudMatters.

Paper Structure (46 sections, 7 figures, 21 tables)

Introduction
Background
OBSBench
Experiments
Evaluation Metrics
Study on performance of different observations with and without pre-training (Q1, Q2)
Study on Zero-Shot Generalization Capabilities Across Observation Spaces (Q3)
Zero-Shot Generalization to Camera View Changes
Zero-Shot Generalization to Visual Changes
Study on sample efficiency (Q4)
Study on design decisions on point cloud observation space (Q5)
Additional Ablations
Real-World Experiments
Related Work
Conclusion and Limitations
...and 31 more sections

Figures (7)

Figure 1: Overview of this work. We examine the impact of various observation spaces, specifically RGB, RGB-D, and point clouds, on robot learning. We develop OBSBench, a benchmark with standardized pipelines that include various encoders, PVRs, policies, simulators, evaluation settings, etc. Based on OBSBench, we conduct a series of empirical studies on observation spaces.
Figure 2: Point cloud has better zero-shot generalization ability on camera view and visual changes. We demonstrate the zero-shot generalization ability of different observation spaces. Encoders trained from scratch are shown in the first row, while PVRs are shown in the second row.
Figure 3: Lighting Conditions
Figure 4: Noise Levels
Figure 5: Background Colors
...and 2 more figures
