Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset

Litian Liang, Liuyu Bian, Caiwei Xiao, Jialin Zhang, Linghao Chen, Isabella Liu, Fanbo Xiang, Zhiao Huang, Hao Su

TL;DR

Robo360 tackles real-world 3D robotic manipulation by providing dense omnispective data and rich modalities to learn high-quality 3D representations. The dataset enables both dynamic NeRF evaluation and multi-view policy learning through 86 synchronized cameras, teleoperation trajectories, and diverse material interactions. Baseline experiments reveal that current dynamic NeRF methods struggle with fast motion and view-generalization, while imitation-learning policies benefit from multi-view inputs, with arbitrary-view training improving generalization. Overall, Robo360 advances 3D scene understanding and robot control, enabling new research directions at the intersection of perception, physical world modeling, and manipulation.

Abstract

Building robots that can automate labor-intensive tasks has long been a core motivation behind advances in the computer vision and robotics communities. Recent interest in 3D algorithms, particularly neural fields, has led to advances in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. To tackle these challenges, we present Robo360, a dataset that features robotic manipulation with dense view coverage, which enables high-quality 3D neural representation learning, and a diverse set of objects with various physical and optical properties, which facilitates research in various object manipulation and physical world modeling tasks. We confirm the effectiveness of our dataset using existing dynamic NeRF methods and evaluate its potential in learning multi-view policies. We hope that Robo360 can open new research directions, yet to be explored, at the intersection of understanding the physical world in 3D and robot control.

Paper Structure (34 sections, 11 figures, 5 tables)

Figures (11)

  • Figure 1: An illustration of the comprehensive pipeline encompassing the stages of data collection, postprocessing, storage, and the subsequent downstream applications.
  • Figure 2: Robo360 captures real-world robot-object interactions with complex visual and material variations.
  • Figure 3: Qualitative results of dynamic NeRF methods. All the dynamic NeRF methods fail to accurately model objects with fast motion, such as falling bread and pouring sand, leading to a performance gap compared to static scenes.
  • Figure 4: Visualization of 5 example demos used in training networks via imitation learning. For each demo, we show 4 keyframes, where green arrows show the desired end-effector motion and green checkmarks denote the task is completed at the current time. Zoom in for details.
  • Figure 5: Temporal alignment error
  • ...and 6 more figures