Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

Yu Qi, Yuanchen Ju, Tianming Wei, Chi Chu, Lawson L. S. Wong, Huazhe Xu

TL;DR

This work introduces 2BY2, a large-scale dataset for pairwise assembly of everyday objects, with 18 tasks and 517 object pairs annotated for pose and symmetry. It proposes a two-step $SE(3)$ pose estimation framework that uses two-scale Vector Neuron DGCNN encoders and a cross-object fusion mechanism to predict the two objects' poses sequentially. The method achieves state-of-the-art performance on all 18 tasks, and robot experiments show that it generalizes to real hardware. The key contributions are the dataset, the two-step equivariant pose estimation approach, and validated improvements over baselines in both simulation and the real world. The results are relevant to generalizable robot manipulation in everyday environments, where reliable 3D assembly planning and execution are required.

Abstract

3D assembly tasks, such as furniture assembly and component fitting, play a crucial role in daily life and represent essential capabilities for future home robots. Existing benchmarks and datasets predominantly focus on assembling geometric fragments or factory parts, and fall short of addressing the complexities of everyday object interactions and assemblies. To bridge this gap, we present 2BY2, a large-scale annotated dataset for daily pairwise object assembly, covering 18 fine-grained tasks that reflect real-life scenarios, such as plugging into sockets, arranging flowers in vases, and inserting bread into toasters. The 2BY2 dataset includes 1,034 instances and 517 pairwise objects with pose and symmetry annotations, requiring approaches that align geometric shapes while accounting for functional and spatial relationships between objects. Leveraging the 2BY2 dataset, we propose a two-step $SE(3)$ pose estimation method with equivariant features for assembly constraints. Compared to previous shape assembly methods, our approach achieves state-of-the-art performance across all 18 tasks in the 2BY2 dataset. Robot experiments further validate the reliability and generalization ability of our method for complex 3D assembly tasks.
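To make the notion of an $SE(3)$ pose concrete, the sketch below shows how a rigid pose (a rotation plus a translation) can be applied to a point cloud, once for each object in a pair. It is only an illustration under stated assumptions: the helper names `make_pose`, `apply_pose`, and `rot_z` are hypothetical, the two poses are hand-written stand-ins for network predictions, and the paper's actual network is not reproduced here.

```python
import numpy as np


def make_pose(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def apply_pose(pose: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 SE(3) pose to an (N, 3) point cloud: x -> R x + t."""
    return points @ pose[:3, :3].T + pose[:3, 3]


def rot_z(angle: float) -> np.ndarray:
    """Rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Hypothetical stand-ins for the two sequential predictions (not real outputs).
pose_b = make_pose(rot_z(np.pi / 2), np.zeros(3))         # step 1: pose for object B
pose_a = make_pose(np.eye(3), np.array([0.0, 0.0, 0.1]))  # step 2: pose for object A

cloud_b = np.array([[1.0, 0.0, 0.0]])  # toy one-point "cloud" for object B
cloud_a = np.array([[0.0, 0.0, 0.0]])  # toy one-point "cloud" for object A

placed_b = apply_pose(pose_b, cloud_b)
placed_a = apply_pose(pose_a, cloud_a)
```

In this toy setup, object B is rotated a quarter turn about the z-axis and object A is then shifted along z. In a learned pipeline the second pose would be predicted conditioned on the first.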

Paper Structure

This paper contains 40 sections, 7 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Overview of the 2BY2 Dataset. We propose 2BY2, the first large-scale daily pairwise object assembly dataset, which contains 1,034 instances and 517 pairwise objects with pose and symmetry annotations.
  • Figure 2: Chamfer Distance Between Training and Testing Sets. We normalize the point clouds and compute the Chamfer Distance. For each task, we compute the distance between the training set and the test set separately for the point clouds of Object A and Object B.
  • Figure 3: Task Diversity Visualization. The image shows selected objects from four different tasks: USB, Bottle, Letter, and Plug in Socket. Objects selected from the training set are on the left, and objects from the testing set are on the right. As seen in the legend, object geometry varies in both the training and testing sets, with the testing set containing novel shapes not seen in the training set.
  • Figure 4: Our Two-Step Pairwise Network. We use a two-scale VN DGCNN as our encoder to extract equivariant and invariant features. We first predict the canonical pose of $\mathcal{O}_B$ and then predict the pose of $\mathcal{O}_A$ based on it.
  • Figure 5: Real Robot Setup. We conduct real-world robot experiments on the Cup, Flower, Bread, and Plug tasks.
  • ...and 4 more figures
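
The Chamfer Distance mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the unit-sphere normalization, the use of squared distances, and the function names `normalize` and `chamfer_distance` are all assumptions, since the caption does not specify these details.

```python
import numpy as np


def normalize(points: np.ndarray) -> np.ndarray:
    """Center an (N, 3) point cloud and scale it to fit inside the unit sphere.

    Assumed convention; the caption only says the clouds are normalized.
    """
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances (an assumption; some variants do not square).
    """
    # Pairwise squared distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Mean nearest-neighbour distance from a to b, plus the same from b to a.
    return float(sq_dist.min(axis=1).mean() + sq_dist.min(axis=0).mean())
```

A larger value between a training shape and a test shape indicates that the test geometry is further from what was seen during training. This brute-force version builds the full N x M distance matrix, so it suits small clouds only.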