SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network

Yuxuan Wan, Kaichen Zhou, Jinhong Chen, Hao Dong

TL;DR

The paper defines the Single-Step Assembly Error Correction Task to tackle errors that accumulate during robotic part assembly. It introduces the LEGO-ECA dataset to provide misassembly examples and poses for correction, and proposes SCANet, a transformer-based network that treats each assembled component as a query to detect and correct pose errors using a two-branch CNN backbone and a component pose correction module. Experiments demonstrate that SCANet can identify and fix misassembled components, notably improving assembly correctness when used to refine MEPNet outputs, with demonstrated generalization to unseen data. The work highlights a path toward robust, error-aware autonomous assembly; however, it remains in simulation, inviting future work on real-robot applications and broader sequential correction across steps.

Abstract

Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short of satisfactory results on tasks requiring long-term planning. We observe that integrating a self-correction module can partially alleviate such issues. Motivated by this observation, we introduce the Single-Step Assembly Error Correction Task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness against the manual images and providing corrections when necessary. Finally, we use SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset are available at https://scanet-iros2024.github.io/.
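
To make the task's input and output concrete, the following is a minimal sketch of labelling each component in one assembly step with one of the four categories reported in the dataset statistics (correct, positional error, rotational error, both). It is not SCANet itself: it assumes the target pose of every component is already known, whereas SCANet has to infer it from the manual image. The `Pose` convention (grid position plus quarter turns about the vertical axis) and the names `ErrorType`, `classify`, and `correct_step` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    CORRECT = "correct"
    POSITION = "position error"
    ROTATION = "rotation error"
    BOTH = "position and rotation error"


@dataclass(frozen=True)
class Pose:
    # Assumption: position is an (x, y, z) grid coordinate and rotation is a
    # number of quarter turns about the vertical axis. Neither convention is
    # taken from the paper.
    position: tuple
    rotation: int


def classify(assembled: Pose, target: Pose) -> ErrorType:
    """Compare an assembled pose with its target pose and name the error."""
    pos_wrong = assembled.position != target.position
    rot_wrong = assembled.rotation % 4 != target.rotation % 4
    if pos_wrong and rot_wrong:
        return ErrorType.BOTH
    if pos_wrong:
        return ErrorType.POSITION
    if rot_wrong:
        return ErrorType.ROTATION
    return ErrorType.CORRECT


def correct_step(assembled, targets):
    """Return one (error type, corrected pose) pair per component in a step."""
    return [(classify(a, t), t) for a, t in zip(assembled, targets)]


if __name__ == "__main__":
    assembled = [Pose((0, 0, 0), 0), Pose((2, 0, 1), 1)]
    targets = [Pose((0, 0, 0), 0), Pose((3, 0, 1), 1)]
    for label, pose in correct_step(assembled, targets):
        print(label.value, pose)
```

In this sketch the correction is trivial because the target is given; the point is only the shape of the output, one label and one corrected pose per assembled component.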

Paper Structure (18 sections, 4 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: We observed that as the assembly process progresses, assembly errors accumulate, leading to larger discrepancies between the final assembly result and the assembly manual.
  • Figure 2: The LEGO-ECA dataset, designed specifically for the single-step assembly error correction task. (i) Examples showing an assembly manual sequence diagram, instances of component assembly errors, error types, and correct poses. (ii) The construction process, outlining how erroneous assembly examples were generated for the LEGO-ECA dataset.
  • Figure 3: LEGO-ECA Dataset Statistics. (a) Proportions of different types of incorrectly assembled components in single-step assembly. A single assembly step typically involves 2 to 5 components, and positional errors are the most prevalent. Rotational errors usually occur together with positional errors, so isolated rotational errors are comparatively rare. (b) Distribution of different types of incorrectly assembled components across the entire dataset. Positional errors are predominant, while rotational errors are the least frequent. Correct components and those with both positional and rotational errors occupy roughly similar proportions. (c) Number of manuals by step count. The majority of manuals have between 15 and 40 steps.
  • Figure 4: SCANet consists of two modules: (i) a convolutional neural network backbone, comprising a fusion block, an Hourglass model, and an assembly difference extractor, which extracts differential features between manual images and assembly results; and (ii) an assembly correction module, the core of SCANet, consisting of three parts: a component pose encoder, a transformer network, and a component pose corrector, which outputs the final corrected component pose information.
  • Figure 5: The component pose encoder consists of three sub-encoders: a 3D voxel encoder, a 6D pose encoder, and a 2D image encoder. The 3D voxel encoder encodes the component's 3D voxel data into a one-dimensional vector, which is then combined with the output of the 6D pose encoder. This combined vector is then concatenated with the 2D image encoder's output, producing a component feature that integrates 6D pose, 2D image, and 3D shape information.
  • ...and 4 more figures
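
The fusion order described in the Figure 5 caption (voxel feature combined with the pose feature, then concatenated with the image feature) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the learned encoders are replaced by fixed random linear maps, "combined" is assumed to mean element-wise addition, the 6D pose is assumed to be three translation and three rotation values, and all dimensions and names (`make_linear`, `encode_component`, `VOXEL_GRID`, and so on) are hypothetical.

```python
import random


def make_linear(in_dim, out_dim, seed):
    """Return a fixed random linear map standing in for a learned encoder."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def apply(vec):
        if len(vec) != in_dim:
            raise ValueError(f"expected {in_dim} inputs, got {len(vec)}")
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return apply


# Hypothetical sizes, chosen only to keep the example small.
VOXEL_GRID = 4        # 4 x 4 x 4 occupancy grid for the component shape
SHAPE_POSE_DIM = 8    # shared width of the voxel and pose features
IMAGE_DIM = 6         # width of the 2D image feature
PATCH_PIXELS = 16     # flattened 4 x 4 image crop of the component

voxel_encoder = make_linear(VOXEL_GRID ** 3, SHAPE_POSE_DIM, seed=0)
pose_encoder = make_linear(6, SHAPE_POSE_DIM, seed=1)
image_encoder = make_linear(PATCH_PIXELS, IMAGE_DIM, seed=2)


def encode_component(voxels, pose6d, patch):
    """Fuse 3D shape, 6D pose, and 2D image information into one vector."""
    # Flatten the 3D voxel grid and encode it into a one-dimensional vector.
    flat_voxels = [v for plane in voxels for row in plane for v in row]
    shape_feat = voxel_encoder(flat_voxels)
    pose_feat = pose_encoder(pose6d)
    # Assumption: "combined" is modelled as element-wise addition.
    shape_pose = [s + p for s, p in zip(shape_feat, pose_feat)]
    # Concatenate with the image feature to form the component feature.
    return shape_pose + image_encoder(patch)


if __name__ == "__main__":
    voxels = [[[1.0 if z == 0 else 0.0 for _ in range(VOXEL_GRID)]
               for _ in range(VOXEL_GRID)] for z in range(VOXEL_GRID)]
    feature = encode_component(voxels, [1.0, 0.0, 2.0, 0.0, 90.0, 0.0],
                               [0.5] * PATCH_PIXELS)
    print(len(feature))
```

With these sizes the component feature has `SHAPE_POSE_DIM + IMAGE_DIM` entries. Addition requires the voxel and pose features to share a width; if "combined" instead means concatenation, the first stage would simply be `shape_feat + pose_feat` as a list join.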