Autonomous Robotic Assembly: From Part Singulation to Precise Assembly

Kei Ota, Devesh K. Jha, Siddarth Jain, Bill Yerazunis, Radu Corcodel, Yash Shukla, Antonia Bronars, Diego Romeres

TL;DR

This work addresses autonomous long-horizon assembly of a gearbox from parts presented in arbitrary configurations. It introduces a multi-modal system that combines vision, GelSight tactile sensing, and force-torque (F/T) feedback to perform part singulation, grasping, in-hand pose estimation, and high-precision insertion and gear meshing in a closed loop. The contributions include a benchmark-like assembly task, a hardware realization with integrated sensing and MuJoCo-based sim-to-real planning, and an end-to-end evaluation over 225 varied trials that reports a 99.11% success rate, which indicates robustness and practical potential for flexible manufacturing. The paper also identifies failure modes and suggests future directions toward automatic failure recovery and interactive perception for unfamiliar parts.

Abstract

Imagine a robot that can assemble a functional product from the individual parts presented in any configuration to the robot. Designing such a robotic system is a complex problem which presents several open challenges. To bypass these challenges, the current generation of assembly systems is built with a lot of system integration effort to provide the structure and precision necessary for assembly. These systems are mostly responsible for part singulation, part kitting, and part detection, which is accomplished by intelligent system design. In this paper, we present autonomous assembly of a gear box with minimum requirements on structure. The assembly parts are randomly placed in a two-dimensional work environment for the robot. The proposed system makes use of several different manipulation skills such as sliding for grasping, in-hand manipulation, and insertion to assemble the gear box. All these tasks are run in a closed-loop fashion using vision, tactile, and Force-Torque (F/T) sensors. We perform extensive hardware experiments to show the robustness of the proposed methods as well as the overall system. See supplementary video at https://www.youtube.com/watch?v=cZ9M1DQ23OI.

Paper Structure (19 sections, 4 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Autonomous Robotic Assembly: We present an autonomous robotic assembly system that can assemble a gear box from any given initial condition, as shown in the first column. The assembly system can reason about grasp feasibility and slide selected objects out of clutter to create grasp affordances for the assembly parts. It then performs the pose manipulation and grasping required for the downstream assembly task. Finally, using several different controllers, it performs the required insertion and meshing of gears to assemble a functioning gear box. The proposed system works in a closed-loop fashion: it can deliberate on the success or failure of individual steps and react accordingly.
  • Figure 2: This figure shows the assembly parts with two identical pegs and two gears. Accurate dimensions of the parts are provided up to machining tolerances. The holes in the base plate are 15 mm in diameter and 70 mm apart.
  • Figure 3: System-level overview of the assembly controller. The color codes indicate the feedback modality used for each operation. Multiple color blocks for the same operation indicate that multiple sensors are used for feedback and/or controller design for that operation. [Best seen in color].
  • Figure 4: Singulation procedure. Given the part pose from the vision module (left), we first reconstruct the manipulation environment in the MuJoCo physics engine and generate an action using the random shooting method (middle). We then apply the action on the real system by sliding the target object using a suitable impedance controller with the F/T sensor (right).
  • Figure 5: For performing the peg manipulation, we design a grasp that allows in-hand rotation as the peg is grasped from the table-top. The grasp is defined as $g_p=g_p(l_p,f_p)$, where $l_p=(x_p, z_p)$ as shown in the figure.
  • ...and 5 more figures
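
The random shooting step named in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' implementation: the paper rolls candidate actions out in MuJoCo, whereas the hypothetical `simulate_slide` below is a purely kinematic stand-in that translates the target along a straight line. The action parameterization (slide angle and distance), the cost (clearance from neighboring parts minus a small penalty on slide length), and all function names and constants are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Action = Tuple[float, float]  # (slide angle in radians, slide distance in meters)


def simulate_slide(target: Point, angle: float, distance: float) -> Point:
    """Hypothetical stand-in for a physics rollout (the paper uses MuJoCo).

    Translates the target along a straight line and ignores contact.
    """
    return (target[0] + distance * math.cos(angle),
            target[1] + distance * math.sin(angle))


def clearance(target: Point, obstacles: List[Point]) -> float:
    """Distance from the target to its nearest neighbor (inf if none)."""
    return min((math.dist(target, o) for o in obstacles), default=math.inf)


def random_shooting(target: Point,
                    obstacles: List[Point],
                    n_samples: int = 256,
                    max_distance: float = 0.15,
                    seed: int = 0) -> Tuple[Action, float]:
    """Sample random slide actions, score each rollout, keep the best one."""
    rng = random.Random(seed)
    best_action: Action = (0.0, 0.0)
    best_cost = math.inf
    for _ in range(n_samples):
        angle = rng.uniform(-math.pi, math.pi)
        distance = rng.uniform(0.0, max_distance)
        final = simulate_slide(target, angle, distance)
        # Assumed cost: reward clearance, lightly penalize long slides.
        cost = -clearance(final, obstacles) + 0.1 * distance
        if cost < best_cost:
            best_action, best_cost = (angle, distance), cost
    return best_action, best_cost


if __name__ == "__main__":
    target = (0.0, 0.0)
    obstacles = [(0.03, 0.0), (0.0, 0.03)]
    (angle, distance), cost = random_shooting(target, obstacles)
    print(f"slide {distance:.3f} m at {math.degrees(angle):.1f} deg, cost {cost:.3f}")
```

Random shooting needs no gradients and only a forward model, so swapping the kinematic stand-in for a physics rollout would leave the sampling loop unchanged. In the pipeline described by the caption, the selected slide is then executed on the real robot under impedance control.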