Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics

Philipp Quentin, Dino Knoll, Daniel Goehring

TL;DR

This work evaluates a representative, end-to-end 6D pose estimation pipeline for automotive internal logistics, combining LabelFusion-based real-world data, NVISII-based synthetic data, and RGB-/RGB-D-based estimators (GDR-Net and DenseFusion) to assess feasibility for industrial deployment. The study finds that while data-generation pipelines scale well and trained estimators can reach near-optimal accuracy, they fall short on robustness due to unreliable uncertainty estimates, which can lead to crashes or missed placements in practice. RGB-based methods tend to be more robust than RGB-D methods when depth data is noisy or missing, but all approaches struggle with domain gaps introduced by synthetic data. The results highlight the need for reliable uncertainty quantification and depth-noise-aware training to move toward practical, scalable automation in automotive internal logistics. The work demonstrates the potential and current limits of translating advanced 6D pose estimation research into industry-ready robotic grasping pipelines, underscoring uncertainty modeling as the key area for future advancement.

Abstract

Despite advances in robotics, a large proportion of parts-handling tasks in the automotive industry's internal logistics are not automated but are still performed by humans. A key component for competitively automating these processes is a 6D pose estimation that can handle a large number of different parts, is adaptable to new parts with little manual effort, and is sufficiently accurate and robust with respect to industry requirements. In this context, the question arises as to the current status quo with respect to these measures. To address this, we built a representative 6D pose estimation pipeline with state-of-the-art components, from economically scalable real and synthetic data generation to pose estimators, and evaluated it on automotive parts with regard to a realistic sequencing process. We found that, using these data generation approaches, the performance of the trained 6D pose estimators is promising but does not meet industry requirements. We reveal that the reason for this is the inability of the estimators to provide reliable uncertainties for their poses, rather than an inability to provide sufficiently accurate poses. We further analyzed how RGB- and RGB-D-based approaches compare against this background and show that they are differently vulnerable to the domain gap induced by synthetic data.

Paper Structure (24 sections, 3 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Automotive antenna covers (left) and interior handles (right) in storage containers that enforce structured and unstructured positions.
  • Figure 2: Representative robotic setup for an automotive sequencing process, showing antennas on a conveyor belt in front of a UR10 with a two-finger gripper and a Framos camera mounted on the end effector. To the left of the conveyor belt is the target sequence container.
  • Figure 3: Exemplary rendered images showing, from left to right: antennas in the first and second scene types, and handles in the first and second scene types.
  • Figure 4: Violin plots of the MDE on the test set for the different models on a logarithmic axis. The results for the antenna are shown in red and for the handle in blue. The white dots represent the location of the median and the black bars indicate the location of the lower and upper quartiles, enclosing 75 % of the data. The vertical dotted line shows the error threshold $\theta_{p}$ of 0.015 m. The x-axis is limited from 0.5 mm to 0.3 m.
  • Figure 5: Average Precision and Average Recall of DF over a confidence interval $c \in [0,1)$ for the antenna on the left and the handle on the right.
  • ...and 1 more figures
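
The captions of Figures 4 and 5 combine a pose-error threshold $\theta_{p}$ of 0.015 m with a confidence value $c$. The sketch below illustrates one plausible way such a precision and recall could be computed for a single confidence threshold. The exact definitions used in the paper are not given in this summary, so the function name, the acceptance rule (keep a pose if its confidence is at least $c$), the correctness rule (error below $\theta_{p}$), and the sample numbers are all assumptions made for illustration.

```python
def precision_recall_at_confidence(errors_m, confidences, conf_threshold,
                                   error_threshold_m=0.015):
    """Hypothetical helper: precision and recall of confidence-gated poses.

    Assumed definitions (not confirmed by the source):
      - a pose is "correct" if its error is below error_threshold_m
      - a pose is "accepted" if its confidence is >= conf_threshold
    """
    correct = [e < error_threshold_m for e in errors_m]
    accepted = [c >= conf_threshold for c in confidences]

    true_pos = sum(a and k for a, k in zip(accepted, correct))
    n_accepted = sum(accepted)
    n_correct = sum(correct)

    # Convention chosen here: precision is 1.0 when nothing is accepted.
    precision = true_pos / n_accepted if n_accepted else 1.0
    recall = true_pos / n_correct if n_correct else 0.0
    return precision, recall


# Made-up pose errors (metres) and confidences, for illustration only.
errors_m = [0.002, 0.004, 0.030, 0.010, 0.200]
confidences = [0.90, 0.80, 0.85, 0.30, 0.20]

for c in (0.0, 0.5):
    p, r = precision_recall_at_confidence(errors_m, confidences, c)
    print(f"c={c:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy data the third pose has a large error but a high confidence, so raising the threshold cannot filter it out without also discarding correct poses. This is the kind of unreliable uncertainty the abstract identifies as the main obstacle.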