Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation

Alan Li, Angela P. Schoellig

TL;DR

The paper tackles robust 6D pose estimation in bin-picking by diagnosing where pose estimation errors concentrate in pose and occlusion spaces and then synthesizing targeted hard samples online. It introduces a dual error-modeling framework, deriving $P(\theta,\phi)$ for pose-space and $L(k)$ for occlusion-space, and uses these to generate realistic, difficult training cases within a bin environment. Through continuous online training, the method demonstrates up to 20% gains in correct detection rate on ROBI and notable improvements on bin-picking scenes in T-LESS, while remaining compatible with multiple pose estimators such as PVNet and GDRNPP. The approach enhances reliability and data efficiency for industrial 6D pose estimation, particularly under occlusion and symmetry challenges, with practical impact for robotic manipulation in cluttered environments.
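As a rough illustration of the pose-space side, the sketch below builds a discrete stand-in for $P(\theta,\phi)$ from per-sample errors and then draws new viewpoints in proportion to it. The histogram binning, the function names `error_viewsphere` and `sample_viewpoints`, and the bin counts are assumptions made for this example; the summary does not state how the paper estimates or samples the distribution.

```python
import numpy as np

def error_viewsphere(theta, phi, err, n_theta=18, n_phi=36):
    """Mean error per (theta, phi) bin, normalized so the bins sum to 1.

    theta: polar angles in [0, pi]; phi: azimuths in [-pi, pi];
    err: one non-negative pose error per evaluated sample.
    Histogram binning is an assumption of this sketch.
    """
    t_edges = np.linspace(0.0, np.pi, n_theta + 1)
    p_edges = np.linspace(-np.pi, np.pi, n_phi + 1)
    bins = [t_edges, p_edges]
    err_sum, _, _ = np.histogram2d(theta, phi, bins=bins, weights=err)
    count, _, _ = np.histogram2d(theta, phi, bins=bins)
    # Bins with no samples keep zero mean error.
    mean_err = np.divide(err_sum, count, out=np.zeros_like(err_sum), where=count > 0)
    total = mean_err.sum()
    if total == 0:
        # No error observed anywhere: fall back to a uniform distribution.
        P = np.full(mean_err.shape, 1.0 / mean_err.size)
    else:
        P = mean_err / total
    return P, t_edges, p_edges

def sample_viewpoints(P, t_edges, p_edges, n, rng):
    """Draw n viewpoints, choosing bins in proportion to P, uniform within a bin."""
    flat = rng.choice(P.size, size=n, p=P.ravel())
    ti, pj = np.unravel_index(flat, P.shape)
    theta = rng.uniform(t_edges[ti], t_edges[ti + 1])
    phi = rng.uniform(p_edges[pj], p_edges[pj + 1])
    return theta, phi

# Toy usage: two high-error samples near the pole, one zero-error sample elsewhere.
theta_obs = np.array([0.1, 0.1, 3.0])
phi_obs = np.array([0.1, 0.1, 0.1])
err_obs = np.array([1.0, 3.0, 0.0])
P, t_edges, p_edges = error_viewsphere(theta_obs, phi_obs, err_obs)
hard_theta, hard_phi = sample_viewpoints(P, t_edges, p_edges, 5, np.random.default_rng(0))
```

In this toy case all sampled viewpoints fall in the single bin that accumulated error, which is the intended behavior: new training poses concentrate where the estimator currently fails.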

Abstract

6D object pose estimation is a fundamental component in robotics, enabling efficient interaction with the environment. It is particularly challenging in bin-picking applications, where objects may be textureless and in difficult poses, and occlusion between objects of the same type may cause confusion even in well-trained models. We propose a novel, model-agnostic method of hard example synthesis that uses existing simulators and models pose error in both the camera-to-object viewsphere and occlusion space. By evaluating model performance with respect to the distribution of object poses and occlusions, we discover regions of high error and generate realistic training samples that specifically target these regions. With our training approach, we demonstrate an improvement in correct detection rate of up to 20% across several ROBI-dataset objects using state-of-the-art pose estimation models.

Paper Structure

This paper contains 14 sections, 1 equation, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The synthesis of hard samples using a set of pre-generated random bins. Poses are selected based on the error view-sphere, and realistic occlusions are added based on the occlusion model to generate the final training samples. After each training epoch, the model is evaluated again to update the error view-sphere and the occlusion model for the next round of hard sample synthesis.
  • Figure 2: Distribution of keypoint error across the view-sphere of the Eye-bolt part. Each point on the sphere represents a single training sample where the camera is at a distance of approx. 500 mm, and its viewpoint axis intersects that point and the center of the object axis (left). The error distribution estimated from the training set error is shown on the right. Note that only half of the view-sphere is used due to the symmetry of the part.
  • Figure 3: An overview of the occlusion model generation, the goal of which is to estimate the expected error when a given point on the surface of the object is occluded. First, a set of 5000 random points is sampled across the surface of the object, and pose estimator performance is evaluated for each training sample. The depth map is then used to determine the visibility of each sampled point, and the pose error is applied to points that should be visible based on the object pose but are occluded by other objects (blue).
  • Figure 4: Visualization of the estimated occlusion model for three parts, where redder areas represent higher expected error when occluded. Note that on the leftmost part, higher error is expected when the area around the slit is occluded, as the part is near-symmetric, and the visibility of the slit reduces the ambiguity of the pose.
  • Figure 5: Three examples of the hard case sample generation pipeline, starting with the original nearest neighboring failure case from the sampled point in the pose error distribution, to the pose sampled from the pre-generated bins, to the sampled occluding objects added and rendered into the scene. Note the realistic occlusions and object arrangements in both RGB and depth images achieved with our method.
  • ...and 4 more figures
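
The occlusion-model procedure described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-sample dictionary keys (`R`, `t`, `K`, `depth`, `err`), the 2 mm depth tolerance, and the use of surface normals as a stand-in for "should be visible based on the object pose" are all hypothetical choices made for this example.

```python
import numpy as np

def occluded_mask(points, normals, R, t, K, depth, tol=2.0):
    """Flag surface points that face the camera but are hidden by something closer.

    points, normals: (N, 3) arrays in the object frame.
    R, t: object-to-camera rotation (3, 3) and translation (3,).
    K: (3, 3) pinhole intrinsics; depth: (H, W) map in the same units as t.
    Front-facing normals approximate self-visibility (an assumption).
    """
    cam_pts = points @ R.T + t
    cam_nrm = normals @ R.T
    # A point faces the camera if its normal points back toward the origin.
    facing = np.einsum("ij,ij->i", cam_nrm, -cam_pts) > 0
    z = cam_pts[:, 2]
    z_safe = np.where(z > 0, z, 1.0)  # avoid dividing by non-positive depth
    uvw = cam_pts @ K.T
    u = np.rint(uvw[:, 0] / z_safe).astype(int)
    v = np.rint(uvw[:, 1] / z_safe).astype(int)
    h, w = depth.shape
    in_img = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    occ = np.zeros(len(points), dtype=bool)
    idx = np.flatnonzero(facing & in_img)
    d = depth[v[idx], u[idx]]
    # Occluded: a valid depth reading lies clearly in front of the point.
    occ[idx] = (d > 0) & (d < z[idx] - tol)
    return occ

def occlusion_model(points, normals, samples):
    """Mean pose error per surface point, taken over samples where it is occluded."""
    err_sum = np.zeros(len(points))
    count = np.zeros(len(points))
    for s in samples:
        occ = occluded_mask(points, normals, s["R"], s["t"], s["K"], s["depth"])
        err_sum[occ] += s["err"]
        count[occ] += 1
    return np.divide(err_sum, count, out=np.zeros_like(err_sum), where=count > 0)

# Toy usage: two surface points 500 mm from the camera, one hidden by an occluder.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
nrm = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])
depth_occluded = np.full((100, 100), 500.0)
depth_occluded[50, 50] = 300.0  # something sits in front of the first point
depth_clear = np.full((100, 100), 500.0)
pose = {"R": np.eye(3), "t": np.array([0.0, 0.0, 500.0]), "K": K}
samples = [
    {**pose, "depth": depth_occluded, "err": 4.0},
    {**pose, "depth": depth_clear, "err": 10.0},
]
L_est = occlusion_model(pts, nrm, samples)
```

Here only the first point is ever occluded, so it alone accumulates error (from the first sample); the second sample contributes nothing because no point is hidden in it. A real pipeline would likely replace the normal-based test with a rendered depth map of the object alone.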