Object-Pose Estimation With Neural Population Codes

Heiko Hoffmann, Richard Hoffmann

TL;DR

This work addresses object-pose estimation under symmetry-induced rotational ambiguity by introducing a neural population code for orientation. The approach encodes rotation as a population activation pattern on a sphere×circle, enabling direct end-to-end learning and robust handling of symmetry. On the T-LESS dataset, it achieves fast inference ($3.2$ ms on an Apple M1 CPU) and a symmetry-aware rotation accuracy (Maximum Symmetry-Aware Surface Distance, MSSD) of $84.7\%$, compared with $69.7\%$ when mapping directly to pose, while requiring only grayscale input. The population-code representation captures pose ambiguity through multiple activation peaks and shows practical potential for real-time robotic assembly, with possible extensions to direct grasp-posture control.
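To make the sphere×circle idea concrete, here is a minimal sketch of how such a target pattern could be built. It assumes a rotation is described by an axis (a point on a Fibonacci sphere) and an angle (a point on a circle), and that a symmetric object contributes one activation peak per equivalent rotation. The function names `fibonacci_sphere` and `population_code`, the exponential tuning curves, the widths `kappa_axis` and `kappa_angle`, and the neuron counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                      # evenly spaced heights in (-1, 1)
    r = np.sqrt(1.0 - z * z)                   # radius of each horizontal circle
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i     # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def population_code(axes, angles, centers, n_angle=16,
                    kappa_axis=20.0, kappa_angle=8.0):
    """Activation on sphere x circle for one or more equivalent rotations.

    axes:   (k, 3) unit rotation axes, one per equivalent rotation
    angles: (k,)   rotation angles in radians
    Returns an (n_axis_neurons, n_angle) array with values in (0, 1].
    """
    axes = np.atleast_2d(axes)
    angles = np.atleast_1d(angles)
    angle_centers = 2.0 * np.pi * np.arange(n_angle) / n_angle
    # Tuning on the sphere: 1 when the neuron center equals the axis.
    a = np.exp(kappa_axis * (centers @ axes.T - 1.0))             # (n, k)
    # Tuning on the circle: periodic in the angle difference.
    diff = angle_centers[:, None] - angles[None, :]
    b = np.exp(kappa_angle * (np.cos(diff) - 1.0))                # (m, k)
    # Joint activation per equivalent rotation, then keep the strongest.
    joint = a[:, None, :] * b[None, :, :]                         # (n, m, k)
    return joint.max(axis=2)

centers = fibonacci_sphere(500)
# Two hypothetical rotations that a symmetric object makes indistinguishable.
equiv_axes = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
equiv_angles = np.array([np.pi / 2, np.pi])
code = population_code(equiv_axes, equiv_angles, centers)
sphere_map = code.max(axis=1)   # collapse the circle to view the sphere only
```

Taking the maximum over equivalent rotations, rather than averaging them, keeps each peak at full height, so a decoder that picks the strongest neuron lands on one valid rotation instead of a blend of several.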

Abstract

Robotic assembly tasks require object-pose estimation, particularly for tasks that avoid costly mechanical constraints. Object symmetry complicates the direct mapping of sensory input to object rotation, as the rotation becomes ambiguous and lacks a unique training target. Some proposed solutions involve evaluating multiple pose hypotheses against the input or predicting a probability distribution, but these approaches suffer from significant computational overhead. Here, we show that representing object rotation with a neural population code overcomes these limitations, enabling a direct mapping to rotation and end-to-end learning. As a result, population codes facilitate fast and accurate pose estimation. On the T-LESS dataset, we achieve inference in 3.2 milliseconds on an Apple M1 CPU and a Maximum Symmetry-Aware Surface Distance accuracy of 84.7% using only gray-scale image input, compared to 69.7% accuracy when directly mapping to pose.
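The accuracy figures above are based on the Maximum Symmetry-Aware Surface Distance. As a rough illustration, the sketch below follows the definition commonly used in the BOP benchmark: the largest vertex displacement between the estimated and ground-truth poses, minimized over the object's symmetry transformations. The function name `mssd`, the box-shaped test object, and its two-fold symmetry are hypothetical; the threshold that turns distances into an accuracy percentage is not reproduced here.

```python
import numpy as np

def mssd(R_est, t_est, R_gt, t_gt, vertices, symmetries):
    """min over symmetries S of max over vertices x of ||P_est x - P_gt S x||.

    vertices:   (n, 3) model points
    symmetries: list of (R_sym, t_sym) pairs, including the identity
    """
    est = vertices @ R_est.T + t_est
    errors = []
    for R_sym, t_sym in symmetries:
        sym = vertices @ R_sym.T + t_sym        # apply the symmetry in the model frame
        gt = sym @ R_gt.T + t_gt                # then the ground-truth pose
        errors.append(np.linalg.norm(est - gt, axis=1).max())
    return min(errors)

# Hypothetical box with a two-fold symmetry about the z axis.
vertices = np.array([[sx * 2.0, sy * 1.0, sz * 0.5]
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
identity = (np.eye(3), np.zeros(3))
half_turn_z = (np.diag([-1.0, -1.0, 1.0]), np.zeros(3))

# The estimate differs from the ground truth by exactly the symmetry.
R_gt, t_gt = np.eye(3), np.zeros(3)
R_est, t_est = np.diag([-1.0, -1.0, 1.0]), np.zeros(3)

err_symmetric = mssd(R_est, t_est, R_gt, t_gt, vertices, [identity, half_turn_z])
err_naive = mssd(R_est, t_est, R_gt, t_gt, vertices, [identity])
```

With the symmetry set supplied, the half-turn estimate counts as a perfect match; with only the identity, the same estimate is penalized by the full corner-to-corner displacement. This is why a symmetry-aware metric is needed to score symmetric objects fairly.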

Paper Structure

This paper contains 13 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The population code for symmetric objects has multiple peaks. The neural activation is color coded (yellow: high; blue: low). Only the Fibonacci sphere for the rotation axis is shown.
  • Figure 2: Our network architecture comprises a sequence of 4 convolutional blocks and 4 linear layers
  • Figure 3: Random selection of training images for our synthetic datasets
  • Figure 4: Example of sensitivity of metrics to errors in the rotation estimate
  • Figure 5: Our network learns the activations of the population code