Object Pose Estimation by Camera Arm Control Based on the Next Viewpoint Estimation

Tomoki Mizuno, Kazuya Yabashi, Tsuyoshi Tasaki

TL;DR

This work addresses the robustness gap in pose estimation for simple-shaped retail products by introducing PYNet-NV, a neural network that jointly estimates an object's pose and an effective next viewpoint (NV) to guide a camera-equipped robot arm. A viewpoint branch is trained on top of PYNet and integrated into a product display flow based on a single camera movement. Compared with purely model-based NV methods or pose-only networks, the approach improves both pose accuracy and downstream product display success. The key contributions are the simultaneous pose-and-viewpoint estimation framework, a pose estimation success rate of 77.3% (7.4 percentage points higher than a mathematical model-based NV calculation), and an 84.2% product display success rate in robot experiments. The method has practical implications for automated retail display, enabling more reliable manipulation of simple-shaped products with minimal camera motion.
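The joint estimation idea can be illustrated with a minimal sketch, assuming that a shared feature vector (standing in for the PYNet backbone, whose architecture is not described here) feeds two classification heads: one over discrete pose classes and one over candidate next viewpoints. The helper names (`joint_forward`, `joint_loss`), the toy head sizes, and the weighted-sum multi-task loss are hypothetical illustrations, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(features, weights, biases):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(max(probs[target], 1e-12))

def joint_forward(features, pose_head, nv_head):
    """Shared features -> (pose-class probs, next-viewpoint probs).

    Hypothetical two-branch layout: both heads read the same feature
    vector, which stands in for the backbone output of PYNet.
    """
    pose_probs = softmax(linear(features, *pose_head))
    nv_probs = softmax(linear(features, *nv_head))
    return pose_probs, nv_probs

def joint_loss(pose_probs, nv_probs, pose_target, nv_target, nv_weight=1.0):
    """Assumed multi-task loss: CE(pose) + nv_weight * CE(next viewpoint)."""
    return (cross_entropy(pose_probs, pose_target)
            + nv_weight * cross_entropy(nv_probs, nv_target))

# Toy usage: 3 hypothetical pose classes, 2 hypothetical candidate viewpoints.
features = [0.5, -1.0, 2.0]
pose_head = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
nv_head = ([[0.2, 0.1, 0.0], [-0.3, 0.0, 0.5]], [0.0, 0.0])
pose_probs, nv_probs = joint_forward(features, pose_head, nv_head)
loss = joint_loss(pose_probs, nv_probs, pose_target=2, nv_target=1)
print(f"pose={pose_probs} nv={nv_probs} loss={loss:.3f}")
```

Sharing one feature vector between the heads is what lets a better pose representation also inform the viewpoint prediction, the relationship the abstract relies on; `nv_weight` is an assumed knob for balancing the two objectives.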

Abstract

We have developed a new method to estimate a Next Viewpoint (NV) that is effective for pose estimation of simple-shaped products by product display robots in retail stores. Pose estimation methods using Neural Networks (NN) with an RGBD camera are highly accurate, but their accuracy drops significantly when the camera captures few texture and shape features from its current viewpoint. Moving the camera to a better viewpoint can help, but previous mathematical model-based methods struggle to estimate an effective NV because simple-shaped objects have few shape features. We therefore focus on the relationship between pose estimation and NV estimation: the more accurate the pose estimate, the more accurate the NV estimate. Based on this, we develop a new pose estimation NN that estimates the NV simultaneously. Experimental results showed that our NV estimation achieved a pose estimation success rate of 77.3%, 7.4 percentage points higher than the mathematical model-based NV calculation. Moreover, we verified that the robot using our method successfully displayed 84.2% of products.
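The camera-arm flow can be sketched as follows, assuming the network returns both pose-class probabilities and next-viewpoint probabilities from a single observation. The confidence threshold that decides whether to move, the integer viewpoint indices, and the `observe`/`estimate` callables are hypothetical placeholders for the real camera and PYNet-NV; the sketch only shows the control logic of estimating once, moving the arm at most once to the predicted NV, and estimating again.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical estimator: observation -> (pose-class probs, next-viewpoint probs).
Estimator = Callable[[object], Tuple[List[float], List[float]]]

def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def estimate_with_next_viewpoint(
    observe: Callable[[int], object],
    estimate: Estimator,
    current_view: int,
    confidence_threshold: float = 0.8,  # assumed gate, not from the paper
) -> Tuple[int, float, List[int]]:
    """Estimate pose, moving the camera at most once to the predicted NV.

    Returns (pose class, confidence, list of visited viewpoints).
    """
    visited = [current_view]
    pose_probs, nv_probs = estimate(observe(current_view))
    if max(pose_probs) >= confidence_threshold:
        # Confident enough at the current viewpoint: no camera move.
        return argmax(pose_probs), max(pose_probs), visited
    # Otherwise move once to the viewpoint the NV head ranks highest.
    next_view = argmax(nv_probs)
    visited.append(next_view)
    pose_probs, _ = estimate(observe(next_view))
    return argmax(pose_probs), max(pose_probs), visited

# Toy stand-ins: view 0 is ambiguous, view 1 is informative.
def fake_observe(view: int) -> int:
    return view

def fake_estimate(obs):
    if obs == 0:
        return [0.4, 0.35, 0.25], [0.1, 0.9]  # low confidence; NV head points at view 1
    return [0.05, 0.9, 0.05], [0.5, 0.5]

pose, conf, path = estimate_with_next_viewpoint(fake_observe, fake_estimate, current_view=0)
print(pose, conf, path)  # 1 0.9 [0, 1]
```

Capping the flow at one camera move mirrors the single-movement display flow and keeps arm motion minimal; a real system would replace the toy stand-ins with RGBD capture and arm motion commands.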

Paper Structure

This paper contains 15 sections, 4 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Difficult example of pose estimation
  • Figure 2: Poseclass of the simple-shaped product
  • Figure 3: Pose estimation using poseclass
  • Figure 4: Example of camera motion
  • Figure 5: System using PYNet estimating Next Viewpoint (PYNet-NV)
  • ...and 9 more figures