Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image

Yuki Kawana, Tatsuya Harada

TL;DR

The paper presents a part-level, end-to-end approach for reconstructing multiple articulated objects from a single RGBD image. It introduces a detect-then-group pipeline with a Kinematics-aware Part Fusion (KPF) module and anisotropic scale normalization to handle diverse part configurations, and keeps the model size manageable via a refiner that reallocates decoder layers. The method achieves state-of-the-art results in shape reconstruction and kinematics estimation on synthetic SAPIEN data and shows strong generalization to real BMVC data, highlighting the practicality of part-level representations for cross-category articulation. This work advances the ability to jointly infer part geometry, pose, and motion in cluttered scenes, enabling improved robotics and AR/VR interactions with daily objects.

Abstract

We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image, focusing on part-level shape reconstruction and pose and kinematics estimation. We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts. Instead, we propose a novel alternative approach that employs part-level representation, representing instances as combinations of detected parts. While our detect-then-group approach effectively handles instances with diverse part structures and various part counts, it faces issues of false positives, varying part sizes and scales, and an increasing model size due to end-to-end training. To address these challenges, we propose 1) test-time kinematics-aware part fusion to improve detection performance while suppressing false positives, 2) anisotropic scale normalization for part shape learning to accommodate various part sizes and scales, and 3) a balancing strategy for cross-refinement between feature space and output space to improve part detection while maintaining model size. Evaluation on both synthetic and real data demonstrates that our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
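The idea behind anisotropic scale normalization (item 2 above) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the hypothetical helper `normalize_part` takes points already expressed in the part's box-aligned frame and divides each axis by its own box extent, whereas an isotropic variant divides all axes by the largest extent.

```python
import numpy as np

def normalize_part(points, center, size, anisotropic=True):
    """Map part points into a canonical frame (hypothetical helper).

    Assumes `points` are already expressed in the part's box-aligned
    frame, so no rotation is applied here.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    # Anisotropic: each axis is divided by its own box extent.
    # Isotropic: every axis is divided by the largest extent.
    scale = size if anisotropic else np.full(3, size.max())
    return (points - center) / scale

# A thin, door-like part: 0.8 wide, 0.02 thick, 1.6 tall (made-up numbers).
size = np.array([0.8, 0.02, 1.6])
center = np.zeros(3)
signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
corners = center + 0.5 * size * signs

aniso = normalize_part(corners, center, size)
iso = normalize_part(corners, center, size, anisotropic=False)

# Per-axis extent of the normalized part.
aniso_extent = aniso.max(axis=0) - aniso.min(axis=0)
iso_extent = iso.max(axis=0) - iso.min(axis=0)
```

Under these assumptions, the anisotropic variant maps the thin part onto a full unit cube, while the isotropic variant leaves its thickness at about 1% of the cube. This is one plausible reading of why per-axis scaling helps a shape decoder cope with parts of very different sizes and proportions.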

Paper Structure

This paper contains 72 sections, 4 equations, 18 figures, 16 tables, 2 algorithms.

Figures (18)

  • Figure 1: Our detection-based approach estimates part-level shape, pose, and kinematics as joint parameters. It also recovers parts-to-instance associations to handle multiple instances with various part structures and counts.
  • Figure 2: Overview of the pipeline. The input RGBD image is projected to a colored point cloud. The encoder $\mathcal{E}$ extracts scene features and a downsampled point cloud. The decoder $\mathcal{D}$ outputs a set of part proposals $\{X_n\}$ from part queries $\{\mathbf{q}_n\}$. The refiner $\mathcal{R}$ estimates the residual of part pose and size $\Delta B$ and joint parameter $\Delta A$ for refined $\{B, A\}$. At test time, the inference is run $N_Q$ times independently to densely sample part proposals as $\{X_{n^{\prime}}\}$. KPF removes false positives in $\{X_{n^{\prime}}\}$ by using kinematics-aware IoU (kIoU) to refine the prediction further. The part shape is reconstructed by the implicit shape decoder $\mathcal{O}$.
  • Figure 3:
  • Figure 5: Qualitative results on the SAPIEN [Xiang_2020_SAPIEN] dataset.
  • Figure 6: Qualitative results on the BMVC [BMVC2015_181] dataset.
  • ...and 13 more figures
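
The first step in the Figure 2 caption, projecting the RGBD image to a colored point cloud, can be sketched as a standard pinhole back-projection. This is a minimal illustration and not the authors' code: the function name `rgbd_to_colored_point_cloud`, the intrinsics `fx, fy, cx, cy`, and the convention that a depth of 0 marks a missing measurement are all assumptions.

```python
import numpy as np

def rgbd_to_colored_point_cloud(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGBD image to an (N, 6) array of [x, y, z, r, g, b].

    Assumes a pinhole camera, depth in metric units along the optical
    axis, and depth == 0 marking invalid pixels (all assumptions).
    """
    rgb = np.asarray(rgb)
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # u indexes columns, v indexes rows; both have shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    # Scale 8-bit colors to [0, 1] for the selected pixels.
    colors = rgb[valid].astype(float) / 255.0
    return np.concatenate([xyz, colors], axis=1)

# Tiny 2x2 example with one invalid pixel (made-up values).
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
cloud = rgbd_to_colored_point_cloud(rgb, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In this toy example, `cloud` has one row per valid pixel, so the pixel with zero depth is dropped. A real pipeline would use the camera's calibrated intrinsics and would typically downsample the result before feeding it to an encoder.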