KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

Ruida Zhang; Chenyangguang Zhang; Yan Di; Fabian Manhardt; Xingyu Liu; Federico Tombari; Xiangyang Ji

TL;DR

KP-RED tackles joint 3D shape retrieval and deformation from noisy scans by leveraging category-consistent sparse keypoints to build a deformation-aware embedding space and guide a neural cage-based deformation of retrieved CAD models. A local-global keypoint embedding paired with self-attention enables robust retrieval, while influence vectors on a cage controlled by keypoints drive fine-grained deformations, interpolated via mean value coordinates. The approach achieves state-of-the-art results on PartNet and Scan2CAD with real-time inference and strong robustness to partial observations. Overall, KP-RED demonstrates that category-consistent keypoints can unify retrieval and deformation for high-fidelity CAD model reconstruction in real-world scenarios.

Abstract

In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target. Unlike existing dense-matching-based methods, which typically struggle with noisy partial scans, we propose to leverage category-consistent sparse keypoints to naturally handle both full and partial object scans. Specifically, we first employ a lightweight retrieval module to establish a keypoint-based embedding space, measuring the similarity among objects by dynamically aggregating deformation-aware local-global features around extracted keypoints. Objects that are close in the embedding space are considered similar in geometry. We then introduce a neural cage-based deformation module that estimates the influence vector of each keypoint upon the cage vertices inside its local support region to control the deformation of the retrieved shape. Extensive experiments on the synthetic dataset PartNet and the real-world dataset Scan2CAD demonstrate that KP-RED surpasses existing state-of-the-art approaches by a large margin. Code and trained models are released at https://github.com/lolrudy/KP-RED.
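
The retrieval step described above, where objects that are close in the keypoint-based embedding space are treated as geometrically similar, can be illustrated with a minimal sketch. It assumes each shape is already summarized by one token per keypoint region; the function name `retrieve`, the mean-of-Euclidean-distances score, and the optional `region_mask` for regions missing from a partial scan are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def retrieve(target_tokens, database_tokens, region_mask=None):
    """Pick the database shape whose region tokens are closest to the target.

    target_tokens:   (K, D) array, one token per keypoint region (assumed given).
    database_tokens: (N, K, D) array, tokens of N database shapes.
    region_mask:     optional (K,) boolean array; False marks regions to ignore
                     (hypothetical handling of unobserved regions).
    Returns the index of the best match and the per-shape scores.
    """
    target_tokens = np.asarray(target_tokens, dtype=float)
    database_tokens = np.asarray(database_tokens, dtype=float)

    # Euclidean distance between matching regions, shape (N, K).
    dists = np.linalg.norm(database_tokens - target_tokens[None], axis=-1)

    if region_mask is not None:
        mask = np.asarray(region_mask, dtype=bool)
        if not mask.any():
            raise ValueError("region_mask must keep at least one region")
        dists = dists[:, mask]

    # Assumed scoring rule: average the region distances for each shape.
    scores = dists.mean(axis=1)
    return int(np.argmin(scores)), scores
```

Because keypoints are category-consistent, region `k` of the target can be compared directly with region `k` of every database shape, which is what makes the per-region comparison in this sketch meaningful.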

Paper Structure (11 sections, 7 equations, 9 figures, 3 tables)

This paper contains 11 sections, 7 equations, 9 figures, 3 tables.

Introduction
Related Works
KP-RED
Keypoint-Driven Deformation
Deformation-Aware Retrieval
Handling Partial Point Cloud
Experiments
Experiments on Full Shapes
Experiments on Partial Shapes
Ablation Studies
Conclusion

Figures (9)

Figure 1: Top Two Rows: Given the target point cloud, KP-RED first retrieves the most similar CAD model from the preprocessed database and deforms it to match the target using the keypoints for guidance. Bottom Two Rows: Given a scene scan, KP-RED reconstructs the CAD models of all objects and represents the scene by gathering the reconstructed models.
Figure 2: Overview of KP-RED. The target point cloud (R-A) is first canonicalized using the pose estimated by an arbitrary pose estimator [di2022gpv, zhang2022ssp, zhang2022rbp, di2021so], after which the keypoint predictor (R-B) predicts the target keypoints (R-C). An encoder (R-D) predicts point-wise features, and Local Feature Aggregation (LFA) is used to obtain the features of each keypoint region (R-E). The self-attention module (R-F) extracts the local retrieval token of each region (R-G), which is then compared with the tokens of the database models (R-H). The region tokens are supervised with an auxiliary reconstruction task during training. The shape most similar to the target is chosen as the source model (R-I). The source keypoints are then predicted by the shared keypoint predictor, and the local features are extracted via LFA (D-A - D-E). The self-attention module (D-F) predicts the influence vectors (D-G), which describe how the displacements of keypoints affect the cage. Given the cage of the source shape (D-H), the deformed cage (D-I) is derived from the influence vectors. Finally, the deformed point cloud and mesh (D-K) are computed by the cage-based deformation (D-J).
Figure 3: The training procedure of the retrieval module. Given the keypoints $\mathbf{K}_x$ and the region tokens extracted from the shape $S_x$, the reconstruction network reconstructs the corresponding regions $R'_x$ of $S_x$. Meanwhile, the network reconstructs the regions of the deformed shape $R'_{x2y}$ from the region tokens and $\mathbf{K}_y$.
Figure 4: (a1): The support region $R^{(i)}$ of the specific keypoint $K^{(i)}$. (a2): The influence vectors $I_i$ of the specific keypoint $K^{(i)}$. The color indicates the influence weight of the keypoint towards each cage vertex (as in Eq. \ref{eq:ctgt}). (b): The training procedure of the keypoint predictor for partial shapes. We employ the keypoint predictor trained with full shapes for supervision.
Figure 5: The visualization of the learned retrieval tokens of database shapes via t-SNE [van2008visualizing]. Objects whose tokens are close in the embedding space are considered similar in geometry.
...and 4 more figures
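
The deformation path in Figure 2 (D-G to D-K) can be sketched as follows. This is a minimal illustration under stated assumptions: the influence of each keypoint on each cage vertex is taken to be a scalar weight (zero outside the keypoint's support region), the cage is moved by the weighted keypoint displacements, and the mean value coordinates of the source points with respect to the source cage are assumed to be precomputed and passed in as `mvc_weights`. The function name and argument layout are hypothetical, and the exact update used in the paper may differ.

```python
import numpy as np


def deform_with_cage(source_cage, influence, src_keypoints, tgt_keypoints, mvc_weights):
    """Move a cage with keypoint displacements and interpolate the shape.

    source_cage:   (C, 3) cage vertices around the source shape.
    influence:     (C, K) assumed scalar influence of keypoint k on cage vertex c.
    src_keypoints: (K, 3) keypoints on the source shape.
    tgt_keypoints: (K, 3) corresponding keypoints on the target shape.
    mvc_weights:   (M, C) precomputed coordinates of M source points with
                   respect to the source cage; each row sums to 1.
    Returns the deformed cage (C, 3) and the deformed points (M, 3).
    """
    source_cage = np.asarray(source_cage, dtype=float)
    influence = np.asarray(influence, dtype=float)
    mvc_weights = np.asarray(mvc_weights, dtype=float)

    # Displacement that each source keypoint needs to reach its target.
    kp_offsets = np.asarray(tgt_keypoints, dtype=float) - np.asarray(src_keypoints, dtype=float)

    # Each cage vertex moves by the influence-weighted sum of keypoint offsets.
    deformed_cage = source_cage + influence @ kp_offsets

    # Interpolate the shape: same coordinates, new cage vertex positions.
    deformed_points = mvc_weights @ deformed_cage
    return deformed_cage, deformed_points
```

Keeping the interpolation weights fixed while only the cage vertices move is what lets the same step deform both the point cloud and the mesh vertices of the retrieved CAD model.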
