SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes

Mohammad Zohaib, Luca Cosmo, Alessio Del Bue

TL;DR

SelfGeo tackles unsupervised 3D keypoint estimation on deformable shapes by learning persistent, semantically meaningful keypoints from unlabelled point cloud (PCD) sequences. It introduces two complementary loss families: a Shape loss with reconstruction, coverage, and surface terms, and a Deformation loss with geodesic-distance preservation and temporal smoothing terms. Together they let keypoints move with the deforming shape while staying on its surface. The method uses a PointNet++ backbone to predict, for every point, its probability of being each keypoint, and reconstructs the shape from the resulting keypoints. It achieves superior performance on CAPE, ITOP, and Deforming Things 4D, even on noisy or downsampled data. The approach yields stable, interpretable keypoints suitable for skeleton-like representations in AR/VR and robotics without requiring ground-truth annotations, although noise in the geodesic estimates and shape symmetry remain open challenges.
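The Shape loss combines three terms computed on a single PCD. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes:

- a symmetric Chamfer distance for the reconstruction term;
- a nearest-input-point distance for the surface term;
- a hypothetical per-axis extent comparison standing in for the coverage term, whose exact form is not given in this summary.

The function names and the unit weights `w` are illustrative only.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared Euclidean distances, shape (N, M).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def surface_loss(keypoints: np.ndarray, pcd: np.ndarray) -> float:
    """Mean distance from each keypoint to its nearest input point."""
    d = np.linalg.norm(keypoints[:, None, :] - pcd[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())


def coverage_loss(keypoints: np.ndarray, pcd: np.ndarray) -> float:
    """Hypothetical coverage proxy: mismatch between the per-axis extents of the
    keypoints and of the input PCD (small when the keypoints span the shape)."""
    kp_extent = keypoints.max(axis=0) - keypoints.min(axis=0)
    pcd_extent = pcd.max(axis=0) - pcd.min(axis=0)
    return float(np.abs(kp_extent - pcd_extent).mean())


def shape_loss(recon, pcd, keypoints, w=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of reconstruction, coverage and surface terms (weights assumed)."""
    return (w[0] * chamfer_distance(recon, pcd)
            + w[1] * coverage_loss(keypoints, pcd)
            + w[2] * surface_loss(keypoints, pcd))


rng = np.random.default_rng(0)
pcd = rng.uniform(-1.0, 1.0, size=(512, 3))
# Keypoints sampled directly from the input points (stand-in for network output).
keypoints = pcd[rng.choice(len(pcd), size=10, replace=False)]
print(shape_loss(pcd + 0.01, pcd, keypoints))
```

Because the keypoints here are taken directly from the input points, the surface term is exactly zero. Only the reconstruction and coverage terms contribute to the printed value.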

Abstract

Unsupervised 3D keypoint estimation from Point Cloud Data (PCD) is a complex task, and it is even more challenging when the object's shape is deforming. Keypoints should be semantically and geometrically consistent across all 3D frames: each keypoint should be anchored to a specific part of the deforming shape irrespective of intrinsic and extrinsic motion. This paper presents "SelfGeo", a self-supervised method that computes persistent 3D keypoints of non-rigid objects from arbitrary PCDs without the need for human annotations. The gist of SelfGeo is to estimate keypoints between frames that respect invariant properties of deforming bodies. Our main contribution is to enforce that keypoints deform along with the shape while keeping constant geodesic distances among them. This principle is then propagated to the design of a set of losses whose minimization lets repeatable keypoints emerge at specific semantic locations of the non-rigid shape. We show experimentally that the use of geodesic distances has a clear advantage in challenging dynamic scenes and with different classes of deforming shapes (humans and animals). Code and data are available at: https://github.com/IIT-PAVIS/SelfGeo
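The geodesic-consistency idea can be illustrated by approximating geodesics on a point cloud with shortest paths over a k-nearest-neighbour graph. This summary does not describe the paper's own geodesic estimator, so several parts of the sketch below are assumptions:

- the kNN-graph plus Dijkstra approach;
- snapping each keypoint to its nearest input point;
- the neighbourhood size `k`;
- the L1 penalty;
- all function names.

The temporal smoothing term is sketched as the mean displacement of corresponding keypoints between two frames.

```python
import heapq

import numpy as np


def knn_graph(pcd, k=8):
    """Symmetric kNN graph as adjacency dicts {neighbour: Euclidean edge length}."""
    d = np.linalg.norm(pcd[:, None, :] - pcd[None, :, :], axis=-1)
    adj = [{} for _ in range(len(pcd))]
    for i in range(len(pcd)):
        neighbours = [int(j) for j in np.argsort(d[i]) if j != i][:k]
        for j in neighbours:
            adj[i][j] = adj[j][i] = float(d[i, j])
    return adj


def dijkstra(adj, source):
    """Shortest-path lengths from `source` to every node (inf if unreachable)."""
    dist = np.full(len(adj), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def keypoint_geodesics(keypoints, pcd, k=8):
    """(K, K) matrix of approximate geodesic distances between keypoints."""
    # Assumption: each keypoint is snapped to its nearest input point.
    nearest = np.linalg.norm(keypoints[:, None, :] - pcd[None, :, :], axis=-1).argmin(axis=1)
    adj = knn_graph(pcd, k)
    return np.stack([dijkstra(adj, int(s))[nearest] for s in nearest])


def geodesic_loss(kp_a, pcd_a, kp_b, pcd_b, k=8):
    """Mean absolute change in pairwise keypoint geodesics between two frames."""
    g_a = keypoint_geodesics(kp_a, pcd_a, k)
    g_b = keypoint_geodesics(kp_b, pcd_b, k)
    valid = np.isfinite(g_a) & np.isfinite(g_b)  # skip pairs in disconnected components
    return float(np.abs(g_a[valid] - g_b[valid]).mean())


def temporal_smoothing_loss(kp_a, kp_b):
    """Mean Euclidean displacement of corresponding keypoints between two frames."""
    return float(np.linalg.norm(kp_a - kp_b, axis=-1).mean())


# A straight "limb" of 11 points and the same limb bent 90 degrees at its midpoint.
t = np.arange(11, dtype=float)
straight = np.stack([t, np.zeros(11), np.zeros(11)], axis=1)
bent = np.stack([np.minimum(t, 5.0), np.maximum(t - 5.0, 0.0), np.zeros(11)], axis=1)
ends = [0, 10]
print(geodesic_loss(straight[ends], straight, bent[ends], bent, k=2))  # → 0.0
print(temporal_smoothing_loss(straight[ends], bent[ends]))
```

The bent limb shows why geodesics suit articulated motion. The Euclidean distance between the two end keypoints shrinks from 10 to about 7.07, yet their geodesic distance along the surface stays 10, so the geodesic term is zero. The temporal smoothing term still reports the end keypoint's displacement; it is intended for consecutive frames, where such displacements are small.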

Paper Structure (30 sections, 11 equations, 18 figures, 7 tables)

Figures (18)

  • Figure 1: Overview of SelfGeo. The keypoints estimated for two shapes of the same category are temporally consistent: they stay anchored to their locations (equal geodesic distances) between two frames regardless of deformation, and they preserve semantic information (same colours indicate corresponding keypoints). Moreover, the keypoints are estimated close to the surface and cover the whole shape.
  • Figure 2: Left: the proposed SelfGeo. The PCDs of a sequence are fed one at a time to the keypoint estimation network, which contains a PointNet++ encoder, a Conv1D layer and a Softmax layer. The network generates $K$ values for each point, indicating its probability of being each of the $K$ keypoints. The expected keypoint positions for each PCD are computed and passed to a decoder (DEC), consisting of 4 Conv1D layers, that reconstructs the 3D shape. To improve keypoint inference, SelfGeo computes the shape loss (reconstruction, coverage and surface losses) on a single PCD, and the deformation loss (right: geodesic distance and temporal smoothing losses) between two frames.
  • Figure 3: The higher PCK (Percentage of Correct Keypoints) in (a) shows that the keypoints estimated by SelfGeo have better correspondences across frames, and (b) demonstrates their temporal consistency.
  • Figure 4: Performance on real humans. The ITOP dataset contains depth images that include the background (top row). We segment the humans and pass them to SelfGeo, which still estimates consistent keypoints, as shown in the bottom row.
  • Figure 5: Keypoints estimated on the Deforming Things 4D dataset. The first, third and fifth rows (from left to right) show animals performing different actions. Corresponding estimated keypoints on the input PCDs are illustrated in the second, fourth and sixth rows. The keypoints/PCDs are shown from the side view for a better visualization.
  • ...and 13 more figures
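The keypoint head in Figure 2 and the PCK metric in Figure 3 can be sketched as follows. This is a hedged NumPy illustration, not the released code. The Figure 2 caption says the network outputs $K$ values per point, and expected keypoint positions are computed from them. Here we assume the softmax normalises over the points axis, so each keypoint becomes a weighted mean of the input points. If the normalisation instead ran over the $K$ axis, the weights would need renormalising per keypoint before averaging. The `pck` function is a generic definition; how SelfGeo selects targets and thresholds for Figure 3 is not specified here. The function names and random inputs are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def expected_keypoints(logits: np.ndarray, pcd: np.ndarray) -> np.ndarray:
    """Turn per-point scores (N, K) into K expected keypoint positions (K, 3).

    Assumption: the softmax normalises over the N points, so each keypoint is a
    convex combination (weighted mean) of the input points."""
    weights = softmax(logits, axis=0)  # (N, K); each column sums to 1
    return weights.T @ pcd             # (K, 3)


def pck(pred: np.ndarray, target: np.ndarray, threshold: float) -> float:
    """Percentage of Correct Keypoints: fraction of predictions within
    `threshold` (Euclidean) of their corresponding targets."""
    return float((np.linalg.norm(pred - target, axis=-1) <= threshold).mean())


rng = np.random.default_rng(0)
pcd = rng.normal(size=(1024, 3))
logits = rng.normal(size=(1024, 10))  # stand-in for the Conv1D head output
kp = expected_keypoints(logits, pcd)
print(kp.shape)  # (10, 3)
```

The weighted mean keeps the keypoint step differentiable, which a hard argmax over points would not. The trade-off is that keypoints can drift off the surface. This is one reason a surface term appears in the Shape loss.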