Deep Non-rigid Structure-from-Motion Revisited: Canonicalization and Sequence Modeling

Hui Deng, Jiawei Shi, Zhen Qin, Yiran Zhong, Yuchao Dai

TL;DR

Non-rigid Structure-from-Motion (NRSfM) remains challenged by sequence-specific ambiguity when reconstructing 3D deformation from 2D sequences. The authors revisit deep NRSfM with two core ideas: per-sequence canonicalization via a parameter-free General Procrustean Analysis (GPA) layer, and a sequence modeling module that jointly encodes temporal structure and subspace constraints. The method integrates a single-frame predictor, a context layer enforcing self-expressive regularity, GPA-based canonical alignment, and a combined reprojection and nuclear-norm loss to supervise training. Empirical results on Human3.6M, InterHand2.6M, 3DPW, and CMU MOCAP show improved accuracy and robustness, underscoring the value of sequence-aware canonicalization and temporal modeling for deep NRSfM.

Abstract

Non-Rigid Structure-from-Motion (NRSfM) is a classic 3D vision problem, where a 2D sequence is taken as input to estimate the corresponding 3D sequence. Recently, deep neural networks have greatly advanced the task of NRSfM. However, existing deep NRSfM methods still have limitations in handling the inherent sequence property and motion ambiguity associated with the NRSfM problem. In this paper, we revisit deep NRSfM from two perspectives to address these limitations: (1) canonicalization and (2) sequence modeling. We propose an easy-to-implement per-sequence canonicalization method, as opposed to the previous per-dataset canonicalization approaches. With this in mind, we propose a sequence modeling method that combines temporal information and a subspace constraint. As a result, we achieve a better NRSfM reconstruction pipeline than previous efforts. The effectiveness of our method is verified by testing the sequence-to-sequence deep NRSfM pipeline with the corresponding regularization modules on several commonly used datasets.
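
The TL;DR mentions a combined reprojection and nuclear-norm loss. The following is a minimal NumPy sketch of what such a loss could look like, not the authors' implementation. It assumes an orthographic camera, and the function names, tensor layouts, and the `weight` value are hypothetical choices made for illustration.

```python
import numpy as np


def reprojection_loss(shapes_3d, rotations, keypoints_2d):
    """Mean squared 2D error under an (assumed) orthographic camera.

    shapes_3d:    (F, P, 3) predicted shapes in the canonical frame
    rotations:    (F, 3, 3) predicted per-frame rotations
    keypoints_2d: (F, P, 2) observed 2D keypoints
    """
    # Rotate every point of every frame: rotated[f, p] = rotations[f] @ shapes_3d[f, p]
    rotated = np.einsum("fij,fpj->fpi", rotations, shapes_3d)
    projected = rotated[..., :2]  # orthographic projection: drop the depth axis
    return np.mean(np.sum((projected - keypoints_2d) ** 2, axis=-1))


def nuclear_norm_loss(shapes_3d):
    """Nuclear norm (sum of singular values) of the F x 3P shape matrix.

    A small value encourages the sequence to lie in a low-rank subspace.
    """
    num_frames = shapes_3d.shape[0]
    return np.linalg.norm(shapes_3d.reshape(num_frames, -1), ord="nuc")


def total_loss(shapes_3d, rotations, keypoints_2d, weight=0.1):
    """Reprojection term plus a weighted low-rank term (weight is illustrative)."""
    return (reprojection_loss(shapes_3d, rotations, keypoints_2d)
            + weight * nuclear_norm_loss(shapes_3d))
```

The nuclear norm is used here as the usual convex surrogate for matrix rank, so the second term favors shape sequences that can be explained by a few basis shapes.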

Paper Structure

This paper contains 15 sections, 13 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of the deep NRSfM pipeline with the proposed shape sequence reconstruction and GPA layer. The pipeline consists of three parts: the single-frame shape/rotation predictor, the shape sequence reconstruction stage, and the General Procrustean Analysis (GPA) layer. The GPA layer is parameter-free and is not needed for inference once training is finished.
  • Figure 2: The left side of the figure shows the canonicalization method of C3DPO (novotny2019c3dpo), which applies random rotations to each frame, i.e., the network is trained over the entire dataset to determine the canonical coordinate frame. The right side shows our method, which aligns the shapes within each sequence and thereby explicitly helps the network determine the canonical coordinate frame.
  • Figure 3: The first panel (fig:viz_dataset) shows visualization results for different methods and ablations on the Human3.6M dataset (first row) and InterHand2.6M (second row); more visualizations are included in the supplementary material. The second panel (fig:short_sequence) shows that our method also works on dense data.
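
The per-sequence alignment described in the captions of Figures 1 and 2 can be illustrated with a rotation-only generalized Procrustes alignment. This is a sketch under stated assumptions rather than the paper's GPA layer: it uses the standard Kabsch/SVD solution for each frame, and the helper names `kabsch_rotation` and `gpa_align` and the iteration count are hypothetical.

```python
import numpy as np


def kabsch_rotation(source, target):
    """Rotation R (3x3) minimizing ||source @ R.T - target||_F for (P, 3) inputs."""
    cross_cov = source.T @ target
    u, _, vt = np.linalg.svd(cross_cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def gpa_align(shapes, n_iters=10):
    """Align all frames of one sequence to a shared reference shape.

    shapes: (F, P, 3) array of per-frame 3D shapes.
    Returns the centered, rotation-aligned shapes with the same layout.
    """
    # Remove translation by centering each frame at its centroid.
    centered = shapes - shapes.mean(axis=1, keepdims=True)
    reference = centered[0]
    aligned = centered
    for _ in range(n_iters):
        # Rotate every frame onto the current reference, then update the
        # reference as the mean of the aligned frames.
        aligned = np.stack(
            [frame @ kabsch_rotation(frame, reference).T for frame in centered]
        )
        reference = aligned.mean(axis=0)
    return aligned
```

Because the procedure only solves small SVD problems and has no learned weights, it is consistent with the caption's description of a parameter-free layer that can be dropped at inference time.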

Theorems & Definitions (1)

  • Definition 1