Category-Agnostic Neural Object Rigging

Guangzhao He, Chen Geng, Shangzhe Wu, Jiajun Wu

TL;DR

CANOR addresses the challenge of category-agnostic rigging for deformable 4D objects by learning a low-dimensional pose space through a blob-based representation. An encoder maps a point cloud to pose-parameter blobs via a cross-attention codebook, separating pose $\boldsymbol{B}_P$ from identity $\boldsymbol{B}_I$, and a transformer decoder reconstructs the shape from edited blobs into a high-fidelity mesh using an occupancy field. The approach enables intuitive pose editing by directly manipulating blob positions $\boldsymbol{x}$ and orientations $\boldsymbol{r}$, while maintaining instance-specific details. Extensive experiments on DeformingThings4D, FaMoS, Shape2Motion, and a Fish dataset show state-of-the-art performance in IoU and Chamfer metrics, with qualitative results validating realistic dynamics and high-quality surfaces. Overall, CANOR provides a scalable, data-driven alternative to category-specific rigging, enabling practical 3D animation and modeling for diverse object categories without manual priors.
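
For readers unfamiliar with the two metrics named above, the following is a minimal, dependency-free sketch of how they are commonly computed. It assumes the squared-distance, mean-reduced form of the Chamfer distance and an IoU over sampled occupancy labels; the exact variants, sampling, and normalization used in the paper's evaluation are not stated here, and the function names `chamfer_distance` and `occupancy_iou` are illustrative only.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    Assumption: squared Euclidean distances averaged per set; other
    conventions (unsquared, summed) exist and give different numbers.
    """
    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest target.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def occupancy_iou(pred, gt):
    """Intersection over union of two equal-length occupancy label sequences."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty shapes are treated as a perfect match (a convention, not a rule).
    return inter / union if union else 1.0
```

Lower Chamfer distance and higher IoU both indicate a closer match between the reconstructed and reference shapes.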

Abstract

The motion of deformable 4D objects lies on a low-dimensional manifold. To capture this low dimensionality and improve controllability, traditional approaches rely on heuristic-based techniques, i.e., rigging, for manipulating dynamic objects in an intuitive fashion. However, such representations are not scalable because they require expert knowledge of specific categories. Instead, we study the automatic discovery of such low-dimensional structures in a purely data-driven manner. Specifically, we design a novel representation that encodes deformable 4D objects into a sparse set of spatially grounded blobs and an instance-aware feature volume to disentangle the pose and instance information of the 3D shape. With such a representation, we can manipulate the pose of 3D objects intuitively by modifying the parameters of the blobs, while preserving rich instance-specific information. We evaluate the proposed method on a variety of object categories and demonstrate the effectiveness of the proposed framework. Project page: https://guangzhaohe.com/canor
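
The idea of editing pose through blob parameters while leaving instance information untouched can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Blob` class, its field names, the axis-angle orientation, and the per-blob identity tuple are hypothetical stand-ins, not the paper's actual data layout.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Blob:
    """Hypothetical blob: pose parameters plus an identity feature."""
    position: Vec3                 # pose parameter (position)
    orientation: Vec3              # pose parameter (assumed axis-angle vector)
    identity: Tuple[float, ...]    # instance-specific feature, never edited here


def translate_blob(blob: Blob, offset: Vec3) -> Blob:
    """Return a copy of the blob moved by `offset`; identity is carried over."""
    new_position = tuple(p + o for p, o in zip(blob.position, offset))
    return replace(blob, position=new_position)


def edit_pose(blobs: List[Blob], index: int, offset: Vec3) -> List[Blob]:
    """Move one blob and leave all others, and all identity features, as-is."""
    return [translate_blob(b, offset) if i == index else b
            for i, b in enumerate(blobs)]
```

Because the blobs are immutable and only the pose fields are replaced, an edit cannot accidentally alter the identity features, which mirrors the pose/instance separation described above.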

Paper Structure

This paper contains 33 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: We introduce Category-Agnostic Neural Object Rigging (CANOR), a novel approach that learns to discover a low-dimensional pose space for dynamic objects. The representation is learned from animated 3D sequences of a deformable object category in an unsupervised fashion without relying on any category-specific expert knowledge. By decomposing each object's geometry into a sparse set of feature-embedded blobs, CANOR enables intuitive manipulation of object poses by editing the blobs. This representation captures interpretable motion structures for a diverse range of dynamic object categories.
  • Figure 2: Overview of our proposed pipeline. We use a set of feature-embedded blobs to represent the pose space of deformable objects (\ref{method:symbols}). The encoder takes a point cloud as input and maps it into blobs using a learnable codebook of query tokens that cross-attend with semantic point-wise features (\ref{method:encoding}). Once generated, these blobs can be edited by users to adjust the object's pose. The edited blobs are then voxelized into a feature volume and decoded back to a 3D volume using a transformer architecture (\ref{method:decoding}). Finally, the system queries the decoded volume with sampled 3D coordinates to predict occupancy values, which are used to extract the edited mesh.
  • Figure 3: Difference in training and inference inputs. During training, we sample two point clouds of the same identity but with different poses to separately predict the identity-related blob parameters $\mathcal{B}_I$ and pose-related parameters $\mathcal{B}_P$. This setup enables $\mathcal{B}_P$ to simulate an edited pose resulting from user edits. During inference, both $\mathcal{B}_I$ and $\mathcal{B}_P$ are predicted from a single point cloud. The user can then explicitly edit $\mathcal{B}_P$ to represent the desired pose change.
  • Figure 4: Qualitative results. We show qualitative results for different rigging representations across four object categories. Our approach outperforms state-of-the-art methods on both modeling object motion and generating high-quality surface meshes.
  • Figure 5: Pose manipulation results for a novel category ("clay-monster") using our method, where no rigging tools are available.
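
The encoding step in the Figure 2 caption, where a codebook of query tokens cross-attends with point-wise features, can be sketched with plain scaled dot-product attention. This is a minimal illustration under stated assumptions: a single head, no learned key/value/query projections, and point features used directly as both keys and values. The function names are hypothetical, and the mapping from each output token to actual blob parameters is not shown because it is not described here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, point_feats):
    """Single-head scaled dot-product cross-attention.

    queries:     K codebook tokens, each a list of d floats.
    point_feats: N point-wise features, each a list of d floats,
                 used as both keys and values (assumption).
    Returns K output tokens of length d, independent of N.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query token to every point feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in point_feats]
        weights = softmax(scores)
        # Weighted average of point features for this query token.
        outputs.append([sum(w * v[j] for w, v in zip(weights, point_feats))
                        for j in range(d)])
    return outputs
```

The output always has one token per codebook entry regardless of how many points are in the input cloud, which is one plausible reason a query codebook suits a fixed-size, sparse set of blobs.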