ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion

Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun

TL;DR

ShapeShifter addresses the challenge of generating high-detail 3D shape variations from a single exemplar by introducing a multiscale diffusion model operating on a sparse voxel grid of explicit 3D features (points, normals, colors). The method combines a per-level upsampler and diffusion models that can be trained in parallel, producing high-fidelity geometry with interactive inference speeds. It demonstrates superior geometric quality over single-exemplar baselines while enabling open/closed surfaces, editing, and texture augmentation, all within minutes of training. The approach offers a practical, resource-efficient path for exemplar-based 3D variation generation with strong potential for retargeting and artist-driven workflows.

Abstract

This paper proposes ShapeShifter, a new 3D generative model that learns to synthesize shape variations based on a single reference model. While generative methods for 3D objects have recently attracted much attention, current techniques often lack geometric details and/or require long training times and large resources. Our approach remedies these issues by combining sparse voxel grids and point, normal, and color sampling within a multiscale neural architecture that can be trained efficiently and in parallel. We show that our resulting variations better capture the fine details of their original input and can handle more general types of surfaces than previous SDF-based methods. Moreover, we offer interactive generation of 3D shape variants, allowing more human control in the design loop if needed.

Paper Structure

This paper contains 38 sections, 4 equations, 14 figures, 4 tables, 2 algorithms.

Figures (14)

  • Figure 1: ShapeShifter. Given a 3D exemplar, we propose to train a hierarchical diffusion model to create variations preserving the geometric details and styles of the exemplar. By combining compact yet explicit 3D features (colored, oriented points) with a sparse voxel grid, we shorten training times from hours to minutes, while yielding significantly better geometric quality than prior work. The hierarchical point representation and fast inference times further enable intuitive interactive editing.
  • Figure 2: Geometric details. Our generation captures significantly more of the geometric details present in the exemplar mesh (leftmost). Prior methods, Sin3DGen [li2023patch] and Sin3DM [wu2024sindm], operate with plenoxels and neural radiance fields encoded in single-resolution triplane features, respectively, which lack the capacity to represent and supervise high-resolution geometric details. In contrast, our method employs a colored and oriented point set, providing precise geometric information.
  • Figure 3: Multiscale diffusion on sparse voxel grid. We start from noise $\mathbf{\epsilon} \sim \mathcal{N}(\bm{0}, \bm{I})$ at the coarsest level $l = 1$, and obtain the 3D feature grid $\bm{\mathsfit{G}}^{l}$ through reverse diffusion. Each subsequent level uses the output of the previous level. Inactive voxels are first pruned, and the remaining grid is then upsampled with a level-specific upsampler $\mathcal{U}^{l}$. The upsampled grid $\tilde{\bm{\mathsfit{G}}}^{l}$ is subsequently noised and passed through the diffusion model to obtain a clean version of the sparse feature grid $\bm{\mathsfit{G}}^{l}$. All levels are independent and can thus be trained in parallel.
  • Figure 4: Controlled generation. The span of the output can be trivially controlled by resizing the initial grid anisotropically.
  • Figure 5: Editing. Using a sparse voxel grid allows users to intuitively apply more precise edits. Here, a user can copy and paste a selected part of a generated variation at an intermediate level to manually alter the variation.
  • ...and 9 more figures
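
The coarse-to-fine loop described in the Figure 3 caption (reverse diffusion at the coarsest level, then prune, upsample, re-noise, and denoise at each finer level) can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation. The per-level denoisers are placeholder callables, the upsampler simply copies parent features to 8 children instead of being learned, and treating feature channel 0 as an occupancy score for pruning is an assumption made only for this example. All function names, `num_steps`, and `sigma` are hypothetical.

```python
import numpy as np

# Offsets of the 8 children of a voxel when the grid resolution doubles.
CHILD_OFFSETS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])


def prune(coords, feats, threshold=0.0):
    """Drop inactive voxels. ASSUMPTION: channel 0 acts as an occupancy score."""
    keep = feats[:, 0] > threshold
    return coords[keep], feats[keep]


def upsample(coords, feats):
    """Stand-in for the learned upsampler U^l: each voxel spawns 8 children
    that copy its features (a real upsampler would predict them)."""
    child_coords = (coords[:, None, :] * 2 + CHILD_OFFSETS[None, :, :]).reshape(-1, 3)
    child_feats = np.repeat(feats, 8, axis=0)
    return child_coords, child_feats


def add_noise(feats, rng, sigma):
    """Perturb the upsampled features with Gaussian noise before denoising."""
    return feats + sigma * rng.standard_normal(feats.shape)


def reverse_diffusion(feats, denoiser, num_steps):
    """Apply a (hypothetical) per-level denoiser from the last step to the first."""
    for t in reversed(range(num_steps)):
        feats = denoiser(feats, t)
    return feats


def generate(denoisers, base_res, channels, rng, num_steps=10, sigma=0.5):
    """Run one denoiser per level, coarsest first, on a sparse (coords, feats) grid."""
    # Coarsest level: every voxel of the base grid starts as pure Gaussian noise.
    axes = [np.arange(r) for r in base_res]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    feats = rng.standard_normal((len(coords), channels))
    feats = reverse_diffusion(feats, denoisers[0], num_steps)

    # Finer levels: prune -> upsample -> noise -> denoise.
    for denoiser in denoisers[1:]:
        coords, feats = prune(coords, feats)
        coords, feats = upsample(coords, feats)
        feats = add_noise(feats, rng, sigma)
        feats = reverse_diffusion(feats, denoiser, num_steps)
    return coords, feats


# Toy usage: a placeholder "denoiser" that only shrinks features toward zero.
rng = np.random.default_rng(0)
toy_denoiser = lambda feats, t: 0.9 * feats
coords, feats = generate([toy_denoiser] * 3, base_res=(4, 4, 2), channels=4, rng=rng)
print(coords.shape, feats.shape)
```

Passing a non-cubic `base_res` such as `(4, 4, 2)` mirrors the idea in the Figure 4 caption: the extent of the output follows from the shape of the initial grid. Because each level only consumes a `(coords, feats)` pair, the per-level models in this sketch have no dependency on one another at training time, which is consistent with the caption's remark that levels can be trained in parallel.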