
Multi-scale Latent Point Consistency Models for 3D Shape Generation

Bi'an Du, Wei Hu, Renjie Liao

TL;DR

This work proposes a novel Multi-scale Latent Point Consistency Model (MLPCM), which follows a latent diffusion framework and introduces hierarchical levels of latent representations, ranging from point-level to super-point levels, each corresponding to a different spatial resolution.
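The TL;DR describes latents arranged from the point level up to coarser super-point levels, each at a different spatial resolution. This excerpt does not say how super-points are formed, so the NumPy sketch below only illustrates the idea under stated assumptions. Super-point centers are chosen by farthest point sampling, and each coarser latent is the average of the finer latents nearest to it; in the paper these latents come from a hierarchical VAE encoder instead. The level sizes (512, 128, 32) and all helper names are hypothetical. `lift_to_points` copies coarse latents back onto the points as a crude stand-in for the learned multi-scale latent integration module.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Greedily pick m indices whose points are spread as far apart as possible."""
    rng = np.random.default_rng(seed)
    idx = np.empty(m, dtype=np.int64)
    idx[0] = rng.integers(points.shape[0])
    # Squared distance from each point to its closest already-chosen center.
    nearest = np.full(points.shape[0], np.inf)
    for i in range(1, m):
        d = ((points - points[idx[i - 1]]) ** 2).sum(axis=1)
        nearest = np.minimum(nearest, d)
        idx[i] = int(np.argmax(nearest))
    return idx

def nearest_center(points: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center for every point."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def pool_to_superpoints(points, latents, m):
    """Average the latents of the points closest to each of m FPS centers (hypothetical)."""
    centers = points[farthest_point_sampling(points, m)]
    assign = nearest_center(points, centers)
    counts = np.bincount(assign, minlength=m)
    pooled = np.zeros((m, latents.shape[1]))
    np.add.at(pooled, assign, latents)  # sum latents per super-point
    return centers, pooled / np.maximum(counts, 1)[:, None]

def build_hierarchy(points, latents, sizes=(512, 128, 32)):
    """Level 0 is point-level; each later level is a coarser super-point level."""
    levels = [(points, latents)]
    for m in sizes:
        levels.append(pool_to_superpoints(*levels[-1], m))
    return levels

def lift_to_points(points, centers, pooled):
    """Copy each super-point latent back onto the points nearest to its center."""
    return pooled[nearest_center(points, centers)]

rng = np.random.default_rng(0)
pts = rng.standard_normal((2048, 3))
z = rng.standard_normal((2048, 4))  # hypothetical 4-D point-level latents
levels = build_hierarchy(pts, z)
cond = np.concatenate([z] + [lift_to_points(pts, c, p) for c, p in levels[1:]], axis=1)
print([c.shape[0] for c, _ in levels], cond.shape)  # [2048, 512, 128, 32] (2048, 16)
```

Concatenating the point latents with the lifted super-point latents gives every point a feature that mixes local and global context. According to the abstract, the paper performs this step with a learned module that uses 3D spatial attention rather than a fixed copy.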

Abstract

Consistency Models (CMs) have significantly accelerated the sampling process in diffusion models, yielding impressive results in synthesizing high-resolution images. To explore and extend these advancements to point-cloud-based 3D shape generation, we propose a novel Multi-scale Latent Point Consistency Model (MLPCM). Our MLPCM follows a latent diffusion framework and introduces hierarchical levels of latent representations, ranging from point-level to super-point levels, each corresponding to a different spatial resolution. We design a multi-scale latent integration module along with 3D spatial attention to effectively denoise the point-level latent representations conditioned on those from multiple super-point levels. Additionally, we propose a latent consistency model, learned through consistency distillation, that compresses the prior into a one-step generator. This significantly improves sampling efficiency while preserving the performance of the original teacher model. Extensive experiments on the standard benchmarks ShapeNet and ShapeNet-Vol demonstrate that MLPCM achieves a 100x speedup in the generation process while surpassing state-of-the-art diffusion models in both shape quality and diversity.
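The abstract says the one-step generator is obtained by consistency distillation from a teacher prior, but this excerpt gives no formulas. The NumPy sketch below therefore assumes the formulation of the original consistency models of Song et al.: an EDM-style time grid, the `c_skip`/`c_out` parameterization that enforces $f(x, \epsilon) = x$, and one Euler step of the teacher probability-flow ODE. The learned latent prior is replaced by a hypothetical 1-D Gaussian toy, $x_0 \sim \mathcal{N}(\mu, s^2)$, whose teacher denoiser and exact consistency function are known in closed form. All constants and names are assumptions, not values from the paper.

```python
import numpy as np

# Assumed hyperparameters (consistency-model / EDM defaults, not from the paper).
SIGMA_DATA, EPS, T_MAX, RHO = 0.5, 0.002, 80.0, 7.0
MU, S = 2.0, 0.5  # hypothetical toy "latent" distribution N(MU, S^2)

def karras_times(n: int) -> np.ndarray:
    """Ascending time grid EPS = t_0 < ... < t_{n-1} = T_MAX."""
    i = np.arange(n) / (n - 1)
    lo, hi = EPS ** (1 / RHO), T_MAX ** (1 / RHO)
    return (lo + i * (hi - lo)) ** RHO

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)

def student(F):
    """Wrap a raw network F(x, t) so that f(x, EPS) = x holds by construction."""
    return lambda x, t: c_skip(t) * x + c_out(t) * F(x, t)

def teacher_denoiser(x, t):
    # Exact E[x_0 | x_t] for x_t = x_0 + t * z with x_0 ~ N(MU, S^2).
    return MU + S**2 / (S**2 + t**2) * (x - MU)

def exact_consistency_fn(x, t):
    # Closed-form probability-flow ODE map from (x, t) back to time EPS for this toy.
    return MU + (x - MU) * np.sqrt(S**2 + EPS**2) / np.sqrt(S**2 + t**2)

def distillation_loss(f, ts, rng, batch=4096):
    """Consistency-distillation loss using one Euler step of the teacher ODE."""
    n = rng.integers(0, len(ts) - 1, size=batch)
    t_lo, t_hi = ts[n], ts[n + 1]
    x0 = MU + S * rng.standard_normal(batch)
    x_hi = x0 + t_hi * rng.standard_normal(batch)
    # Euler step of dx/dt = (x - D(x, t)) / t from t_hi down to t_lo.
    x_lo = x_hi + (t_lo - t_hi) * (x_hi - teacher_denoiser(x_hi, t_hi)) / t_hi
    # Real training would use an EMA target copy of f (no gradient) in the second term.
    return float(np.mean((f(x_hi, t_hi) - f(x_lo, t_lo)) ** 2))

def one_step_sample(f, rng, n):
    """A single evaluation of f maps prior noise at T_MAX straight to a sample."""
    return f(T_MAX * rng.standard_normal(n), T_MAX)

rng = np.random.default_rng(0)
ts = karras_times(18)
untrained = student(lambda x, t: np.zeros_like(x))
print(distillation_loss(exact_consistency_fn, ts, rng))  # small: only Euler error remains
print(distillation_loss(untrained, ts, rng))             # much larger
samples = one_step_sample(exact_consistency_fn, rng, 100_000)
print(samples.mean(), samples.std())                     # close to MU = 2.0 and S = 0.5
```

The exact consistency function is a fixed point of the distillation objective up to the teacher's discretization error, which is why a student trained toward it can replace many teacher ODE steps with the single call in `one_step_sample`.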

Paper Structure

This paper contains 19 sections, 11 equations, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: An overview of the proposed multi-scale latent diffusion model. Hierarchical VAEs encode the input shape into latent variables at multiple resolutions, including point-level and super-point-level. Compared with latent point diffusion models, which denoise only the point-level latent space, our multi-scale latent integration module injects higher-level latent variables to guide refinement, thereby jointly modeling local geometry and global structure. The details of the diffusion prior are shown in Figure 2.
  • Figure 2: An illustration of the network architecture of the latent diffusion prior. The multi-scale latent integration module injects higher-level latents to couple local geometry with global structure, while the set abstraction module downsamples and aggregates across scales. We denote the voxel grid size as $r$, and the hidden dimension as $D$.
  • Figure 3: We train class-specific ShapeNet models (airplane, car, chair) and generate 2,048-point unconditional samples with PointFlow-style global normalization. Our results better preserve part coherence such as seat–leg and wing–fuselage continuity, while prior methods often show disconnections, fused parts, or symmetry artifacts.
  • Figure 4: Qualitative results show high-quality and diverse 3D assets. Samples maintain realistic structure and overall topology while varying in fine-grained details, yielding diversity with symmetry largely intact.
  • Figure 5: Comparison between generated shapes and retrieved nearest neighbors (NN) in the training dataset. The generated shapes closely match the ground-truth shapes in overall topology, yet differ in fine-grained details, demonstrating diversity.
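The Figure 2 caption names a voxel grid size $r$ and a hidden dimension $D$ but does not say how point features are mapped onto voxels. A common choice in point-voxel backbones is PVCNN-style average voxelization, and the NumPy sketch below assumes it; whether MLPCM does exactly this is not stated here. Coordinates are centered, scaled into the unit cube, and rounded onto an $r^3$ grid, and the features of all points sharing a voxel are averaged. `voxelize` and `devoxelize` are hypothetical names, and the nearest-voxel read-back simplifies the trilinear devoxelization PVCNN uses.

```python
import numpy as np

def voxelize(points: np.ndarray, feats: np.ndarray, r: int):
    """Average per-point features into an r x r x r grid (PVCNN-style sketch)."""
    centered = points - points.mean(axis=0)
    # Scale so every point lies within [-0.5, 0.5] on each axis.
    scale = 2.0 * np.linalg.norm(centered, axis=1).max() + 1e-8
    coords = np.clip((centered / scale + 0.5) * r, 0, r - 1)
    vox = np.round(coords).astype(np.int64)             # integer voxel indices
    flat = (vox[:, 0] * r + vox[:, 1]) * r + vox[:, 2]  # row-major voxel id
    grid = np.zeros((r ** 3, feats.shape[1]))
    np.add.at(grid, flat, feats)                        # sum features per voxel
    counts = np.bincount(flat, minlength=r ** 3)
    grid /= np.maximum(counts, 1)[:, None]              # sum -> mean; empty voxels stay 0
    return grid.reshape(r, r, r, -1), flat

def devoxelize(grid: np.ndarray, flat: np.ndarray) -> np.ndarray:
    """Read each point's feature from its own voxel (nearest lookup, not trilinear)."""
    return grid.reshape(-1, grid.shape[-1])[flat]

rng = np.random.default_rng(0)
pts = rng.standard_normal((2048, 3))
h = rng.standard_normal((2048, 8))  # hypothetical hidden features with D = 8
grid, flat = voxelize(pts, h, r=16)
print(grid.shape, devoxelize(grid, flat).shape)  # (16, 16, 16, 8) (2048, 8)
```

Averaging keeps the voxel features on the same scale regardless of how many points land in a cell, which lets a 3D convolution or attention layer operate on the dense grid before features are read back onto the points.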