FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation

Chenliang Zhou; Fangcheng Zhong; Param Hanji; Zhilin Guo; Kyle Fogarty; Alejandro Sztrajman; Hongyun Gao; Cengiz Oztireli

TL;DR

FrePolad addresses the challenge of generating high-quality, diverse point clouds with flexible cardinality and efficient runtimes. It fuses a variational autoencoder with a latent diffusion model as a prior and introduces a frequency-rectification mechanism based on spherical harmonics to preserve high-frequency details during training. A continuous normalizing flow decoder and a two-stage training regime enable generation of point clouds of arbitrary size by modeling a distribution of points over latent shapes. Empirical results on ShapeNet show state-of-the-art quality and diversity with favorable computational efficiency, and ablations confirm that both spectral rectification and latent diffusion contribute materially to performance.
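The flexible-cardinality idea can be illustrated with a minimal sketch: draw one shape latent, then push as many independent Gaussian noise points as desired through a decoder conditioned on that latent. Everything named here is hypothetical and chosen for illustration: `toy_decoder` is a simple latent-dependent affine map standing in for the continuous normalizing flow decoder, and `sample_shape_latent` draws from a standard Gaussian rather than from a trained latent diffusion model.

```python
import random


def sample_shape_latent(dim, rng):
    # Assumption: a standard Gaussian stands in for the learned latent prior.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_decoder(noise_point, z):
    # Hypothetical stand-in for the conditional decoder: an affine map on a
    # 3D noise point whose scale and shift depend on the shape latent z.
    scale = 1.0 + 0.1 * sum(z) / len(z)
    shift = z[:3]
    return tuple(scale * x + s for x, s in zip(noise_point, shift))


def generate_point_cloud(num_points, latent_dim=8, seed=0):
    rng = random.Random(seed)
    # One latent describes the whole shape ...
    z = sample_shape_latent(latent_dim, rng)
    # ... and each point is an independent draw conditioned on that latent,
    # so the number of points is a free choice at generation time.
    cloud = []
    for _ in range(num_points):
        eps = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        cloud.append(toy_decoder(eps, z))
    return z, cloud


if __name__ == "__main__":
    for n in (256, 2048, 8192):
        _, cloud = generate_point_cloud(n)
        print(n, len(cloud))
```

Because points are sampled independently given the latent, asking for more points only adds decoder evaluations; the shape latent itself is unchanged.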

Abstract

We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate FrePolad's state-of-the-art performance in terms of quality, diversity, and computational efficiency. Project page: https://chenliang-zhou.github.io/FrePolad/.
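For readers unfamiliar with how a DDPM operates on a latent vector, the following sketch shows the standard closed-form forward (noising) step, q(z_t | z_0) = N(sqrt(alpha_bar_t) z_0, (1 - alpha_bar_t) I), applied to a low-dimensional latent. The linear schedule with 1000 steps and betas from 1e-4 to 0.02 is the common default from the original DDPM formulation and is an assumption here; the source text does not state FrePolad's schedule, latent dimension, or denoiser architecture, and the function names are illustrative only.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: linearly spaced noise variances.
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s <= t} (1 - beta_s)
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_latent(z0, t, alpha_bars, rng):
    # Sample z_t directly from z_0 using the closed-form forward process.
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    # A denoiser would be trained to predict eps from (zt, t).
    return zt, eps


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    z0 = [rng.gauss(0.0, 1.0) for _ in range(16)]  # hypothetical 16-D latent
    zt, eps = noise_latent(z0, t=500, alpha_bars=alpha_bars, rng=rng)
    print(len(zt), alpha_bars[0], alpha_bars[-1])
```

Running the diffusion on a short latent vector rather than on thousands of 3D points is what keeps each denoising step cheap, which is consistent with the efficiency argument made in the abstract.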

Paper Structure (33 sections, 40 equations, 7 figures, 7 tables)

Introduction
Related works
Denoising diffusion probabilistic models
Point cloud generation
Frequency analysis in VAE
Background
Variational Autoencoder
Denoising Diffusion Probabilistic Model
Spherical Harmonics
Formulation
Frequency Extraction via Spherical Harmonics
Frequency-Rectified VAE
DDPM-Based Prior
FrePolad
Components
...and 18 more sections

Figures (7)

Figure 1: (a) FrePolad combines novel frequency rectification with a point cloud VAE and a DDPM-based prior to generate point clouds with superior quality, diversity, and flexibility in cardinality. The plots on the right show (b) training and (c) generation costs versus the final validation score measured by 1-NNA-CD ($\downarrow$), (d) learning curves for the first 20 hours of training, and (e) generation cost for synthesizing different numbers of points.
Figure 2: FrePolad is structured as a point cloud VAE with an embedded latent DDPM that represents the latent distribution. (a) Two-stage training: in the first stage (blue), the VAE is optimized to maximize the FreELBO (Eq. \ref{eq:freelbo}) with a standard Gaussian prior; in the second stage (green), the VAE is kept fixed while the latent DDPM is trained to model the latent distribution. (b) Generation: conditioned on a shape latent sampled from the DDPM, the CNF decoder transforms a Gaussian noise input into a synthesized shape.
Figure 3: A point cloud before and after frequency rectification, and its representative function in the spherical and frequency domains. Frequency rectification shifts points to more complex, less smooth regions and increases the relative importance of higher-frequency features, to which the VAE can then pay more attention during reconstruction. Note that the frequency-rectified point cloud in the second row is shown only for visualization; our framework does not explicitly generate such a point cloud during training.
Figure 4: Generation with 2048 points for airplane, chair, and car classes. Samples generated by FrePolad have better fidelity and diversity.
Figure 5: FrePolad supports flexibility in the cardinality of generated point clouds.
...and 2 more figures
