Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation

Matteo Bastico, David Ryckelynck, Laurent Corté, Yannick Tillier, Etienne Decencière

TL;DR

This work targets robust evaluation and high-fidelity generation of 3D point clouds. It identifies weaknesses in Chamfer Distance-based metrics and introduces a barycenter alignment step, Density-Aware Chamfer Distance (DCD), and Surface Normal Concordance (SNC) to provide a more comprehensive, robust evaluation. It then presents the Diffusion Point Transformer (DiPT), a transformer-based diffusion model that operates directly on raw points via serialized patches and space-filling curves, achieving state-of-the-art quality on ShapeNet. Extensive experiments, including a human-perception study and cross-category evaluations, demonstrate that SNC complements MMD, while DiPT delivers superior quality and variability across categories, highlighting practical improvements for 3D synthesis and evaluation.

Abstract

As 3D point clouds become a cornerstone of modern technology, the need for sophisticated generative models and reliable evaluation metrics has grown rapidly. In this work, we first show that some commonly used metrics for evaluating generated point clouds, particularly those based on Chamfer Distance (CD), lack robustness against defects and fail to capture geometric fidelity and local shape consistency when used as quality indicators. We further show that aligning samples prior to distance calculation and replacing CD with Density-Aware Chamfer Distance (DCD) are simple yet essential steps to ensure the consistency and robustness of evaluation metrics for point cloud generative models. While existing metrics primarily focus on directly comparing 3D Euclidean coordinates, we present a novel metric, named Surface Normal Concordance (SNC), which approximates surface similarity by comparing estimated point normals. This new metric, when combined with traditional ones, provides a more comprehensive evaluation of the quality of generated samples. Finally, leveraging recent advancements in transformer-based models for point cloud analysis, such as serialized patch attention, we propose a new architecture for generating high-fidelity 3D structures, the Diffusion Point Transformer. We perform extensive experiments and comparisons on the ShapeNet dataset, showing that our model outperforms previous solutions, particularly in terms of the quality of generated point clouds, achieving a new state of the art. Code is available at https://github.com/matteo-bastico/DiffusionPointTransformer.
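The effect of sample alignment on a CD-based comparison can be illustrated with a minimal sketch. It assumes one common CD variant (the mean squared nearest-neighbour distance, summed over both directions), which may differ from the exact definition used in the paper. The function names `barycenter`, `center`, and `chamfer_distance` are hypothetical and are not taken from the authors' repository. Pure Python is used for self-containment; a real evaluation would vectorize these loops.

```python
import math
import random


def barycenter(points):
    """Mean of the points along each of the three axes."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center(points):
    """Translate the cloud so that its barycenter sits at the origin."""
    c = barycenter(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def chamfer_distance(a, b):
    """Assumed CD variant: mean squared nearest-neighbour distance, both ways."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


random.seed(0)
reference = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]
# Same shape, translated along x: geometrically identical to the reference.
shifted = [(x + 0.2, y, z) for (x, y, z) in reference]

cd_raw = chamfer_distance(reference, shifted)
cd_aligned = chamfer_distance(center(reference), center(shifted))
print(f"CD without alignment: {cd_raw:.6f}")
print(f"CD with barycenter alignment: {cd_aligned:.6e}")
```

In this toy setting the two clouds differ only by a translation, so the unaligned distance is strictly positive while the aligned distance vanishes up to floating-point error. This is the kind of shift sensitivity that the alignment step is meant to remove.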

Paper Structure

This paper contains 15 sections, 8 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Response of several metrics to random noise and barycenter shift on generated samples. (Left) An example comparing a reference sample (blue) and its modified version (red) as noise and barycenter translations are added in proportion to its diameter. (Right) An overview of the robustness of some traditional metrics (MMD-CD, COV-EMD, JSD and 1-NNA-EMD) and some proposed metrics (SNC-EMD, MMD-DCD) for evaluating point cloud generation.
  • Figure 2: Closest references to a sample under different distance measures with alignment and in response to small shifts.
  • Figure 3: Comparison of 1-NNA, MMD, and COV computed with (red) and without (blue) barycenter alignment. Each metric is evaluated using both DCD and EMD for three levels of shifting.
  • Figure 4: Evolution of the normalized MMD with respect to noise added to the samples of $S_g$, comparing distance measures (CD, EMD, DCD) under different conditions: with or without barycenter alignment and using uniformly or randomly sampled points.
  • Figure 5: Proposed Diffusion Point Transformer (DiPT) for 3D point cloud generation. (Left) The model serializes the raw input and shuffles the serialization orders before processing it through $N$ DiPT blocks, each performing xCPE, Serialized Patch Attention, and a linear layer. Features are modulated and scaled based on the input condition, composed of the sample category and the diffusion noising timestamp. (Right) Example of Hilbert serialization, where each color represents a patch of maximum size $8$.
  • ...and 12 more figures
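
The serialization into patches described in the Figure 5 caption can be sketched as follows. This is only an illustration under stated assumptions: it substitutes a Z-order (Morton) curve for the Hilbert curve named in the caption, because Z-order is much shorter to write, and it does not reproduce the shuffling of serialization orders or any part of the DiPT blocks. The names `morton_code`, `serialize`, and `make_patches`, and the `bits` parameter, are hypothetical.

```python
import random


def morton_code(ix, iy, iz, bits=10):
    """Interleave the bits of three grid indices into one Z-order key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def serialize(points, bits=10):
    """Return point indices sorted along the Z-order curve (not Hilbert)."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    scale = (2 ** bits - 1) / extent  # map coordinates onto an integer grid

    def key(idx):
        p = points[idx]
        cell = [int((p[i] - mins[i]) * scale) for i in range(3)]
        return morton_code(cell[0], cell[1], cell[2], bits=bits)

    return sorted(range(len(points)), key=key)


def make_patches(order, patch_size=8):
    """Split the serialized order into consecutive patches of bounded size."""
    return [order[i:i + patch_size] for i in range(0, len(order), patch_size)]


random.seed(0)
cloud = [tuple(random.random() for _ in range(3)) for _ in range(20)]
order = serialize(cloud)
patches = make_patches(order, patch_size=8)
print([len(patch) for patch in patches])
```

Sorting along a space-filling curve tends to place spatially close points next to each other in the sequence, so each fixed-size patch covers a roughly local region. A Hilbert curve preserves locality better than Z-order, which is presumably why it appears in the figure, but the patching step itself is the same.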