Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation
Matteo Bastico, David Ryckelynck, Laurent Corté, Yannick Tillier, Etienne Decencière
TL;DR
This work targets robust evaluation and high-fidelity generation of 3D point clouds. It identifies weaknesses in Chamfer Distance (CD)-based metrics and addresses them with a barycenter alignment step, the replacement of CD with Density-Aware Chamfer Distance (DCD), and a new metric, Surface Normal Concordance (SNC), which together give a more comprehensive evaluation. It then presents the Diffusion Point Transformer (DiPT), a transformer-based diffusion model that operates directly on raw points via serialized patches and space-filling curves, achieving state-of-the-art quality on ShapeNet. Extensive experiments, including a human-perception study and cross-category evaluations, show that SNC complements MMD and that DiPT delivers superior quality and variability across categories.
Abstract
As 3D point clouds become a cornerstone of modern technology, the need for sophisticated generative models and reliable evaluation metrics has grown rapidly. In this work, we first show that some commonly used metrics for evaluating generated point clouds, particularly those based on Chamfer Distance (CD), lack robustness against defects and fail to capture geometric fidelity and local shape consistency when used as quality indicators. We further show that aligning samples prior to distance calculation and replacing CD with Density-Aware Chamfer Distance (DCD) are simple yet essential steps to ensure the consistency and robustness of evaluation metrics for point cloud generative models. While existing metrics primarily compare 3D Euclidean coordinates directly, we present a novel metric, Surface Normal Concordance (SNC), which approximates surface similarity by comparing estimated point normals. Combined with traditional metrics, SNC provides a more comprehensive evaluation of the quality of generated samples. Finally, leveraging recent advances in transformer-based models for point cloud analysis, such as serialized patch attention, we propose a new architecture for generating high-fidelity 3D structures, the Diffusion Point Transformer. We perform extensive experiments and comparisons on the ShapeNet dataset, showing that our model outperforms previous solutions, particularly in the quality of generated point clouds, achieving a new state of the art. Code is available at https://github.com/matteo-bastico/DiffusionPointTransformer.
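To illustrate why alignment prior to distance calculation matters, the following is a minimal sketch, not the authors' implementation. It assumes one common CD convention (mean squared nearest-neighbor distance, summed over both directions) and takes barycenter alignment to mean subtracting each cloud's centroid. The function names `center`, `chamfer_distance`, and `aligned_chamfer_distance` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def center(points: np.ndarray) -> np.ndarray:
    """Translate an (N, 3) point cloud so that its barycenter is the origin."""
    return points - points.mean(axis=0, keepdims=True)


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between clouds of shape (N, 3) and (M, 3).

    Assumed convention: squared Euclidean distances, averaged per direction,
    then summed. Other conventions exist.
    """
    diff = a[:, None, :] - b[None, :, :]   # (N, M, 3) pairwise differences
    d2 = np.sum(diff ** 2, axis=-1)        # (N, M) squared distances
    a_to_b = d2.min(axis=1).mean()         # each point of a to its nearest in b
    b_to_a = d2.min(axis=0).mean()         # each point of b to its nearest in a
    return float(a_to_b + b_to_a)


def aligned_chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Chamfer Distance computed after centering both clouds (assumed alignment)."""
    return chamfer_distance(center(a), center(b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = rng.normal(size=(128, 3))
    shifted = shape + np.array([0.5, -0.2, 0.1])  # same shape, translated

    # Without alignment the translation alone produces a non-zero distance.
    print("raw CD:    ", chamfer_distance(shape, shifted))
    # After centering, the two identical shapes coincide.
    print("aligned CD:", aligned_chamfer_distance(shape, shifted))
```

In this sketch a pure translation of an otherwise identical shape inflates the raw distance, while the centered version is zero up to floating-point error. This is the kind of inconsistency an alignment step is meant to remove.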
