DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
Yanlong Li, Chamara Madarasingha, Kanchana Thilakarathna
TL;DR
DiffPMAE tackles the challenge of high-volume, lossy 3D point-cloud transmission by uniting Masked Autoencoding with diffusion-based reconstruction in a self-supervised framework. The encoder splits a point cloud into visible and masked patches and supplies a latent condition to a diffusion-based decoder, enabling accurate reconstruction and adaptability to compression, upsampling, and completion. Across ShapeNet-55 and ModelNet40, DiffPMAE demonstrates strong autoencoding (e.g., MMD CD of 1.125×10^-3) and superior downstream performance, along with practical inference speed suitable for streaming. The work highlights the potential of SSL-guided diffusion for efficient 3D content and provides extensive ablations and real-world evaluations to support its applicability to real-time 3D streaming and storage scenarios.
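For readers unfamiliar with the reported metric, the following is a minimal sketch of Chamfer Distance (CD) and of Minimum Matching Distance (MMD) computed with CD. It assumes squared Euclidean distances averaged in both directions; the exact convention behind the reported number (squared or not, sum or mean, any normalization) is not stated here and may differ. The function names `chamfer_distance` and `mmd_cd` are illustrative only.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumption: squared Euclidean distance, mean over points in each direction.
    """
    # Pairwise squared distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest neighbour in b for every point of a, and vice versa.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def mmd_cd(generated: list, references: list) -> float:
    """Minimum Matching Distance: for each reference cloud, take the CD to its
    closest generated cloud, then average over all references."""
    best = [min(chamfer_distance(g, r) for g in generated) for r in references]
    return float(np.mean(best))
```

Lower values indicate that reconstructed clouds lie closer to the reference shapes. The brute-force pairwise matrix is adequate for a few thousand points per cloud; larger clouds would usually call for a spatial index.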
Abstract
Point cloud streaming is becoming increasingly popular and is evolving into the norm for interactive service delivery and the future Metaverse. However, the substantial volume of point cloud data presents numerous challenges, particularly high bandwidth consumption and large storage requirements. Although various solutions have been proposed, focusing on point cloud compression, upsampling, and completion, these reconstruction-related methods still fall short of delivering high-fidelity point cloud output. As a solution, we propose DiffPMAE, an effective point cloud reconstruction architecture. Inspired by self-supervised learning concepts, we combine Masked Autoencoding with a diffusion model mechanism to remotely reconstruct point cloud data. By the nature of this reconstruction process, DiffPMAE can be extended to many related downstream tasks, including point cloud compression, upsampling, and completion. Using the ShapeNet-55 and ModelNet datasets, with over 60,000 objects, we show that DiffPMAE outperforms many state-of-the-art methods in terms of autoencoding and the downstream tasks considered.
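To make the masked-autoencoding step concrete, the sketch below splits a point cloud into visible and masked patches. It is an illustration under stated assumptions, not the authors' implementation: it groups points by farthest point sampling followed by k-nearest neighbours, a common choice in point-cloud masked autoencoders, and the values `n_patches=64`, `patch_size=32`, and `mask_ratio=0.75` are placeholders chosen for the example. The names `farthest_point_sample` and `split_patches` are hypothetical.

```python
import numpy as np


def farthest_point_sample(points: np.ndarray, n_centers: int,
                          rng: np.random.Generator) -> np.ndarray:
    """Return indices of n_centers points that are spread out over the cloud."""
    n = points.shape[0]
    chosen = np.empty(n_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)
    # Squared distance from every point to its nearest chosen centre so far.
    dist = np.full(n, np.inf)
    for i in range(1, n_centers):
        d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))
    return chosen


def split_patches(points: np.ndarray, n_patches: int = 64, patch_size: int = 32,
                  mask_ratio: float = 0.75, seed: int = 0):
    """Group points into patches and randomly mark a fraction as masked.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sample(points, n_patches, rng)]
    # Squared distances from each centre to every point, shape (G, N).
    d2 = np.sum((centers[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Each patch holds the patch_size nearest points to its centre.
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]
    patches = points[knn_idx]  # (G, K, 3)
    # Randomly choose which patches are hidden from the encoder.
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return patches[~mask], patches[mask], centers, mask
```

In a pipeline of this kind, only the visible patches would be encoded into the latent condition, while the masked patches serve as the reconstruction target for the diffusion-based decoder.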
