PointCubeNet: 3D Part-level Reasoning with 3x3x3 Point Cloud Blocks
Da-Yeong Kim, Yeong-Jun Cho
TL;DR
PointCubeNet addresses unsupervised 3D part-level reasoning by jointly learning global and local representations from raw point clouds and aligning them with text descriptions generated by a large language model. It avoids 3D-to-2D projections and pretrained image-language encoders. Instead, it uses a local 27-block branch with self- and cross-attention, together with InfoNCE-based contrastive losses that connect the visual and textual modalities. The key contributions are the first unsupervised 3D part-level reasoning over 27 local blocks, a soft local loss that handles object symmetry, and zero-shot part-level reasoning demonstrated on ModelNet and ShapeNet. The results show that modeling local parts improves object understanding and that performance remains robust across domains without manual part annotations.
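The TL;DR refers to InfoNCE-based contrastive losses between point cloud and text features. As a rough illustration of that general technique, the sketch below computes a symmetric InfoNCE loss over paired embeddings in pure Python. It assumes that row i of each input list is a matched point/text pair and that all other rows are negatives; the temperature of 0.07 and the helper names are illustrative assumptions, not PointCubeNet's code, and the paper's soft local loss for symmetric objects is not reproduced here.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def _one_way(anchors: List[List[float]], candidates: List[List[float]],
             temperature: float) -> float:
    # Mean cross-entropy where anchors[i] should pick candidates[i]
    # out of all candidates (the other rows act as negatives).
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def info_nce(point_feats: Sequence[Sequence[float]],
             text_feats: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: average of point->text and text->point losses.

    Hypothetical helper; temperature=0.07 is an assumed value.
    """
    assert len(point_feats) == len(text_feats) and len(point_feats) > 0
    p = [_normalize(v) for v in point_feats]
    t = [_normalize(v) for v in text_feats]
    return 0.5 * (_one_way(p, t, temperature) + _one_way(t, p, temperature))
```

With correctly paired, mutually orthogonal embeddings the loss is close to zero; swapping the text rows makes it large; and when every embedding is identical, each direction reduces to log N, since the model cannot tell the pairs apart.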
Abstract
In this paper, we propose PointCubeNet, a novel multi-modal 3D understanding framework that achieves part-level reasoning without requiring any part annotations. PointCubeNet comprises a global branch and a local branch. The proposed local branch, structured into 3x3x3 local blocks, enables part-level analysis of point cloud sub-regions using the corresponding local text labels. With the proposed pseudo-labeling method and local loss function, PointCubeNet is trained effectively in an unsupervised manner. The experimental results demonstrate that understanding the parts of a 3D object improves understanding of the object as a whole. In addition, this work is the first attempt to perform unsupervised 3D part-level reasoning, and it achieves reliable and meaningful results.
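One plausible way to form 3x3x3 local blocks is to split each object's axis-aligned bounding box into equal thirds along every axis and assign each point to one of the 27 resulting cells. The sketch below assumes this uniform split. The functions `block_index` and `group_blocks` are hypothetical names, and PointCubeNet's actual partitioning may differ (for example, in normalization or in how points on block boundaries are handled).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def block_index(points: Sequence[Point], grid: int = 3) -> List[int]:
    """Assign each point to one of grid**3 cells of its bounding box (assumed uniform split)."""
    if not points:
        return []
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    ids = []
    for p in points:
        cell = []
        for a in range(3):
            extent = maxs[a] - mins[a]
            # Degenerate axis (all points share one coordinate): use cell 0.
            t = (p[a] - mins[a]) / extent if extent > 0 else 0.0
            # Clamp so points on the max face land in the last cell, not cell `grid`.
            cell.append(min(int(t * grid), grid - 1))
        # Flatten (x, y, z) cell coordinates into a single index in [0, grid**3).
        ids.append(cell[0] * grid * grid + cell[1] * grid + cell[2])
    return ids


def group_blocks(points: Sequence[Point], grid: int = 3) -> List[List[Point]]:
    """Collect points into grid**3 lists (27 for the default 3x3x3 layout)."""
    blocks: List[List[Point]] = [[] for _ in range(grid ** 3)]
    for p, b in zip(points, block_index(points, grid)):
        blocks[b].append(p)
    return blocks
```

For example, for the points (0, 0, 0), (0.5, 0.5, 0.5), (1, 0, 0), and (1, 1, 1), which span the unit cube, `block_index` returns `[0, 13, 18, 26]`. Each non-empty block could then be paired with a local text label.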
