Hyperbolic Image-and-Pointcloud Contrastive Learning for 3D Classification

Naiwen Hu, Haozhe Cheng, Yifan Xie, Pengcheng Shi, Jihua Zhu

TL;DR

HyperIPC is a hyperbolic image-and-pointcloud contrastive learning method that leverages images to guide the point cloud in establishing strong semantic hierarchical correlations. Ablation studies and confirmatory testing validate the rationality of HyperIPC's parameter settings and the effectiveness of its submodules.

Abstract

3D contrastive representation learning has exhibited remarkable efficacy across various downstream tasks. However, existing contrastive learning paradigms, which rely on cosine similarity in Euclidean space, fail to fully exploit the intra-modal hierarchical and cross-modal semantic correlations of multi-modal data. In response, we seek solutions in hyperbolic space and propose a hyperbolic image-and-pointcloud contrastive learning method (HyperIPC). For the intra-modal branch, we rely on the intrinsic geometric structure to explore the hyperbolic embedding representation of the point cloud and capture invariant features. For the cross-modal branch, we leverage images to guide the point cloud in establishing strong semantic hierarchical correlations. Empirical experiments underscore the outstanding classification performance of HyperIPC. Notably, HyperIPC improves object classification results by 2.8% and few-shot classification results by 5.9% on ScanObjectNN compared to the baseline. Furthermore, ablation studies and confirmatory testing validate the rationality of HyperIPC's parameter settings and the effectiveness of its submodules.
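To make the contrast with cosine-similarity objectives concrete, the following is a minimal sketch of a contrastive loss that scores pairs by negative Poincaré-ball distance instead of cosine similarity. It is an illustration under stated assumptions, not the HyperIPC implementation: the abstract does not give the exact loss, so the InfoNCE form, the curvature `c`, the temperature `tau`, and the function names `expmap0`, `poincare_dist`, and `hyperbolic_info_nce` are all hypothetical choices built from the standard Poincaré-ball formulas.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-7):
    """Exponential map at the origin: Euclidean rows of v (n, d) -> Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-7):
    """Pairwise geodesic distance between rows of x (n, d) and rows of y (m, d)."""
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    dx = 1.0 - c * np.sum(x ** 2, axis=-1)[:, None]
    dy = 1.0 - c * np.sum(y ** 2, axis=-1)[None, :]
    arg = 1.0 + 2.0 * c * sq / np.maximum(dx * dy, eps)
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def hyperbolic_info_nce(z_a, z_b, tau=0.2, c=1.0):
    """InfoNCE over two views; row i of z_a and row i of z_b form the positive pair.

    Assumption: similarity is the negative hyperbolic distance between embeddings.
    """
    h_a, h_b = expmap0(z_a, c), expmap0(z_b, c)
    logits = -poincare_dist(h_a, h_b, c) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))            # stand-in for encoder outputs
loss_matched = hyperbolic_info_nce(z, z)          # positives coincide
loss_mismatched = hyperbolic_info_nce(z, z[::-1])  # positives are wrong samples
```

In this sketch the same function could serve either branch: two augmented point clouds as the two views for the intra-modal case, or a point cloud and its rendered image for the cross-modal case. That reuse is an assumption of the example, not a claim about how the paper structures its losses.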

Paper Structure (15 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Illustration of semantic hierarchy in hyperbolic space. "Airplane" can be organized into a tree-like structure according to flying mode and other semantic information, where the lower the level, the more detailed the object's description. When computing the distance between point cloud features located at different nodes of the same level, the path should pass through the root node (red line). In Euclidean space, however, the distance is defined by cosine similarity (yellow line).
  • Figure 2: Our proposed model architecture. Point cloud branch: Intra-Modal Hyperbolic Contrastive Learning (IMHCL) makes the model learn the invariance between two augmented point clouds. Image branch: Cross-Modal Hyperbolic Contrastive Learning (CMHCL) leverages rendered images to guide the point cloud in establishing a hierarchical structure.
  • Figure 3: Illustration of the hyperbolic embedding optimization. Before optimization (Left), the root node deviates from the center of hyperbolic space and the leaf nodes are far from the boundary of the Poincaré disk. After hyperbolic optimization (Right), the root node of the data is aligned with the origin of the hyperbolic space, and the leaf nodes make full use of the characteristics of the hyperbolic space to disperse as much as possible.
  • Figure 4: UMAP (McInnes et al., 2018) embeddings for ModelNet10 (evaluation sets) on the Poincaré disk. Each point inside the Poincaré disk corresponds to a sample. Different colors indicate different classes. After IMHCL and CMHCL, the samples are clustered according to their labels, and each category also lies closer to the boundary of the Poincaré disk.
  • Figure 5: Impact of the joint learning objective. Classification accuracy of the intra-modal, cross-modal, and joint learning objectives on ScanObjectNN and ModelNet40.
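The geometric claim in the Figure 1 caption, that the distance between nodes on the same level effectively runs through the root, can be checked numerically on the unit Poincaré disk. The sketch below is an illustration using the standard distance formula for curvature -1; the helper names `poincare_dist` and `through_root_ratio`, the 90-degree separation, and the chosen radii are assumptions of this example rather than settings from the paper. It compares the direct geodesic distance between two points at the same radius with the length of the detour through the origin (the "root").

```python
import numpy as np

def poincare_dist(x, y):
    """Geodesic distance between two points of the unit Poincare disk (curvature -1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

def through_root_ratio(r, angle=np.pi / 2):
    """Direct distance divided by the length of the path via the origin.

    Both points sit at radius r, separated by `angle` radians.
    """
    x = np.array([r, 0.0])
    y = r * np.array([np.cos(angle), np.sin(angle)])
    origin = np.zeros(2)
    direct = poincare_dist(x, y)
    via_root = poincare_dist(x, origin) + poincare_dist(origin, y)
    return direct / via_root

radii = [0.5, 0.9, 0.99, 0.999]
ratios = [through_root_ratio(r) for r in radii]
for r, ratio in zip(radii, ratios):
    print(f"radius={r}: direct / via-root = {ratio:.3f}")
```

The ratio stays below 1 by the triangle inequality and grows toward 1 as the points move toward the boundary, so for leaf-like embeddings the geodesic is nearly as long as the route through the root. The cosine similarity of the two points, by contrast, depends only on the angle and is unchanged by the radius, which is the behavior the caption attributes to the Euclidean setting.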