Random Walks in Self-supervised Learning for Triangular Meshes

Gal Yefet, Ayellet Tal

TL;DR

The paper tackles self-supervised learning for 3D triangular meshes, addressing irregularity and label scarcity. It uses random-walk augmentations to capture local geometry, learning embeddings through a joint NT-Xent contrastive loss and a KMeans clustering loss. A Walks-to-Features network processes walk sequences, aided by a projection head and selective walk averaging during inference. On SHREC11, the method achieves competitive retrieval and classification close to supervised baselines, while ModelNet40 shows a larger gap due to intra-class variability; collectively, the approach demonstrates a promising label-free path for 3D mesh analysis and downstream tasks.
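The NT-Xent contrastive loss named above can be illustrated with a minimal sketch. It assumes the standard SimCLR-style formulation, in which two augmented views of the same mesh (here, embeddings of two random walks) form a positive pair and every other embedding in the batch is a negative. The function name, the temperature value, and the pure-Python implementation are illustrative assumptions, not the authors' code.

```python
import math

def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings (illustrative sketch).

    views_a[i] and views_b[i] are embeddings of two views of the same mesh.
    The temperature of 0.5 is an assumed default, not a value from the paper.
    """
    n = len(views_a)
    z = [_normalize(v) for v in list(views_a) + list(views_b)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same mesh
        # Temperature-scaled cosine similarity to every embedding except itself.
        sims = {
            j: sum(p * q for p, q in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denominator - sims[pos]
    return total / (2 * n)
```

The loss is small when each embedding is closer to its paired view than to the rest of the batch, and grows when the pairs are mismatched.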

Abstract

This study addresses the challenge of self-supervised learning for 3D mesh analysis. It presents a new approach that uses random walks as a form of data augmentation to generate diverse representations of mesh surfaces. Furthermore, it employs a combination of contrastive and clustering losses. The contrastive learning framework maximizes similarity between augmented instances of the same mesh while minimizing similarity between different meshes. We integrate this with a clustering loss, enhancing class distinction across training epochs and mitigating training variance. Our model's effectiveness is evaluated using mean Average Precision (mAP) scores and a supervised SVM linear classifier on extracted features, demonstrating its potential for various downstream tasks such as object classification and shape retrieval.
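As a rough illustration of the mAP evaluation mentioned in the abstract, the sketch below scores retrieval over a set of extracted features. The protocol is an assumption: every item serves once as a query, the remaining items are ranked by cosine similarity, and an item counts as relevant when it shares the query's class label. The function names are hypothetical and the paper's exact protocol may differ.

```python
import math

def average_precision(ranked_labels, query_label):
    """AP for one query: mean of precision@k over the ranks k that hold a relevant item."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_average_precision(features, labels):
    """Use each item as a query against all others (assumed leave-one-out protocol)."""
    scores = []
    for q, (query_feature, query_label) in enumerate(zip(features, labels)):
        others = [i for i in range(len(features)) if i != q]
        # Rank the remaining items from most to least similar to the query.
        others.sort(key=lambda i: cosine(query_feature, features[i]), reverse=True)
        scores.append(average_precision([labels[i] for i in others], query_label))
    return sum(scores) / len(scores)
```

A feature space in which every item's nearest neighbors share its class yields an mAP of 1.0; relevant items that appear lower in the ranking pull the score down.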

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Model retrieval results for the SHREC11 dataset [veltkamp2011shrec]. This figure demonstrates the retrieval performance of our model on three query examples from the SHREC11 dataset: an octopus, a snake, and a hand. The results showcase the model's ability to retrieve similar items in the same class, even when the objects are presented in various poses or orientations.
  • Figure 2: Architecture. The model consists of four components. The first generates the walks from the 3D mesh models. The second aggregates the information along each walk and yields the features. The third is the projection head, which applies non-linear fully connected layers to the features to map them into a lower-dimensional space, aiding in learning meaningful representations. The last comprises the loss functions (a combination of the NT-Xent loss and a clustering loss). The features used for downstream tasks are those taken before the projection head.
  • Figure 3: Walk To Features architecture [lahav2020meshwalker]. The network takes a batch of random walks as input (the yellow walk on the gorilla). It consists of three components: (1) the FC layers change the feature space; (2) the RNN layers aggregate the information along the walk; (3) the final FC layer gathers the features.
  • Figure 4: Object Retrieval. Given a query (on the left), our model retrieves objects from the same class as the query object, even when the pose (for the bird) or head orientation (for the ant) varies.
  • Figure 5: t-SNE of SHREC11 model features. This visualization shows the clustering of feature vectors extracted from 10 classes of the SHREC11 dataset. The plot demonstrates that models from the same class are grouped together. Notably, clusters representing similar classes, such as different dog breeds (e.g., 'dog1' and 'dog2'), are positioned close to each other in the feature space.
  • ...and 1 more figure
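
The walk-generation component described in the Figure 2 and Figure 3 captions can be sketched as follows. This summary does not state the walk policy, so the sketch assumes a simple one in the spirit of the cited MeshWalker work: start at a random vertex, move along mesh edges, and prefer neighbors that have not been visited yet. The function names and the dead-end fallback are illustrative assumptions.

```python
import random
from collections import defaultdict

def build_adjacency(faces):
    """Map each vertex index to the set of vertices it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b, c in faces:
        adjacency[a].update((b, c))
        adjacency[b].update((a, c))
        adjacency[c].update((a, b))
    return adjacency

def random_walk(faces, length, rng=None):
    """Return the vertex indices visited by one random walk over the mesh edges."""
    rng = rng or random.Random()
    adjacency = build_adjacency(faces)
    current = rng.choice(sorted(adjacency))
    walk, visited = [current], {current}
    while len(walk) < length:
        neighbors = sorted(adjacency[current])
        fresh = [v for v in neighbors if v not in visited]
        # Prefer unvisited neighbors; at a dead end, fall back to any neighbor
        # (an assumed policy, chosen only to keep the sketch simple).
        current = rng.choice(fresh or neighbors)
        walk.append(current)
        visited.add(current)
    return walk
```

Calling `random_walk` several times on the same mesh with different seeds gives different vertex sequences, which is what allows walks to act as augmented views of one mesh in a contrastive setup.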