Masked BRep Autoencoder via Hierarchical Graph Transformer

Yifei Li, Kang Wu, Wenming Wu, Xiao-Ming Fu

Abstract

We introduce a novel self-supervised learning framework that automatically learns representations from input computer-aided design (CAD) models for downstream tasks, including part classification, modeling segmentation, and machining feature recognition. To train our network, we construct a large-scale, unlabeled dataset of boundary representation (BRep) models. The success of our algorithm relies on two key components. The first is a masked graph autoencoder that reconstructs randomly masked geometries and attributes of BReps, learning representations that generalize better. The second is a hierarchical graph Transformer architecture that fuses global and local learning: a cross-scale mutual attention block models long-range geometric dependencies, and a graph neural network block aggregates local topological information. After training the autoencoder, we replace its decoder with a task-specific network trained on a small amount of labeled data for downstream tasks. We conduct experiments on various tasks and achieve high performance, even with a small amount of labeled data, demonstrating the practicality and generalizability of our model. Compared to other methods, our model performs significantly better on downstream tasks with the same amount of training data, particularly when the training data is very limited.
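The masking step of a masked autoencoder can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 40% mask ratio, the zero-valued mask token, and the choice to score the reconstruction only on masked entries (a common masked-autoencoder convention) are all assumptions made for illustration, and per-face features are reduced to short lists of floats.

```python
import random


def mask_features(features, mask_ratio, mask_token, rng):
    """Replace a random subset of feature vectors with mask_token.

    Returns a masked copy and the sorted indices that were masked.
    The input list is left unmodified.
    """
    n = len(features)
    n_mask = max(1, round(n * mask_ratio)) if n > 0 else 0
    masked_idx = sorted(rng.sample(range(n), n_mask))
    masked = [list(f) for f in features]
    for i in masked_idx:
        masked[i] = list(mask_token)
    return masked, masked_idx


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed over the masked entries only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical toy data: 10 "faces", each with a 2-dimensional feature.
rng = random.Random(0)
faces = [[float(i), float(i) + 0.5] for i in range(10)]
masked_faces, idx = mask_features(faces, 0.4, [0.0, 0.0], rng)

# A perfect reconstruction has zero loss; the masked input itself does not.
perfect_loss = masked_mse(faces, faces, idx)
naive_loss = masked_mse(masked_faces, faces, idx)
```

In a real pipeline the encoder would see `masked_faces`, a decoder would predict the hidden features, and a loss such as `masked_mse` would compare the prediction with the original `faces` at the masked indices.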

Paper Structure (55 sections, 19 equations, 11 figures, 17 tables)


Figures (11)

  • Figure 1: Given a CAD model, our method first applies a BRep encoder to construct a gAAG from the extracted BRep information. A Graph encoder then updates the graph features with both global and local information. During pre-training, we train a masked autoencoder (MAE) by reconstructing the randomly masked faces and edges for BRep representation learning (upper part of the figure). In the fine-tuning stage, we train a new network, formed by attaching a task-specific head to the encoder, using a small amount of labeled data with different loss functions (lower part of the figure).
  • Figure 2: Our BRep encoder extracts and fuses geometric and attribute features into a unified gAAG representation via parallel CNN and MLP branches.
  • Figure 3: Left: The overall architecture of the Graph encoder, processing multi-resolution face features $F_\text{low}$, $F_\text{high}$, and edge features $E$. Right: The internal structure of the $k$-th CSMA block. CA and SA denote cross-attention and self-attention layers, respectively, while Q, K, and V represent queries, keys, and values.
  • Figure 4: Four reconstruction examples. For each example, the sequence from left to right shows: (1) the input BRep model, (2) the full surface point cloud, (3) the masked input where black denotes masked faces and blue denotes unmasked faces, and (4) the reconstructed point cloud.
  • Figure 5: Machining feature recognition under the fully supervised setting. We accurately predict various machining features, such as passages, pockets, slots, and steps.
  • ...and 6 more figures
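
The cross-attention (CA) and self-attention (SA) layers named in the Figure 3 caption both build on scaled dot-product attention over queries, keys, and values. The sketch below shows only that generic operation, without learned projections, multiple heads, or residual connections. How the CSMA block actually wires $F_\text{low}$ and $F_\text{high}$ together is not specified in the caption, so the two cross-attention directions shown here, and the names `f_low` and `f_high`, are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


# Hypothetical toy features at two resolutions (2 coarse and 3 fine tokens).
f_low = [[1.0, 0.0], [0.0, 1.0]]
f_high = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Cross-attention: queries come from one scale, keys and values from the other.
low_from_high = attention(f_low, f_high, f_high)
high_from_low = attention(f_high, f_low, f_low)

# Self-attention: queries, keys, and values all come from the same scale.
low_self = attention(f_low, f_low, f_low)
```

Each output row is a convex combination of the value rows, so the output always has one row per query regardless of how many keys and values the other scale provides.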