SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

Fabian B. Fuchs, Daniel E. Worrall, Volker Fischer, Max Welling

TL;DR

The paper addresses learning on 3D geometric data by enforcing SE(3) (roto-translation) equivariance in an attention-based framework. It proposes the SE(3)-Transformer, which operates on neighbourhood graphs and combines SE(3)-invariant attention weights with SE(3)-equivariant value messages, plus an attentive self-interaction pathway. The model is robust to rotations of the input, scales to large point clouds, and is competitive on an N-body simulation, ScanObjectNN, and QM9, outperforming both a non-equivariant attention baseline and an equivariant model without attention. The work reports improved stability and accuracy, with practical relevance for molecular property prediction and robotics, and provides a fast spherical harmonics implementation and public code.

Abstract

We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weight-tying within the model. The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds and graphs with varying number of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
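To make the equivariance property in the abstract concrete, the following is a minimal NumPy sketch, not the paper's layer. It assumes a hypothetical toy update (`toy_equivariant_layer`) that moves each point along relative position vectors weighted by a function of pairwise distance, and checks numerically that rotating and translating the input first gives the same result as rotating and translating the output. The function names and the `exp(-distance)` weighting are illustrative assumptions.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:      # flip one column to exclude reflections
        q[:, 0] = -q[:, 0]
    return q

def toy_equivariant_layer(x):
    """Hypothetical toy update (not the SE(3)-Transformer layer).

    Each point is shifted by a weighted sum of relative position vectors;
    the weights depend only on pairwise distances, which are unchanged by
    rotations and translations of the whole point cloud.
    """
    diff = x[None, :, :] - x[:, None, :]    # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)    # invariant pairwise distances
    w = np.exp(-dist)                       # invariant scalar weights
    np.fill_diagonal(w, 0.0)                # no contribution from the point itself
    return x + (w[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # 5 points in 3D
R = random_rotation(rng)
t = rng.normal(size=3)

# Equivariance: f(x R^T + t) should equal f(x) R^T + t (points stored as rows).
transform_then_layer = toy_equivariant_layer(x @ R.T + t)
layer_then_transform = toy_equivariant_layer(x) @ R.T + t
equivariance_error = np.abs(transform_then_layer - layer_then_transform).max()
```

Under these assumptions, `equivariance_error` should be at the level of floating-point round-off. A layer that used absolute coordinates in its weights would generally fail this check.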

Paper Structure

This paper contains 39 sections, 43 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: A) Each layer of the SE(3)-Transformer maps from a point cloud to a point cloud (or graph to graph) while guaranteeing equivariance. For classification, this is followed by an invariant pooling layer and an MLP. B) In each layer, for each node, attention is performed. Here, the red node attends to its neighbours. Attention weights (indicated by line thickness) are invariant w.r.t. input rotation.
  • Figure 2: Updating the node features using our equivariant attention mechanism in four steps. A more detailed description, especially of step 2, is provided in the Appendix. Steps 3 and 4 visualise a graph network perspective: features are passed from nodes to edges to compute keys, queries and values, which depend both on features and relative positions in a rotation-equivariant manner.
  • Figure 3: A model based on conventional self-attention (left) and our rotation-equivariant version (right) predict future locations and velocities in a 5-body problem. The respective left-hand plots show input locations at time step $t=0$, ground truth locations at $t=500$, and the respective predictions. The right-hand plots show predicted locations and velocities for rotations of the input in steps of 10 degrees. The dashed curves show the predicted locations of a perfectly equivariant model.
  • Figure 4: ScanObjectNN: $x$-axis shows data augmentation on the test set. The $x$-value corresponds to the maximum rotation around a random axis in the $x$-$y$-plane. If both training and test set are not rotated ($x=0$ in a), breaking the symmetry of the SE(3)-Transformer by providing the $z$-component of the coordinates as an additional, scalar input improves the performance significantly. Interestingly, the model learns to ignore the additional, symmetry-breaking input when the training set presents a rotation-invariant problem (strongly overlapping dark red circles and dark purple triangles in b).
  • Figure 5: Spherical harmonics computation of our own implementation compared to the lie-learn library. We found that speeding up the computation of spherical harmonics is critical to scale up both Tensor Field Networks (Thomas et al., 2018) and SE(3)-Transformers to solve real-world machine learning tasks.
  • ...and 2 more figures
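The Figure 1 caption states that attention weights are invariant with respect to input rotation. The sketch below illustrates one way such invariance can arise, under simplifying assumptions that are not taken from the paper: every node carries a single 3D vector feature, queries and keys are built with hypothetical scalar coefficients `a`, `b`, `c`, and the attention logit is a plain dot product. Because queries and keys rotate together, their dot product (and hence the softmax weight) does not change.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def toy_attention_weights(x, f, a=0.7, b=-0.3, c=1.1):
    """Toy attention over all nodes (illustrative, not the paper's layer).

    x: (N, 3) positions, f: (N, 3) one vector feature per node.
    Query  q_i  = a * f_i
    Key    k_ij = b * f_j + c * (x_j - x_i)
    a, b, c are hypothetical scalar "weights" chosen for illustration.
    """
    q = a * f                                   # (N, 3)
    rel = x[None, :, :] - x[:, None, :]         # rel[i, j] = x_j - x_i
    k = b * f[None, :, :] + c * rel             # (N, N, 3)
    logits = np.einsum("id,ijd->ij", q, k)      # dot products q_i . k_ij
    return softmax(logits, axis=1)              # normalise over neighbours j

def rotation_z_then_x(angle_z, angle_x):
    """Compose a rotation about z with a rotation about x (angles in radians)."""
    cz, sz = np.cos(angle_z), np.sin(angle_z)
    cx, sx = np.cos(angle_x), np.sin(angle_x)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rx @ Rz

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 3))
f = rng.normal(size=(6, 3))
R = rotation_z_then_x(np.deg2rad(10.0), np.deg2rad(35.0))
t = np.array([0.5, -2.0, 1.0])

alpha = toy_attention_weights(x, f)
# Positions are rotated and translated; vector features are only rotated.
alpha_transformed = toy_attention_weights(x @ R.T + t, f @ R.T)
invariance_error = np.abs(alpha - alpha_transformed).max()
```

In this toy setting the translation cancels in the relative positions `x_j - x_i`, and the rotation cancels in the dot product, so `alpha` and `alpha_transformed` should agree up to round-off.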