SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks
Fabian B. Fuchs, Daniel E. Worrall, Volker Fischer, Max Welling
TL;DR
The paper addresses learning on 3D geometric data by building SE(3) (roto-translation) equivariance into an attention-based architecture. It proposes the SE(3)-Transformer, which operates on local neighborhood graphs and combines SE(3)-invariant attention weights with SE(3)-equivariant value messages, plus an attentive self-interaction pathway. The model is robust to rotations of the input, handles large point clouds with varying numbers of points, and is evaluated on a toy N-body simulation and two real-world datasets, ScanObjectNN and QM9. In all three settings it outperforms a non-equivariant attention baseline and an equivariant model without attention. The results indicate more stable and accurate predictions, with practical relevance for molecular property prediction and robotics. The authors also provide a fast spherical harmonics implementation and public code.
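For readers unfamiliar with the terms, the two properties named above can be stated compactly. The notation below is an assumption introduced here for illustration (a map f on point coordinates x_i, a group element g = (R, t) with rotation R and translation t, and an output representation \rho); it does not come from the text above.

```latex
% Action of g = (R, t) \in SE(3) on a point cloud:
%   (g \cdot x)_i = R x_i + t
%
% Equivariance: transforming the input transforms the output predictably.
f(g \cdot x) = \rho(g)\, f(x) \qquad \forall\, g \in SE(3)
%
% Invariance is the special case \rho(g) = I (the output does not change):
f(g \cdot x) = f(x) \qquad \forall\, g \in SE(3)
```

Under these definitions, attention weights are of the invariant kind (scalars that do not change under g), while value messages are of the equivariant kind (they transform along with the input).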
Abstract
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weight-tying within the model. The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds and graphs with varying number of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
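The equivariance property described in the abstract can be checked numerically on a toy layer. The following is a minimal sketch, not the authors' implementation: the function `toy_equivariant_attention`, its parameters, and the distance-based logit term are hypothetical choices made here for illustration. It uses only invariant scalar features and relative position vectors, whereas the actual model relies on spherical-harmonics-based kernels and higher-degree features.

```python
import numpy as np

def toy_equivariant_attention(x, f, w_q, w_k):
    """Toy attention layer (illustrative assumption, not the paper's layer).

    x: (N, 3) point positions; f: (N, C) rotation-invariant scalar features.
    Returns one 3D vector per point, shape (N, 3).
    """
    q = f @ w_q                              # queries from invariant features
    k = f @ w_k                              # keys from invariant features
    rel = x[None, :, :] - x[:, None, :]      # rel[i, j] = x_j - x_i (translation cancels)
    dist = np.linalg.norm(rel, axis=-1)      # pairwise distances are rotation-invariant
    logits = q @ k.T - dist                  # invariant attention logits
    np.fill_diagonal(logits, -np.inf)        # a point does not attend to itself
    logits = logits - logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over neighbors
    # Invariant weights times vectors that rotate with the input
    # give an output that rotates with the input.
    return np.einsum("ij,ijk->ik", alpha, rel)

def random_rotation(rng):
    """Sample a proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
n_points, n_feat, n_dim = 8, 4, 5
x = rng.normal(size=(n_points, 3))
f = rng.normal(size=(n_points, n_feat))
w_q = rng.normal(size=(n_feat, n_dim))
w_k = rng.normal(size=(n_feat, n_dim))

rot = random_rotation(rng)
shift = rng.normal(size=3)

out = toy_equivariant_attention(x, f, w_q, w_k)
out_moved = toy_equivariant_attention(x @ rot.T + shift, f, w_q, w_k)

# Rotating and translating the input should rotate the output vectors.
print(np.allclose(out_moved, out @ rot.T))
```

The design choice to build logits only from invariant quantities (scalar features and distances) is what keeps the attention weights unchanged under roto-translation. The output vectors then inherit their transformation behavior entirely from the relative positions.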
