Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs

Yi-Lun Liao, Tess Smidt

TL;DR

Equiformer advances 3D atomistic graph modeling by integrating SE(3)/E(3)-equivariant irreps features with Transformer-style attention. Its core innovations are equivariant operations (linear layers, layer normalization, gating, and depth-wise tensor products) and an equivariant graph attention that uses MLP-based attention and non-linear message passing to fuse content and geometry. Across QM9, MD17, and OC20, it performs competitively with or better than state-of-the-art invariant and equivariant models, often with improved training efficiency. The work broadens the applicability of Transformers to 3D molecular graphs while maintaining strong symmetry-aware inductive biases.
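
As a minimal illustration of why these operations preserve equivariance, the plain-Python sketch below applies an equivariant linear layer, an RMS-based normalization, and a sigmoid gate to degree-1 (vector) channels only, then checks numerically that rotating the input commutes with the layer. It is not Equiformer's implementation: the storage layout (a list of C channels of 3-vectors), the helper names, and the use of given invariant scalars as gate inputs are assumptions made for illustration.

```python
import math
import random


def rotation(alpha, beta):
    # Proper rotation R = Rz(alpha) @ Rx(beta), written out explicitly.
    ca, sa, cb, sb = math.cos(alpha), math.sin(alpha), math.cos(beta), math.sin(beta)
    return [[ca, -sa * cb, sa * sb], [sa, ca * cb, -ca * sb], [0.0, sb, cb]]


def rotate(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def equivariant_linear(W, vectors):
    # One weight per (output, input) channel pair, applied to whole vectors.
    # No bias: adding a fixed vector would break rotation equivariance.
    return [[sum(W[o][i] * vectors[i][k] for i in range(len(vectors))) for k in range(3)]
            for o in range(len(W))]


def rms_norm(vectors, eps=1e-6):
    # Divide by the RMS of vector norms over channels (learnable scale omitted).
    # Norms are rotation-invariant, so the divisor is unchanged by rotation.
    rms = math.sqrt(sum(x * x for v in vectors for x in v) / len(vectors) + eps)
    return [[x / rms for x in v] for v in vectors]


def gate(gate_scalars, vectors):
    # Scale each vector channel by sigmoid(g) for an invariant scalar g.
    return [[x / (1.0 + math.exp(-g)) for x in v] for g, v in zip(gate_scalars, vectors)]


def layer(W, gate_scalars, vectors):
    return gate(gate_scalars, rms_norm(equivariant_linear(W, vectors)))


def max_equivariance_error(seed=0, c_in=4, c_out=3):
    # Compare layer(R x) with R layer(x) for random weights, inputs, and R.
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(c_in)] for _ in range(c_out)]
    g = [rng.gauss(0, 1) for _ in range(c_out)]
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(c_in)]
    R = rotation(rng.uniform(0, 2 * math.pi), rng.uniform(0, math.pi))
    lhs = layer(W, g, [rotate(R, v) for v in x])
    rhs = [rotate(R, v) for v in layer(W, g, x)]
    return max(abs(a - b) for u, w in zip(lhs, rhs) for a, b in zip(u, w))


print(max_equivariance_error())  # near zero: only floating-point round-off
```

Each step either mixes whole vectors with scalar weights or rescales them by rotation-invariant quantities, which is why the check holds; applying a non-linearity to individual x, y, z components would break it.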

Abstract

Despite their widespread success in various domains, Transformer networks have yet to perform well across datasets in the domain of 3D atomistic graphs such as molecules, even when 3D-related inductive biases like translational invariance and rotational equivariance are considered. In this paper, we demonstrate that Transformers can generalize well to 3D atomistic graphs and present Equiformer, a graph neural network that leverages the strengths of Transformer architectures and incorporates SE(3)/E(3)-equivariant features based on irreducible representations (irreps). First, we propose a simple and effective architecture that only replaces the original operations in Transformers with their equivariant counterparts and includes tensor products. Using equivariant operations enables encoding equivariant information in channels of irreps features without complicating graph structures. With minimal modifications to Transformers, this architecture already achieves strong empirical results. Second, we propose a novel attention mechanism called equivariant graph attention, which improves upon typical attention in Transformers by replacing dot product attention with multi-layer perceptron attention and including non-linear message passing. With these two innovations, Equiformer achieves results competitive with previous models on the QM9, MD17, and OC20 datasets.
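
The contrast between dot product and multi-layer perceptron attention can be sketched on a toy fully connected graph. In the hypothetical example below, attention logits come from a small MLP (`w2 · LeakyReLU(W1 f)`) applied to invariant edge features `f` (two atom scalars and the interatomic distance) instead of a query-key dot product, and each value is the relative position scaled by an invariant scalar. The feature choice, the LeakyReLU slope, and all names are assumptions rather than the paper's exact formulation; the check confirms that rotating and translating the atoms only rotates the aggregated output.

```python
import math
import random


def rotation(alpha, beta):
    # Proper rotation R = Rz(alpha) @ Rx(beta), written out explicitly.
    ca, sa, cb, sb = math.cos(alpha), math.sin(alpha), math.cos(beta), math.sin(beta)
    return [[ca, -sa * cb, sa * sb], [sa, ca * cb, -ca * sb], [0.0, sb, cb]]


def rotate(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_attention_logit(f, W1, w2):
    # w2 . LeakyReLU(W1 f): an MLP on invariant edge features, in place of q . k.
    hidden = [leaky_relu(sum(a * b for a, b in zip(row, f))) for row in W1]
    return sum(a * h for a, h in zip(w2, hidden))


def attend(i, pos, s, W1, w2):
    # Toy fully connected graph: atom i attends to every other atom j.
    logits, values = [], []
    for j in range(len(pos)):
        if j == i:
            continue
        rel = [pos[j][k] - pos[i][k] for k in range(3)]
        dist = math.sqrt(sum(x * x for x in rel))
        logits.append(mlp_attention_logit([s[i], s[j], dist], W1, w2))
        # Hypothetical equivariant value: relative position times an invariant scalar.
        values.append([math.tanh(s[j]) * x for x in rel])
    alpha = softmax(logits)
    return [sum(a * v[k] for a, v in zip(alpha, values)) for k in range(3)]


def max_equivariance_error(seed=0, n_atoms=5, hidden=8):
    # Rotate and translate all atoms; each output vector should only rotate.
    rng = random.Random(seed)
    W1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]
    pos = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n_atoms)]
    s = [rng.gauss(0, 1) for _ in range(n_atoms)]
    R = rotation(rng.uniform(0, 2 * math.pi), rng.uniform(0, math.pi))
    t = [rng.gauss(0, 5) for _ in range(3)]
    moved = [[a + b for a, b in zip(rotate(R, p), t)] for p in pos]
    return max(abs(a - b)
               for i in range(n_atoms)
               for a, b in zip(attend(i, moved, s, W1, w2),
                               rotate(R, attend(i, pos, s, W1, w2))))


print(max_equivariance_error())  # near zero: only floating-point round-off
```

Because the logits depend only on invariant quantities, the softmax weights are unchanged by rigid motions, and the output inherits the equivariance of the values.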

Paper Structure (93 sections, 9 equations, 5 figures, 17 tables)

Figures (5)

  • Figure 1: Architecture of Equiformer. We embed input 3D graphs with atom and edge-degree embeddings and process them with Transformer blocks, consisting of equivariant graph attention and feed forward networks. In this figure, "$\otimes$" denotes multiplication, "$\oplus$" denotes addition, and "DTP" stands for depth-wise tensor product. $\sum$ within a circle denotes summation over all neighbors. Gray cells indicate intermediate irreps features.
  • Figure 2: Equivariant operations used in Equiformer. (a) Each gray line between input and output irreps features contains one learnable weight. (b) "RMS" denotes the root mean square value along the channel dimension. For simplicity, the multiplication by $\gamma$ is omitted here. (c) Gate layers are equivariant activation functions where non-linearly transformed scalars are used to gate non-scalar irreps features. (d) The left two irreps features correspond to the two input irreps features, and the rightmost one is the output irreps feature. The two gray lines connecting two vectors in the input irreps features and one vector in the output irreps feature form a path and contain one learnable weight. An alternative visualization of depth-wise tensor products is given in Figure 3 in the appendix. We show $SE(3)$-equivariant operations here, which can be generalized to $E(3)$-equivariant features.
  • Figure 3: An alternative visualization of the depth-wise tensor product. We follow the visualization of tensor products in e3nn and separate paths into three parts based on the types of output vectors. We note that one vector in the output irreps feature depends only on one vector in each input irreps feature.
  • Figure 4: Architecture of equivariant dot product attention without non-linear message passing. In this figure, "$\otimes$" denotes multiplication, "$\oplus$" denotes addition, and "DTP" stands for depth-wise tensor product. $\sum$ within a circle denotes summation over all neighbors. Gray cells indicate intermediate irreps features. We highlight in red the differences between dot product attention and multi-layer perceptron attention. Note that key $k_{ij}$ and value $v_{ij}$ are irreps features, and therefore $f_{ij}$ in dot product attention typically has more channels than in multi-layer perceptron attention.
  • Figure 5: Error distributions of different Equiformer models on different sub-splits of OC20 IS2RE validation set.
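
Figures 2(d) and 3 describe depth-wise tensor products, in which each output channel depends only on the matching channel of the first input and each path carries one learnable weight. A minimal sketch restricted to degrees 0 and 1 follows. It assumes the second input is a single scalar-plus-vector feature (for example, an edge embedding), omits Clebsch-Gordan normalization constants, and uses hypothetical names such as `depthwise_tensor_product` and the path labels in `PATHS`. The five paths are the couplings with output degree at most 1: 0⊗0→0, 1⊗1→0 (dot product), 0⊗1→1, 1⊗0→1, and 1⊗1→1 (cross product).

```python
import math
import random

PATHS = ("0x0->0", "1x1->0", "0x1->1", "1x0->1", "1x1->1")


def rotation(alpha, beta):
    # Proper rotation R = Rz(alpha) @ Rx(beta), written out explicitly.
    ca, sa, cb, sb = math.cos(alpha), math.sin(alpha), math.cos(beta), math.sin(beta)
    return [[ca, -sa * cb, sa * sb], [sa, ca * cb, -ca * sb], [0.0, sb, cb]]


def rotate(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def depthwise_tensor_product(x_s, x_v, y_s, y_v, w):
    """x_s, x_v: C scalars and C vectors; y_s, y_v: one scalar and one vector;
    w[c][path]: one weight per channel and path. Channel c of the output uses
    only channel c of x (depth-wise); normalization constants are omitted."""
    out_s, out_v = [], []
    for c in range(len(x_s)):
        wc = w[c]
        out_s.append(wc["0x0->0"] * x_s[c] * y_s + wc["1x1->0"] * dot(x_v[c], y_v))
        cr = cross(x_v[c], y_v)
        out_v.append([wc["0x1->1"] * x_s[c] * y_v[k]
                      + wc["1x0->1"] * y_s * x_v[c][k]
                      + wc["1x1->1"] * cr[k] for k in range(3)])
    return out_s, out_v


def max_equivariance_error(seed=0, channels=4):
    # Rotating the vector inputs should leave scalars fixed and rotate vectors.
    rng = random.Random(seed)
    w = [{p: rng.gauss(0, 1) for p in PATHS} for _ in range(channels)]
    x_s = [rng.gauss(0, 1) for _ in range(channels)]
    x_v = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(channels)]
    y_s, y_v = rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(3)]
    R = rotation(rng.uniform(0, 2 * math.pi), rng.uniform(0, math.pi))
    s1, v1 = depthwise_tensor_product(x_s, [rotate(R, v) for v in x_v], y_s, rotate(R, y_v), w)
    s0, v0 = depthwise_tensor_product(x_s, x_v, y_s, y_v, w)
    rotated_v0 = [rotate(R, v) for v in v0]
    errs = [abs(a - b) for a, b in zip(s1, s0)]
    errs += [abs(a - b) for u, r in zip(v1, rotated_v0) for a, b in zip(u, r)]
    return max(errs)


print(max_equivariance_error())  # near zero: only floating-point round-off
```

The check covers proper rotations only, matching the SE(3) case shown in Figure 2; under reflections the cross-product path yields a pseudovector, which is why E(3)-equivariant features must additionally track parity.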