E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products

Yunyang Li, Lin Huang, Zhihao Ding, Chu Wang, Xinran Wei, Han Yang, Zun Wang, Chang Liu, Yu Shi, Peiran Jin, Tao Qin, Mark Gerstein, Jia Zhang

TL;DR

E2Former introduces a Wigner $6j$ convolution to transform edge-focused spherical tensor products into node-centric operations, reducing asymptotic complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving rotational equivariance. This enables an efficient and expressive transformer architecture for molecular modeling, achieving strong performance on OC20, OC22, and SPICE benchmarks and enabling large-scale molecular dynamics simulations much faster than DFT or traditional empirical methods. The work provides detailed theoretical foundations, rigorous scaling analysis, and extensive experiments, including MD on systems up to several thousand atoms, showcasing practical applicability to large-scale chemistry, biology, and materials science problems. Overall, E2Former represents a scalable, physics-informed approach to learnable force fields that can advance high-fidelity simulations in real-world molecular applications.

Abstract

Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science. However, EGNNs face substantial computational challenges due to the high cost of constructing edge features via spherical tensor products, making them impractical for large-scale systems. To address this limitation, we introduce E2Former, an equivariant and efficient transformer architecture that incorporates the Wigner $6j$ convolution (Wigner $6j$ Conv). By shifting the computational burden from edges to nodes, the Wigner $6j$ Conv reduces the complexity from $O(|\mathcal{E}|)$ to $ O(| \mathcal{V}|)$ while preserving both the model's expressive power and rotational equivariance. We show that this approach achieves a 7x-30x speedup compared to conventional $\mathrm{SO}(3)$ convolutions. Furthermore, our empirical results demonstrate that the derived E2Former mitigates the computational challenges of existing approaches without compromising the ability to capture detailed geometric information. This development could suggest a promising direction for scalable and efficient molecular modeling.
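The edge-to-node shift described above can be illustrated with a small Cartesian analogue. The sketch below is not the paper's Wigner $6j$ convolution: it assumes a fully connected graph, scalar node features `w`, and a degree-2 outer product $(\mathbf{r}_i - \mathbf{r}_j) \otimes (\mathbf{r}_i - \mathbf{r}_j)$ standing in for spherical tensor products. All function names are hypothetical. Expanding the product in node positions lets three node-level sums replace the loop over all pairs.

```python
def outer(a, b):
    """Outer product of two 3-vectors as a 3x3 nested list."""
    return [[x * y for y in b] for x in a]


def add(A, B, scale=1.0):
    """Return A + scale * B for two 3x3 nested lists."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def edge_wise(pos, w):
    """For each i, sum_j w[j] * (r_i - r_j)(r_i - r_j)^T by visiting every pair."""
    out = []
    for ri in pos:
        acc = [[0.0] * 3 for _ in range(3)]
        for rj, wj in zip(pos, w):
            d = [a - b for a, b in zip(ri, rj)]
            acc = add(acc, outer(d, d), wj)
        out.append(acc)
    return out


def node_wise(pos, w):
    """Same quantity, using sums that are computed once over the nodes.

    (r_i - r_j)(r_i - r_j)^T = r_i r_i^T - r_i r_j^T - r_j r_i^T + r_j r_j^T,
    so only sum_j w_j, sum_j w_j r_j and sum_j w_j r_j r_j^T are needed.
    """
    s0 = sum(w)
    s1 = [sum(wj * rj[k] for rj, wj in zip(pos, w)) for k in range(3)]
    s2 = [[0.0] * 3 for _ in range(3)]
    for rj, wj in zip(pos, w):
        s2 = add(s2, outer(rj, rj), wj)
    out = []
    for ri in pos:
        m = add(s2, outer(ri, ri), s0)
        m = add(m, outer(ri, s1), -1.0)
        m = add(m, outer(s1, ri), -1.0)
        out.append(m)
    return out


def max_abs_diff(ms_a, ms_b):
    """Largest entry-wise difference between two lists of 3x3 matrices."""
    return max(
        abs(x - y)
        for A, B in zip(ms_a, ms_b)
        for ra, rb in zip(A, B)
        for x, y in zip(ra, rb)
    )
```

Both functions return the same per-node matrices up to floating-point error. `edge_wise` visits every ordered pair of nodes, whereas `node_wise` makes a fixed number of passes over the nodes. Distance cutoffs and attention weights, which the paper handles separately, are deliberately left out of this toy setting.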

Paper Structure

This paper contains 47 sections, 9 theorems, 64 equations, 6 figures, 8 tables, 1 algorithm.

Key Result

Theorem 3.2

Let $\ell = u \geq 1$. Every degree-$\ell$ spherical harmonic $\mathcal{R}^{(\ell)}(\mathbf{r}_{ij})$ can be expressed as an irreducible subspace of the $u$-fold tensor product $(\mathcal{R}^{(1)}(\mathbf{r}_{ij}))^{\otimes u}$. When expanded into node-local terms, this satisfies:

Figures (6)

  • Figure 1: (a) Overview of the Proposed Approach. Rather than performing tensor products over edges by combining node features and distances, E2Former leverages two key concepts: binomial local expansion and Wigner $6j$ recoupling. The former represents edge directions in terms of node positions, while the latter reorders the sequence of tensor product operations. Together, they reduce the computational complexity of the tensor product from $O(\lvert \mathcal{E} \rvert)$ to $O(\lvert \mathcal{V} \rvert)$. $\otimes$ denotes the Clebsch-Gordan (CG) tensor product, and $\otimes^{6j}$ denotes the CG tensor product where each path is parameterized by a weight governed by the Wigner $6j$ coefficients. (b) Illustration of two equivalent ways to couple the tensor product of three representations: sequentially coupling two tensors before the third (left) or reordering the coupling sequence (right), with equivalence established via the Wigner $6j$ recoupling.
  • Figure 2: (a) Breaking down the runtime of attention-based $\mathrm{SO}(3)$ convolutions shows that message construction is the slowest step. Calculating attention and combining messages take much less time. (b) We compared the runtime of our Wigner $6j$ convolution (purple squares) against the standard $\mathrm{SO}(3)$ convolution (blue circles). Our method was consistently faster across different graph sizes ($N$), maximum angular momenta ($L_{\mathrm{max}}$), and sparsity levels (dense vs. sparse, see subplots b.i-iii). Full experimental details are in the scaling analysis section. (c) Runtime on 1000-node graphs as a function of angular momentum cutoff $L$ (up to $L_{\mathrm{max}}=6$). (d) Runtime on 1000-node graphs with fixed $L_{\mathrm{max}}=3$, varying the maximum number of neighbors from 64 to 512. In (b–d), both methods yield identical outputs.
  • Figure 3: (a) Power spectra comparison across computational methods: E2Former (blue), GFN2-xTB (orange), and MADFT (green). The graph corresponds to a simulation in the NVT ensemble at temperature $T = 300 \, \text{K}$ with a time step of 1 fs. A structural overlay of the simulated system is displayed for context. (b) Efficiency comparison showing computational time for E2Former, GFN2-xTB, and MADFT. E2Former demonstrates the lowest computational time. The y-axis denotes the computation time for a single frame.
  • Figure 4: Power spectra comparison across computational methods: E2Former (blue), MACE (orange), and MADFT (green).
  • Figure 5: Overview of the E2Former architecture. (a) The main network alternates E2Attention blocks with feedforward layers, repeatedly refining node embeddings from a 3D molecular graph. (b) Within each E2Attention block, scalarized queries/keys (via ir2scalar) are combined with distance‐dependent features (RBF) and convolutions (6j-TP), updating the node embeddings equivariantly. (c) The final readout incorporates atomic types and radial/spherical expansions (RBF, SH) into a gated projection that produces the per‐atom output $y_i$.
  • ...and 1 more figure

Theorems & Definitions (24)

  • Definition 2.1: Solid Spherical Harmonics in Real Basis
  • Definition 2.2: Tensor Products of Irreps
  • Definition 2.3: Wigner $6j$ Symbol
  • Definition 3.1: $\mathrm{SO}(3)$-Equivariant Node Convolution
  • Theorem 3.2: Binomial Local Expansion
  • proof : Proof Sketch
  • Theorem 3.3: Node-Based Factorization via Wigner $6j$
  • proof : Proof Sketch
  • Lemma 3.4: Equivariance of Wigner $6j$ Convolution
  • Lemma 3.5: Time Complexity of Wigner $6j$ Convolution
  • ...and 14 more
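
The Wigner $6j$ symbol named in Definition 2.3 and used for recoupling in Theorem 3.3 can be evaluated directly for small angular momenta. The sketch below uses the standard Racah single-sum formula, which is textbook material rather than anything taken from the paper's implementation. It assumes non-negative integer arguments only, and the function names are illustrative. The orthogonality check indicates why reordering the coupling sequence (Figure 1b) loses no information: the suitably weighted $6j$ coefficients form an orthogonal change of basis between the two coupling orders.

```python
from math import factorial, sqrt


def _triangle(a, b, c):
    """True if integers (a, b, c) satisfy the triangle condition."""
    return abs(a - b) <= c <= a + b


def _delta(a, b, c):
    """Triangle coefficient appearing in the Racah formula."""
    num = factorial(a + b - c) * factorial(a - b + c) * factorial(-a + b + c)
    return sqrt(num / factorial(a + b + c + 1))


def wigner_6j(j1, j2, j3, j4, j5, j6):
    """Wigner 6j symbol {j1 j2 j3; j4 j5 j6} for non-negative integers."""
    triads = [(j1, j2, j3), (j1, j5, j6), (j4, j2, j6), (j4, j5, j3)]
    if not all(_triangle(*t) for t in triads):
        return 0.0
    prefactor = 1.0
    for t in triads:
        prefactor *= _delta(*t)
    uppers = [j1 + j2 + j4 + j5, j2 + j3 + j5 + j6, j3 + j1 + j6 + j4]
    lo = max(sum(t) for t in triads)
    hi = min(uppers)
    total = 0.0
    for t in range(lo, hi + 1):
        denom = 1
        for tri in triads:
            denom *= factorial(t - sum(tri))
        for u in uppers:
            denom *= factorial(u - t)
        total += (-1) ** t * factorial(t + 1) / denom
    return prefactor * total


def recoupling_overlap(j1, j2, j4, j5, j6, j6p):
    """Sum over j3 of (2*j3+1)(2*j6+1) * 6j(..., j6) * 6j(..., j6p).

    For admissible j6 and j6p this equals 1 if j6 == j6p and 0 otherwise.
    """
    return sum(
        (2 * j3 + 1) * (2 * j6 + 1)
        * wigner_6j(j1, j2, j3, j4, j5, j6)
        * wigner_6j(j1, j2, j3, j4, j5, j6p)
        for j3 in range(0, j1 + j2 + 1)
    )
```

For example, `wigner_6j(1, 1, 1, 1, 1, 1)` evaluates to 1/6 and `wigner_6j(1, 1, 1, 1, 1, 0)` to -1/3, and `recoupling_overlap(1, 1, 1, 1, 1, 2)` vanishes up to rounding.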