E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products
Yunyang Li, Lin Huang, Zhihao Ding, Chu Wang, Xinran Wei, Han Yang, Zun Wang, Chang Liu, Yu Shi, Peiran Jin, Tao Qin, Mark Gerstein, Jia Zhang
TL;DR
E2Former introduces a Wigner $6j$ convolution that converts edge-centric spherical tensor products into node-centric operations, reducing their asymptotic complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving rotational equivariance. The result is an efficient and expressive transformer architecture for molecular modeling that performs strongly on the OC20, OC22, and SPICE benchmarks and supports large-scale molecular dynamics simulations much faster than DFT or traditional empirical methods. The work provides theoretical foundations, a scaling analysis, and extensive experiments, including MD on systems of up to several thousand atoms, demonstrating applicability to large-scale problems in chemistry, biology, and materials science. Overall, E2Former is a scalable, physics-informed approach to learnable force fields for high-fidelity simulation of real-world molecular systems.
Abstract
Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology, and materials science. However, EGNNs face substantial computational challenges due to the high cost of constructing edge features via spherical tensor products, which makes them impractical for large-scale systems. To address this limitation, we introduce E2Former, an efficient and equivariant transformer architecture that incorporates the Wigner $6j$ convolution (Wigner $6j$ Conv). By shifting the computational burden from edges to nodes, the Wigner $6j$ Conv reduces the complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving both the model's expressive power and rotational equivariance. We show that this approach achieves a 7x to 30x speedup over conventional $\mathrm{SO}(3)$ convolutions. Furthermore, our empirical results demonstrate that E2Former mitigates the computational challenges of existing approaches without compromising the ability to capture detailed geometric information. This development suggests a promising direction for scalable and efficient molecular modeling.
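The edge-to-node shift can be illustrated with a degree-1 toy analogy. The sketch below is not the Wigner $6j$ Conv itself; it only assumes that the per-edge operation is bilinear, here a plain outer product between a neighbor feature $h_j$ and the relative position $r_i - r_j$, weighted by a scalar $a_{ij}$. Under that assumption, bilinearity gives $\sum_j a_{ij}\, h_j \otimes (r_i - r_j) = (\sum_j a_{ij} h_j) \otimes r_i - \sum_j a_{ij} (h_j \otimes r_j)$, so the products can be formed once per node and the edges only carry scalar-weighted sums. All names (`edge_wise`, `node_wise`, `h`, `r`, `a`) are hypothetical and chosen for illustration.

```python
import numpy as np


def edge_wise(h, r, a, edges):
    """Form one outer product per edge: sum_j a_ij * (h_j outer (r_i - r_j))."""
    out = np.zeros((h.shape[0], h.shape[1], r.shape[1]))
    for i, j in edges:
        out[i] += a[i, j] * np.outer(h[j], r[i] - r[j])
    return out


def node_wise(h, r, a, edges):
    """Form outer products only on nodes; edges do scalar-weighted sums."""
    # Per-node product h_j outer r_j, computed once for every node.
    hr = np.einsum("nc,nd->ncd", h, r)
    agg_h = np.zeros_like(h)
    agg_hr = np.zeros_like(hr)
    for i, j in edges:
        # Edge work is reduced to multiply-accumulate with the scalar a_ij.
        agg_h[i] += a[i, j] * h[j]
        agg_hr[i] += a[i, j] * hr[j]
    # Second per-node product: (sum_j a_ij h_j) outer r_i.
    return np.einsum("nc,nd->ncd", agg_h, r) - agg_hr


# Toy data (assumed shapes: 6 nodes, 4 feature channels, 3D positions).
rng = np.random.default_rng(0)
n_nodes, channels = 6, 4
h = rng.normal(size=(n_nodes, channels))
r = rng.normal(size=(n_nodes, 3))
a = rng.random(size=(n_nodes, n_nodes))
edges = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]

# Both formulations agree up to floating-point error.
assert np.allclose(edge_wise(h, r, a, edges), node_wise(h, r, a, edges))
```

In this toy setting the edge-wise version forms $|\mathcal{E}|$ outer products, while the node-wise version forms $2|\mathcal{V}|$; the neighbor aggregation still visits every edge, but only with cheap scalar-weighted additions. The paper's construction handles general spherical tensor products, which this sketch does not attempt.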
