Geometric Hyena Networks for Large-scale Equivariant Learning
Artem Moskalev, Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, Tommaso Mansi
TL;DR
Geometric Hyena introduces the first equivariant long-convolutional architecture for large-scale geometric graphs, enabling sub-quadratic global context while preserving SE(3) equivariance. It combines FFT-based scalar long convolution with vector cross-product long convolution and an invariant-equivariant interaction to model rich geometric relations, aided by global context tokens and KV normalization. The approach runs faster and uses less memory than equivariant self-attention, while achieving state-of-the-art performance on large RNA property prediction, RNA switching factor prediction, and protein molecular dynamics (MD). It also introduces a geometric associative recall task for interpretability. This work demonstrates that global geometric context can be modeled efficiently at scale, opening avenues for robust predictions in biology, chemistry, and physics with large geometric sequences.
Abstract
Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equivariance and global context at scale. Standard methods such as equivariant self-attention suffer from quadratic complexity, while local methods such as distance-based message passing sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, we introduce Geometric Hyena, the first equivariant long-convolutional model for geometric systems. Geometric Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on all-atom property prediction of large RNA molecules and full protein molecular dynamics, Geometric Hyena outperforms existing equivariant models while requiring significantly less memory and compute than equivariant self-attention. Notably, our model processes the geometric context of 30k tokens 20x faster than the equivariant transformer and allows 72x longer context within the same budget.
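To make the sub-quadratic claim concrete, the following is a minimal NumPy sketch of the two convolution primitives named in the TL;DR: an FFT-based scalar long convolution and a vector long convolution built on the cross product. It is an illustration under stated assumptions, not the authors' implementation. The function names `fft_long_conv` and `vector_cross_long_conv`, the causal form y_i = sum_{j<=i} q_j x k_{i-j}, and the random inputs are all hypothetical choices made here. The sketch covers rotation equivariance only; translation handling, the invariant-equivariant interaction, global context tokens, and KV normalization are not shown.

```python
import numpy as np


def fft_long_conv(x, k):
    """Causal linear convolution of two length-N sequences in O(N log N)."""
    n = x.shape[0]
    size = 2 * n  # zero-pad so the circular FFT product equals a linear convolution
    spectrum = np.fft.rfft(x, n=size) * np.fft.rfft(k, n=size)
    return np.fft.irfft(spectrum, n=size)[:n]


def vector_cross_long_conv(q, k):
    """Assumed form: y_i = sum_{j<=i} q_j x k_{i-j} for (N, 3) vector sequences.

    The cross product is bilinear, so it splits into six scalar convolutions,
    each of which becomes an element-wise product in the frequency domain.
    """
    n = q.shape[0]
    size = 2 * n
    Q = np.fft.rfft(q, n=size, axis=0)
    K = np.fft.rfft(k, n=size, axis=0)
    Y = np.stack(
        [
            Q[:, 1] * K[:, 2] - Q[:, 2] * K[:, 1],
            Q[:, 2] * K[:, 0] - Q[:, 0] * K[:, 2],
            Q[:, 0] * K[:, 1] - Q[:, 1] * K[:, 0],
        ],
        axis=1,
    )
    return np.fft.irfft(Y, n=size, axis=0)[:n]


# Usage: compare against the quadratic-time definition, then check rotations.
rng = np.random.default_rng(0)
n = 32
q = rng.normal(size=(n, 3))
k = rng.normal(size=(n, 3))

# Direct O(N^2) evaluation of the same sum, used only as a reference.
direct = np.stack(
    [sum(np.cross(q[j], k[i - j]) for j in range(i + 1)) for i in range(n)]
)
assert np.allclose(vector_cross_long_conv(q, k), direct)

# Random proper rotation (orthogonal, determinant +1) from a QR factorization.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1

# Rotating both inputs rotates the output: (Rq) x (Rk) = R (q x k).
rotated_in = vector_cross_long_conv(q @ R.T, k @ R.T)
rotated_out = vector_cross_long_conv(q, k) @ R.T
assert np.allclose(rotated_in, rotated_out)
```

The equivariance check relies on the identity (Ra) x (Rb) = R(a x b), which holds for proper rotations. This is why a cross-product convolution can mix vector features globally without breaking rotation equivariance, while the FFT keeps the cost at O(N log N) rather than O(N^2).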
