Geometric Hyena Networks for Large-scale Equivariant Learning

Artem Moskalev, Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, Tommaso Mansi

TL;DR

Geometric Hyena introduces the first equivariant long-convolutional architecture for large-scale geometric graphs, enabling sub-quadratic global context while preserving SE(3) equivariance. It combines FFT-based scalar long convolution with vector cross-product long convolution and an invariant-equivariant interaction to model rich geometric relations, aided by global context tokens and KV normalization. The approach runs faster and uses less memory than equivariant self-attention while achieving state-of-the-art performance on large RNA property prediction, RNA switching factor, and protein molecular dynamics (MD). It also introduces a geometric associative recall task for interpretability. This work demonstrates that global geometric context can be modeled efficiently at scale, opening avenues for robust predictions in biology, chemistry, and physics with large geometric sequences.

Abstract

Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equivariance and global context at scale. Standard methods such as equivariant self-attention suffer from quadratic complexity, while local methods such as distance-based message passing sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, we introduce Geometric Hyena, the first equivariant long-convolutional model for geometric systems. Geometric Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on all-atom property prediction of large RNA molecules and full protein molecular dynamics, Geometric Hyena outperforms existing equivariant models while requiring significantly less memory and compute than equivariant self-attention. Notably, our model processes the geometric context of 30k tokens 20x faster than the equivariant transformer and allows 72x longer context within the same budget.
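
To make the sub-quadratic claim concrete, here is a minimal NumPy sketch of the two long-convolution primitives named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the function names `fft_long_conv` and `cross_long_conv` are hypothetical, the convolution is assumed to be causal with zero-padding, and the filters are plain arrays rather than learned. The vector case relies on the fact that each component of a cross product is a difference of two scalar products, so a cross-product convolution can be assembled from six scalar FFT convolutions at O(N log N) cost.

```python
import numpy as np


def fft_long_conv(u, k):
    """Causal long convolution y[i] = sum_{j<=i} u[j] * k[i-j], computed via FFT.

    u, k: real arrays of shape (N,). Padding to length 2N prevents the
    circular wrap-around that a plain length-N FFT product would introduce.
    """
    n = u.shape[0]
    size = 2 * n
    spectrum = np.fft.rfft(u, n=size) * np.fft.rfft(k, n=size)
    return np.fft.irfft(spectrum, n=size)[:n]


def cross_long_conv(q, k):
    """Vector long convolution y[i] = sum_{j<=i} q[j] x k[i-j] (cross product).

    q, k: arrays of shape (N, 3). Component a of the output is
    conv(q_b, k_c) - conv(q_c, k_b) with (a, b, c) a cyclic permutation of
    (0, 1, 2), giving six scalar FFT convolutions in total.
    """
    out = np.empty(q.shape, dtype=float)
    for a in range(3):
        b, c = (a + 1) % 3, (a + 2) % 3
        out[:, a] = fft_long_conv(q[:, b], k[:, c]) - fft_long_conv(q[:, c], k[:, b])
    return out
```

Because the cross product satisfies (Rq) x (Rk) = R(q x k) for any rotation R, rotating both inputs of `cross_long_conv` rotates its output in the same way. The sketch covers rotations only; translation handling is outside its scope.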

Paper Structure

This paper contains 65 sections, 14 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Left: GPU forward runtime comparison. Geometric Hyena scales sub-quadratically and achieves a considerable speedup compared to other equivariant models with global context. Right: Peak GPU memory consumption. G-Hyena is the most memory-efficient model for long sequences.
  • Figure 2: Geometric Hyena block. (a) Geometric Hyena block includes the SE(3)-Hyena operator and equivariant projections. (b) The SE(3)-Hyena operator includes query, key, value projection, geometric long convolution for global context aggregation, and gating.
  • Figure 3: Top: The MSE ($\downarrow$) between retrieved and target vectors for the geometric associative recall task over various sequence lengths. Bottom: The study of geometric associative recall performance of different models across varying hidden dimensions and vocabulary size.
  • Figure 4: Geometric associative recall task. A geometric sequence consists of key and value vector tokens, where consecutive key-value pairs form bigrams. Geometric associative recall requires retrieving the value vector corresponding to a query, where the query matches one of the keys in the sequence.
  • Figure 5: Scalar-vector interactions in geometric long convolution. Blue lines represent interactions leading to a scalar output $\alpha_3$, and red lines are interactions leading to a vector output $\mathbf{r}_{3}$.
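
The geometric associative recall task described in the Figure 4 caption can be sketched as a small data generator. This is an assumption-laden illustration: the function name `make_recall_example`, the Gaussian vocabulary of 3D vectors, and the placement of the query as the final token are all hypothetical choices, since the caption specifies only that key-value pairs form bigrams and that the query matches one of the keys.

```python
import numpy as np


def make_recall_example(num_pairs, vocab_size, rng):
    """Build one sequence of interleaved key/value vectors plus a query.

    Returns (seq, target): seq has shape (2 * num_pairs + 1, 3), laid out as
    key, value, key, value, ..., query; target is the value vector that
    follows the key equal to the query.
    """
    # Hypothetical vocabulary: each key index is bound to one value vector.
    keys_vocab = rng.normal(size=(vocab_size, 3))
    values_vocab = rng.normal(size=(vocab_size, 3))

    # Draw distinct keys so the query has a unique match in the sequence.
    idx = rng.choice(vocab_size, size=num_pairs, replace=False)

    seq = np.empty((2 * num_pairs + 1, 3))
    seq[0:-1:2] = keys_vocab[idx]    # even positions hold keys
    seq[1:-1:2] = values_vocab[idx]  # odd positions hold the paired values

    pick = rng.integers(num_pairs)
    seq[-1] = keys_vocab[idx[pick]]  # query repeats one of the keys
    return seq, values_vocab[idx[pick]]
```

A model is then scored by the MSE between its output and `target`, as in Figure 3. Rotating every token in `seq` rotates `target` by the same rotation, which is why the task suits equivariant models.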