SE(3)-Hyena Operator for Scalable Equivariant Learning

Artem Moskalev, Mangal Prakash, Rui Liao, Tommaso Mansi

TL;DR

The paper addresses the challenge of modeling global geometric context under SE(3) equivariance at scale. It introduces SE(3)-Hyena, an equivariant long-convolution operator built on FFT-based scalar and vector long convolutions, which achieves sub-quadratic complexity $O(N \log_2 N)$ while preserving equivariance to rotations and translations. The model uses two parallel streams (invariant scalars and equivariant vectors), an input projection via a Clifford MLP, gating, and an SE(3)-equivariant output projection, enabling efficient global context aggregation. Empirical results on equivariant associative recall and n-body dynamics show that SE(3)-Hyena matches or surpasses the equivariant transformer while drastically reducing memory use and enabling much longer contexts (up to $3.5$M tokens on a single GPU). The work also discusses limitations and directions for future work, such as extending vector convolutions to other dimensions and improving the permutation equivariance of FFT-based long convolutions.
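As a rough illustration of why an FFT-based long convolution costs $O(N \log N)$ rather than $O(N^2)$, the NumPy sketch below compares an FFT implementation of a causal scalar long convolution with a direct quadratic loop. The function names, the causal form, and the zero-padding to length $2N$ are assumptions made for this example; this is not the authors' implementation.

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal long convolution of two length-N real sequences via FFT."""
    n = u.shape[0]
    size = 2 * n  # zero-pad so the circular convolution equals the linear one
    u_f = np.fft.rfft(u, n=size)
    k_f = np.fft.rfft(k, n=size)
    # Pointwise product in the frequency domain, then keep the first N outputs.
    return np.fft.irfft(u_f * k_f, n=size)[:n]

def direct_long_conv(u, k):
    """Reference O(N^2) version: y[i] = sum_{j <= i} u[j] * k[i - j]."""
    n = u.shape[0]
    y = np.zeros(n)
    for i in range(n):
        for j in range(i + 1):
            y[i] += u[j] * k[i - j]
    return y

rng = np.random.default_rng(0)
u = rng.standard_normal(64)   # input sequence (hypothetical data)
k = rng.standard_normal(64)   # convolution kernel as long as the input
max_err = np.max(np.abs(fft_long_conv(u, k) - direct_long_conv(u, k)))
```

Both functions compute the same output up to floating-point error; only the FFT version avoids the nested loop over all token pairs, which is what makes a kernel as long as the sequence affordable.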

Abstract

Modeling global geometric context while maintaining equivariance is crucial for accurate predictions in many fields such as biology, chemistry, or vision. Yet, this is challenging due to the computational demands of processing high-dimensional data at scale. Existing approaches, such as equivariant self-attention or distance-based message passing, suffer from quadratic complexity with respect to sequence length, while localized methods sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, in this work we introduce the SE(3)-Hyena operator, an equivariant long-convolutional model based on the Hyena operator. SE(3)-Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on equivariant associative recall and n-body modeling, SE(3)-Hyena matches or outperforms equivariant self-attention while requiring significantly less memory and computational resources for long sequences. Our model processes the geometric context of 20k tokens 3.5 times faster than the equivariant transformer and allows a 175 times longer context within the same memory budget.

Paper Structure (34 sections, 5 equations, 5 figures, 1 table, 1 algorithm)

This paper contains 34 sections, 5 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: The SE(3)-Hyena operator combines global context, equivariance, and scalability to long sequences. It can process global geometric context in sub-quadratic time while preserving equivariance to rotations and translations.
  • Figure 2: SE(3)-Hyena building blocks. (a) Schematic of the existing Hyena architecture (Poli et al., 2023). (b) The proposed architecture consists of the SE(3)-Hyena operator, residual connections, and an equivariant MLP. (c) The block architecture of the SE(3)-Hyena operator consists of two streams processing invariant and equivariant features. The key components are the scalar and vector long convolutions responsible for global context aggregation.
  • Figure 3: Equivariant associative recall task. Equivariant associative recall requires retrieving a vector token for a given vector query based on the context. The retrieval mechanism must be equivariant to rotations of the tokens in a sequence. Just as standard associative recall tests the capability of models to learn global context, the equivariant associative recall task tests their capability to learn global context with equivariance.
  • Figure 4: Top row: The MSE between retrieved and target vectors for the fixed vocabulary associative recall task is plotted across various sequence lengths. Equivariant models effectively learn and generalize the underlying vocabulary across different orientations. Bottom row: The MSE for the random vocabulary associative recall task. The SE(3)-Hyena excels in learning the equivariant retrieval function, successfully associating target queries with their corresponding value vectors.
  • Figure 5: Top row: Forward runtime comparison. SE(3)-Hyena scales sub-quadratically and achieves a considerable speedup compared to SE(3)-Transformer when processing long sequences. Bottom row: Total GPU memory utilization for equivariant Hyena and transformer models.
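
The vector long convolution named in Figure 2 and the rotation equivariance claimed in Figure 1 can be checked numerically with a small sketch. It assumes the vector convolution combines tokens with a cross product, $y_i = \sum_{j \le i} q_j \times k_{i-j}$, which this summary does not state explicitly. The function names, the causal form, and the random test data are likewise hypothetical. Under that assumption the operation splits into six scalar FFT convolutions and commutes with proper rotations.

```python
import numpy as np

def fft_conv(a, b):
    """Causal scalar long convolution via FFT (zero-padded to length 2N)."""
    n = a.shape[0]
    a_f = np.fft.rfft(a, n=2 * n)
    b_f = np.fft.rfft(b, n=2 * n)
    return np.fft.irfft(a_f * b_f, n=2 * n)[:n]

def vector_long_conv(q, k):
    """Assumed form: y[i] = sum_{j <= i} cross(q[j], k[i - j]) for (N, 3) inputs.

    Each output component is a difference of two scalar convolutions,
    so six FFT convolutions are needed in total.
    """
    c = lambda a, b: fft_conv(q[:, a], k[:, b])
    return np.stack(
        [c(1, 2) - c(2, 1), c(2, 0) - c(0, 2), c(0, 1) - c(1, 0)], axis=1
    )

def random_rotation(rng):
    """Random 3x3 rotation: orthogonalize a Gaussian matrix, force det = +1."""
    m, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(m) < 0:
        m[:, 0] = -m[:, 0]
    return m

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 3))  # hypothetical query vectors
k = rng.standard_normal((32, 3))  # hypothetical kernel vectors
rot = random_rotation(rng)

y = vector_long_conv(q, k)
# Rotating the inputs should give the rotated output (rows are vectors).
y_rot = vector_long_conv(q @ rot.T, k @ rot.T)
rot_err = np.max(np.abs(y_rot - y @ rot.T))
```

The check covers rotations only. Translations would need separate handling, for example by centering the coordinates first, which is outside this sketch.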