SE(3)-Hyena Operator for Scalable Equivariant Learning
Artem Moskalev, Mangal Prakash, Rui Liao, Tommaso Mansi
TL;DR
The paper addresses the challenge of modeling global geometric context under SE(3) equivariance at scalable computational cost. It introduces SE(3)-Hyena, an equivariant long-convolution operator built on FFT-based scalar and vector long convolutions. The operator achieves sub-quadratic complexity $O(N \log_2 N)$ in sequence length while remaining equivariant to rotations and translations. The model uses two parallel streams (invariant scalars and equivariant vectors), an input projection via a Clifford MLP, gating, and an SE(3)-equivariant output projection, which together enable efficient aggregation of global context. Empirical results on equivariant associative recall and n-body dynamics show that SE(3)-Hyena matches or surpasses the equivariant transformer while drastically reducing memory use and supporting much longer contexts (up to $3.5$M tokens on a single GPU), highlighting its practical potential for scalable, geometry-aware learning. The work also discusses limitations and directions for future work, such as extending vector convolutions to other dimensions and improving the permutation equivariance of FFT-based long convolutions.
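The FFT-based long convolution behind the $O(N \log_2 N)$ cost can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names are hypothetical, the filter `h` is a plain array rather than the implicitly parameterized filter used in Hyena, and a causal convolution is assumed for concreteness. Zero-padding to length $2N$ makes the FFT's circular convolution equal to the linear one, so the FFT path matches a direct $O(N^2)$ loop.

```python
import numpy as np


def fft_long_conv(u: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal long convolution y[t] = sum_{s<=t} h[t-s] * u[s], computed via FFT.

    u: (N, C) input signal, one channel per column.
    h: (N, C) filter as long as the sequence (a "long" filter; hypothetical here).
    Zero-padding to 2N turns the FFT's circular convolution into a linear one,
    and each FFT costs O(N log N).
    """
    n = u.shape[0]
    fft_len = 2 * n
    u_f = np.fft.rfft(u, n=fft_len, axis=0)
    h_f = np.fft.rfft(h, n=fft_len, axis=0)
    y = np.fft.irfft(u_f * h_f, n=fft_len, axis=0)
    return y[:n]  # keep only the causal part of the linear convolution


def direct_long_conv(u: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Reference O(N^2) implementation of the same causal convolution."""
    n = u.shape[0]
    y = np.zeros_like(u)
    for t in range(n):
        for s in range(t + 1):
            y[t] += h[t - s] * u[s]
    return y
```

Only the FFT-based function is needed in practice. The quadratic loop is kept as a reference for checking correctness on short sequences.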
Abstract
Modeling global geometric context while maintaining equivariance is crucial for accurate predictions in many fields, such as biology, chemistry, and vision. Yet this is challenging due to the computational demands of processing high-dimensional data at scale. Existing approaches, such as equivariant self-attention or distance-based message passing, suffer from quadratic complexity with respect to sequence length, while localized methods sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, in this work we introduce the SE(3)-Hyena operator, an equivariant long-convolutional model based on the Hyena operator. SE(3)-Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on equivariant associative recall and n-body modeling, SE(3)-Hyena matches or outperforms equivariant self-attention while requiring significantly less memory and compute for long sequences. Our model processes the geometric context of 20k tokens 3.5× faster than the equivariant transformer and allows a 175× longer context within the same memory budget.
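To make the equivariance claim concrete, the sketch below builds a toy vector long convolution. It replaces the scalar product of an ordinary convolution with a cross product, $y_t = \sum_{s \le t} a_s \times b_{t-s}$. This definition, the helper names, and the toy layer (which centres positions and convolves them with velocities) are assumptions for illustration, not the paper's exact operator. For any rotation $R$ with $\det R = +1$, $(Ra) \times (Rb) = R(a \times b)$, and subtracting the centroid removes translations. As a result, the output rotates with the input and ignores translations. Each cross-product component is a difference of two scalar FFT convolutions, so the cost stays $O(N \log N)$.

```python
import numpy as np


def _causal_conv(u: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal convolution of two 1-D length-N signals via zero-padded FFT."""
    n = u.shape[0]
    spec = np.fft.rfft(u, n=2 * n) * np.fft.rfft(h, n=2 * n)
    return np.fft.irfft(spec, n=2 * n)[:n]


def vector_long_conv(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical cross-product convolution y[t] = sum_{s<=t} a[s] x b[t-s].

    a, b: (N, 3) arrays of 3-D vectors. Each output component is a difference
    of two scalar FFT convolutions, so the cost is O(N log N).
    """
    ax, ay, az = a.T
    bx, by, bz = b.T
    return np.stack(
        [
            _causal_conv(ay, bz) - _causal_conv(az, by),  # x-component
            _causal_conv(az, bx) - _causal_conv(ax, bz),  # y-component
            _causal_conv(ax, by) - _causal_conv(ay, bx),  # z-component
        ],
        axis=1,
    )


def toy_layer(x: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical layer: centre positions (removes translation), then
    cross-convolve them with velocities (rotation-equivariant)."""
    centred = x - x.mean(axis=0, keepdims=True)
    return vector_long_conv(centred, v)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random 3x3 rotation matrix (orthogonal, det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs of the QR factor
    if np.linalg.det(q) < 0:  # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q
```

Rows are treated as points, so a rotation acts as `x @ R.T`. With this convention, `toy_layer(x @ R.T + t, v @ R.T)` should agree with `toy_layer(x, v) @ R.T` up to floating-point error.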
