Versor: A Geometric Sequence Architecture

Truong Minh Huy, Edward Hirst

TL;DR

Versor is a sequence architecture built on Conformal Geometric Algebra (CGA). It embeds states on the Spin(4,1) manifold and enforces $SE(3)$-equivariance through rotor transformations. Its two core mechanisms, Geometric Product Attention (GPA) and the Recursive Rotor Accumulator (RRA), provide interpretable proximity and orientation attention and linear-time sequence processing with manifold-normalized stability. The paper reports state-of-the-art or competitive results across chaotic dynamics, topology, and multimodal benchmarks with far fewer parameters than Euclidean baselines. It also reports strong zero-shot generalization, robustness to distribution shift, and substantial hardware speedups from bit-masked Clifford kernels, which the authors present as signaling a potential shift toward geometrically aware AI for scientific modeling. Future directions include Lie-manifold optimization, Hamiltonian extensions, and dedicated geometric accelerators (GAPU) for real-world deployments of Clifford-based architectures.

Abstract

Versor is a novel sequence architecture that uses Conformal Geometric Algebra (CGA) in place of the traditional fundamental non-linear operations to achieve structural generalization and significant performance improvements on a variety of tasks, while offering improved interpretability and efficiency. By embedding states in the $Cl_{4,1}$ manifold and evolving them via geometric transformations (rotors), Versor natively represents $SE(3)$-equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, Graph Networks, and geometric baselines (GATr, EGNN). Key results include: orders of magnitude fewer parameters ($200\times$ fewer than Transformers); interpretable attention that decomposes into proximity and orientational components; zero-shot scale generalization (99.3% MCC on topology vs. 50.4% for ViT); and $O(L)$ linear complexity via the novel Recursive Rotor Accumulator. In out-of-distribution tests, Versor maintains stable predictions while Transformers fail catastrophically. Custom Clifford kernels achieve up to $78\times$ speedup, providing a scalable foundation for geometrically aware scientific modeling.
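
To make the phrase "without requiring explicit structural encoding" concrete, the following is a minimal sketch of the standard conformal embedding of a 3D point into the 5D vector space underlying $Cl_{4,1}$. It is not taken from the paper: the basis ordering $(e_1, e_2, e_3, e_+, e_-)$, the normalization $e_o \cdot e_\infty = -1$, and the names `embed_point`, `inner`, and `METRIC` are assumptions chosen for illustration. Under these conventions the inner product of two embedded points equals $-\tfrac{1}{2}\|x - y\|^2$, so it depends only on distance and is unchanged by any rotation or translation applied to both points.

```python
import numpy as np

# Assumed basis order: e1, e2, e3, e+, e-  (e+ squares to +1, e- to -1).
METRIC = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

def embed_point(x):
    """Conformal embedding P = x + 0.5*|x|^2 * e_inf + e_o.

    With e_inf = e- + e+ and e_o = 0.5 * (e- - e+), the e+ coefficient is
    0.5*|x|^2 - 0.5 and the e- coefficient is 0.5*|x|^2 + 0.5.
    """
    x = np.asarray(x, dtype=float)
    half_sq = 0.5 * (x @ x)
    return np.concatenate([x, [half_sq - 0.5, half_sq + 0.5]])

def inner(p, q):
    """Inner product of two grade-1 elements under the (4,1) metric."""
    return p @ METRIC @ q

# Two points at squared distance 5.
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 2.0, 0.0])
assert np.isclose(inner(embed_point(x), embed_point(y)), -0.5 * 5.0)

# Embedded points are null vectors: P . P = 0.
assert np.isclose(inner(embed_point(x), embed_point(x)), 0.0)

# Apply the same rigid motion (rotation about z, then translation) to both.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
shift = np.array([3.0, -1.0, 2.0])
x2, y2 = rot @ x + shift, rot @ y + shift

# The inner product of the embedded points is unchanged.
assert np.isclose(inner(embed_point(x2), embed_point(y2)),
                  inner(embed_point(x), embed_point(y)))
```

The sketch only illustrates the invariance of the embedded inner product. How Versor parameterizes and learns its rotors is not specified in this summary and is not reproduced here.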

Paper Structure (80 sections, 50 equations, 3 figures, 8 tables, 2 algorithms)

Figures (3)

  • Figure 1: The Versor Architecture. (Left) Geometric Product Attention (GPA). (Right) The Recursive Rotor Accumulator (RRA).
  • Figure 2: Geometric Attention Decomposition: Separating Force from Torque. Points labeled B0--B4 represent the 5 gravitationally-interacting bodies; B0 is the focal body for this visualization. The axes ($x_1$, $x_2$) are the 2D physical coordinates of the simulation. Line weights are proportional to attention strength. (Left) Scalar Attention (Proximity): The heatmap of the scalar component $\langle Q \tilde{K} \rangle_0$ recovers the distance-dependent interaction law, with brighter/thicker connections indicating stronger proximity-based attention. (Right) Bivector Attention (Torque): The magnitude of the bivector component $\|\langle Q \tilde{K} \rangle_2\|$ captures orientational coupling. Higher bivector attention (lighter lines) indicates interactions where relative angular momentum is significant for dynamics prediction. Note that bivector attention can be high for distant bodies if their relative orientation is dynamically important (e.g., B3), demonstrating that the model learns orientation-dependent physics beyond simple distance.
  • Figure 3: Computational Scaling: Latency (ms) vs. Sequence Length $L$. Versor maintains strictly linear $O(L)$ growth, whereas the latency of standard Transformers grows quadratically, reaching memory exhaustion (OOM) at $L=1024$. The dotted reference line marks a slope of 1 (linear scaling) on the log-log plot to highlight this behavior.
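
The scalar and bivector components named in the Figure 2 caption, $\langle Q \tilde{K} \rangle_0$ and $\|\langle Q \tilde{K} \rangle_2\|$, can be illustrated with a small bitmask-based geometric product in $Cl_{4,1}$. This is a hedged sketch, not the authors' kernel: it uses the textbook encoding in which each of the 32 basis blades is a 5-bit integer, and the signature ordering, the function names (`blade_product`, `gp`, `attention_components`), and the use of the Euclidean norm of the bivector coefficients are all assumptions made for illustration.

```python
import numpy as np

# Assumed signature order: e1, e2, e3, e+ square to +1; e- squares to -1.
SIGNATURE = (1, 1, 1, 1, -1)
N = len(SIGNATURE)
DIM = 1 << N  # 32 basis blades; bit i of a blade index means "contains e_(i+1)"

def blade_product(a, b):
    """Return (sign, blade) for the product of basis blades a and b."""
    # Count the swaps needed to bring the basis vectors into canonical order.
    swaps, shifted = 0, a >> 1
    while shifted:
        swaps += bin(shifted & b).count("1")
        shifted >>= 1
    sign = -1 if swaps & 1 else 1
    # Basis vectors shared by both blades contract to their metric square.
    common = a & b
    for i in range(N):
        if (common >> i) & 1:
            sign *= SIGNATURE[i]
    return sign, a ^ b

TABLE = [[blade_product(a, b) for b in range(DIM)] for a in range(DIM)]
GRADE = np.array([bin(i).count("1") for i in range(DIM)])

def gp(x, y):
    """Geometric product of two multivectors stored as length-32 arrays."""
    out = np.zeros(DIM)
    for a in np.flatnonzero(x):
        for b in np.flatnonzero(y):
            sign, blade = TABLE[a][b]
            out[blade] += sign * x[a] * y[b]
    return out

def reverse(x):
    """Reversion: a grade-k blade picks up the sign (-1)^(k(k-1)/2)."""
    return x * (-1.0) ** (GRADE * (GRADE - 1) // 2)

def grade(x, k):
    """Keep only the grade-k part of a multivector."""
    return np.where(GRADE == k, x, 0.0)

def vector(coeffs):
    """Build a grade-1 multivector from 5 coefficients."""
    mv = np.zeros(DIM)
    for i, c in enumerate(coeffs):
        mv[1 << i] = c
    return mv

def attention_components(q, k):
    """Scalar part and bivector-coefficient norm of q * reverse(k)."""
    prod = gp(q, reverse(k))
    return prod[0], np.linalg.norm(grade(prod, 2))

# Two unit vectors separated by an angle theta in the e1-e2 plane.
theta = np.pi / 3
q = vector([1.0, 0.0, 0.0, 0.0, 0.0])
k = vector([np.cos(theta), np.sin(theta), 0.0, 0.0, 0.0])
scalar, bivector_norm = attention_components(q, k)
assert np.isclose(scalar, np.cos(theta))         # alignment term
assert np.isclose(bivector_norm, np.sin(theta))  # orientation term
```

For two grade-1 inputs the product splits exactly into an inner-product (scalar) term and an outer-product (bivector) term, which is the kind of split the caption describes as proximity versus orientational coupling. The double loop is written for readability; a performance-oriented kernel would precompute and vectorize the sign table.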