Composing Linear Layers from Irreducibles

Travis Pence, Daisuke Yamada, Vikas Singh

TL;DR

This work shows that linear layers can be represented as compositions of bivectors in Clifford algebra, enabling rotor-based linear transformations that act on multivectors and admit a parameter count of $\mathcal{O}(\log^2 d)$, substantially fewer than dense layers. By decomposing bivectors into commuting simple components and using a differentiable invariant decomposition, the authors provide a closed-form, differentiable rotor construction suitable for autograd. Empirically, rotor-based projections for key, query, and value in LLM attention achieve competitive accuracy and perplexity against Low-Rank and Block-Hadamard baselines while dramatically reducing parameter counts, and an end-to-end FMNIST experiment demonstrates feasibility for joint training. The work illuminates a principled algebraic path to understanding and constructing compact, interpretable primitives that compose into higher-level functions in neural networks, with practical potential contingent on hardware-aware optimizations. Overall, the rotor framework offers a promising step toward parameter-efficient, geometrically structured neural architectures and motivates future system-level integrations.

Abstract

Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only $\mathcal{O}(\log^2 d)$ parameters, versus $\mathcal{O}(d^2)$ required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
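
The $\mathcal{O}(\log^2 d)$ scaling can be motivated by a short count. The sketch below assumes that the layer width is $d = 2^n$, so that an input fills a full multivector in the Clifford algebra on $n$ generators, and that each rotor is generated by a single bivector. The exact constant in the paper's parameter-count theorem is not reproduced here.

$$
\dim \mathrm{Cl}(n) = 2^n = d \;\Rightarrow\; n = \log_2 d,
\qquad
\#\{\text{bivector coefficients}\} = \binom{n}{2} = \frac{n(n-1)}{2} = \mathcal{O}(\log^2 d).
$$

For example, under these assumptions $d = 1024$ gives $n = 10$ and $\binom{10}{2} = 45$ coefficients per bivector, compared with $d^2 = 1{,}048{,}576$ entries in a dense $d \times d$ matrix.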

Paper Structure

This paper contains 47 sections, 8 theorems, 52 equations, 7 figures, 11 tables, 2 algorithms.

Key Result

lemma 1

(hestenes2012clifford) Let $a_t$ and $b_t$ denote multivectors in $\Cl(n)$. Any linear function $F$ from $\Cl^k(n)$ to $\Cl(n)$ can be written as a finite sum, for some width $w < \infty$, $$F(x) = \sum_{t=1}^{w} a_t \, x \, b_t.$$

Figures (7)

  • Figure 1: The basis vectors, bivectors, and trivector for $\Cl(3)$
  • Figure 2: The sandwich product rotating a vector $60^\circ$ in the $e_1\wedge e_2$ plane.
  • Figure 3: The [bivector $\rightarrow$ invariant decomposition $\rightarrow$ rotor decomposition $\rightarrow$ rotor] process that enables exact parametrization. Note that a pure rotor is one that corresponds to a simple bivector.
  • Figure 4: Rotor architecture with $c_1 = 3$ and $c_2 = 2$. An input $x$ is split into $\left\{x^{I_i}\right\}_{i \in [c_1]}$, each mapped to $y^{O_j}$ via rotor maps $\psi_{r_{ij}, s_{ij}}$, for each $j \in [c_2]$. The outputs $\left\{y^{O_j}\right\}$ are pooled and assembled into the final output $y$.
  • Figure 5: Effect of rotor width and depth. Replacing Layer-13 in Qwen-2.5 1.5B with rotors of varying depth and width. The dashed line (9.845) indicates convergence to the base model’s perplexity.
  • ...and 2 more figures
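
The sandwich product in Figure 2 can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the dictionary-based multivector representation, the helper names (`blade_sign`, `gp`, `reverse`, `rotor`, `rotate`), and the sign convention $R = \cos(\theta/2) - \sin(\theta/2)\, e_1 e_2$ are assumptions chosen for illustration. Basis blades are encoded as bitmasks (`0b001` for $e_1$, `0b010` for $e_2$, `0b011` for $e_1 e_2$), and the metric is assumed Euclidean.

```python
import math

def blade_sign(a, b):
    """Sign from reordering basis vectors when multiplying blades a and b (bitmasks)."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps & 1 else 1.0

def gp(x, y):
    """Geometric product of two multivectors stored as {blade_bitmask: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            k = a ^ b  # Euclidean metric: repeated basis vectors square to +1
            out[k] = out.get(k, 0.0) + blade_sign(a, b) * ca * cb
    return out

def reverse(x):
    """Reversion: a grade-g blade picks up the sign (-1)^(g(g-1)/2)."""
    out = {}
    for k, c in x.items():
        g = bin(k).count("1")
        out[k] = c * (-1.0) ** (g * (g - 1) // 2)
    return out

def rotor(theta, plane=0b011):
    """Rotor for a rotation by theta in a basis plane (assumed sign convention)."""
    return {0: math.cos(theta / 2), plane: -math.sin(theta / 2)}

def rotate(v, theta, plane=0b011):
    """Sandwich product R v R~."""
    r = rotor(theta, plane)
    return gp(gp(r, v), reverse(r))

E1, E2 = 0b001, 0b010
w = rotate({E1: 1.0}, math.radians(60))
print(w[E1], w[E2])  # components of the rotated vector along e1 and e2
```

With this convention, rotating $e_1$ by $60^\circ$ in the $e_1 \wedge e_2$ plane gives $\cos 60^\circ\, e_1 + \sin 60^\circ\, e_2$. The rotor itself is specified by a single bivector coefficient (here the angle), which is the sense in which bivectors act as the parameters of the transformation.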

Theorems & Definitions (21)

  • lemma 1
  • Remark
  • definition 1
  • Example 3.1
  • lemma 2: Thm. 4.8 in eelbode2024outereigentangentconcepts
  • theorem 1: $\psi$ Parameter Count
  • theorem 2
  • proof
  • theorem 3
  • proof
  • ...and 11 more