Single-Core Superscalar Optimization of Clifford Neural Layers
X. Angelo Huang, Ruben Ciranni, Giovanni Spadaccini, Carla J. López Zurita
TL;DR
This work accelerates, on a single CPU core, the Clifford neural layers that realize $E(n)$ and $O(n)$ equivariance, exploiting the structure of Clifford algebras to reduce memory traffic and arithmetic. Starting from a PyTorch-based implementation, the authors build a high-performance C backend using inlining, loop optimizations, and AVX2 SIMD instructions while preserving numerical correctness. They report an average speedup of $21.35\times$ over the baseline across eleven functions. Their runtimes are comparable to or faster than PyTorch in six cases and within the same order of magnitude in the rest. A robust testing and benchmarking setup supports these results. Together, the results demonstrate that high-performance Clifford networks are practical on CPU for physics- and geometry-inspired applications.
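The following is a minimal sketch of what exploiting the algebra's structure can mean. It assumes the 2D Euclidean algebra $Cl(2,0)$ with basis $\{1, e_1, e_2, e_{12}\}$ and does not reproduce the paper's own layers or kernels; the choice of algebra is illustrative. One way to compute the geometric product $ab$ is to materialize a $4\times 4$ matrix from $a$ and multiply it by $b$, but that matrix holds 16 entries built from only four coefficients. Expanding the product directly from the basis rules avoids building the matrix at all. The names `mv2`, `geoprod_matrix`, and `geoprod_direct` are hypothetical.

```c
/* Hypothetical multivector type for Cl(2,0) (assumed algebra, for illustration):
   m[0] = scalar, m[1] = e1, m[2] = e2, m[3] = e12. */
typedef struct { double m[4]; } mv2;

/* Matrix-based form: build the 4x4 left-multiplication matrix L(a)
   (16 entries, each equal to +/- one of a's 4 coefficients) and compute L(a) * b.
   A stack array stands in here for what a tensor framework would
   allocate as a separate weight tensor (an assumption). */
static void geoprod_matrix(const mv2 *a, const mv2 *b, mv2 *c) {
    const double a0 = a->m[0], a1 = a->m[1], a2 = a->m[2], a3 = a->m[3];
    const double L[4][4] = {
        { a0,  a1,  a2, -a3 },
        { a1,  a0,  a3, -a2 },
        { a2, -a3,  a0,  a1 },
        { a3, -a2,  a1,  a0 },
    };
    for (int i = 0; i < 4; ++i) {
        double s = 0.0;
        for (int j = 0; j < 4; ++j)
            s += L[i][j] * b->m[j];
        c->m[i] = s;
    }
}

/* Structure-aware form: expand the product using
   e1*e1 = e2*e2 = 1, e1*e2 = -e2*e1 = e12, e12*e12 = -1.
   No matrix is built; each output blade is a fixed sum of 4 products,
   which a compiler can inline and keep entirely in registers. */
static inline void geoprod_direct(const mv2 *a, const mv2 *b, mv2 *c) {
    const double a0 = a->m[0], a1 = a->m[1], a2 = a->m[2], a3 = a->m[3];
    const double b0 = b->m[0], b1 = b->m[1], b2 = b->m[2], b3 = b->m[3];
    c->m[0] = a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3; /* scalar */
    c->m[1] = a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2; /* e1  */
    c->m[2] = a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1; /* e2  */
    c->m[3] = a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1; /* e12 */
}
```

For example, with $a = 1 + 2e_1 + 3e_2 + 4e_{12}$ and $b = 5 + 6e_1 + 7e_2 + 8e_{12}$, both functions yield $6 + 20e_1 + 14e_2 + 24e_{12}$. Checking the direct form against the matrix form on such inputs is one simple way to keep numerical correctness while optimizing.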
Abstract
Amid growing interest in the physical sciences in networks with equivariance properties, Clifford neural layers stand out as one approach that delivers $E(n)$ and $O(n)$ equivariance under specific group actions. In this paper, we analyze the internal structure of the computation in Clifford convolutional layers and propose and implement several optimizations that speed up inference while maintaining correctness. We first use the theoretical foundations of Clifford algebras to eliminate redundant matrix allocations and computations, and then systematically apply established optimization techniques to improve performance further. We report a final average speedup of $21.35\times$ over the baseline implementation across eleven functions, with runtimes comparable to or faster than the original PyTorch implementation in six cases. In the remaining cases, our runtimes are within the same order of magnitude as those of the original library.
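The redundancy behind eliminating matrix allocations can be written out as a worked equation, again assuming $Cl(2,0)$ for illustration. Left multiplication by $a = a_0 + a_1 e_1 + a_2 e_2 + a_3 e_{12}$ is a $4\times 4$ matrix whose entries are all $\pm a_k$. Suppose a layer with $C_{\mathrm{in}}$ input channels, $C_{\mathrm{out}}$ output channels, and a $k\times k$ kernel expands every weight multivector into such a block. This is a hypothetical layout, not necessarily the one the baseline uses. Such a layer stores four times more numbers than it has independent coefficients:

```latex
% Assumed algebra Cl(2,0): e_1^2 = e_2^2 = 1,\ e_{12} = e_1 e_2,\ e_{12}^2 = -1
ab =
\begin{pmatrix}
a_0 & a_1  & a_2 & -a_3\\
a_1 & a_0  & a_3 & -a_2\\
a_2 & -a_3 & a_0 & a_1\\
a_3 & -a_2 & a_1 & a_0
\end{pmatrix}
\begin{pmatrix} b_0\\ b_1\\ b_2\\ b_3 \end{pmatrix},
\qquad
\frac{\text{stored entries}}{\text{independent coefficients}}
= \frac{16\,C_{\mathrm{out}}C_{\mathrm{in}}k^2}{4\,C_{\mathrm{out}}C_{\mathrm{in}}k^2} = 4 .
```

In $n$ dimensions the same argument gives $4^n$ stored entries per weight versus $2^n$ coefficients. This holds because each product of basis blades is $\pm$ a single blade (or $0$ for a degenerate metric), so every matrix entry is $\pm a_k$ or $0$.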
