
No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation

Taha Bouhsine

Abstract

We introduce the yat-product, a kernel operator combining quadratic alignment with inverse-square proximity. We prove it is a Mercer kernel, analytic, Lipschitz on bounded domains, and self-regularizing, admitting a unique RKHS embedding. Neural Matter Networks (NMNs) use yat-product as the sole non-linearity, replacing conventional linear-activation-normalization blocks with a single geometrically-grounded operation. This architectural simplification preserves universal approximation while shifting normalization into the kernel itself via the denominator, rather than relying on separate normalization layers. Empirically, NMN-based classifiers match linear baselines on MNIST while exhibiting bounded prototype evolution and superposition robustness. In language modeling, Aether-GPT2 achieves lower validation loss than GPT-2 with a comparable parameter budget while using yat-based attention and MLP blocks. Our framework unifies kernel learning, gradient stability, and information geometry, establishing NMNs as a principled alternative to conventional neural architectures.
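
As a rough illustration of the "single geometrically-grounded operation" described above, the following minimal sketch evaluates a bank of yat-product units on one input. It assumes the yat-product has the form $\langle \mathbf{w},\mathbf{x}\rangle^2 / (\|\mathbf{w}-\mathbf{x}\|^2+\varepsilon)$, which is one reading of "quadratic alignment with inverse-square proximity". The names `yat_product` and `nmn_layer`, the default `eps`, and the prototype values are hypothetical and not taken from the paper.

```python
def yat_product(w, x, eps=1e-3):
    """Assumed form: <w, x>^2 / (||w - x||^2 + eps)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return dot * dot / (sq_dist + eps)


def nmn_layer(weights, x, eps=1e-3):
    """Hypothetical NMN layer: one yat response per weight vector,
    with no separate activation or normalization step."""
    return [yat_product(w, x, eps) for w in weights]


# Three illustrative prototypes in 2D (made-up values).
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
x = [0.9, 0.1]

responses = nmn_layer(prototypes, x)
# Index of the prototype with the strongest response.
predicted = max(range(len(responses)), key=responses.__getitem__)
print(responses, predicted)
```

Under this assumed form every response is non-negative, and the denominator down-weights prototypes that are far from the input. Squaring the inner product discards its sign, so an anti-aligned prototype still responds, only weakly because of its distance.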

Paper Structure

This paper contains 87 sections, 40 theorems, 75 equations, 10 figures, 7 tables.

Key Result

Theorem 1

Let $\varepsilon>0$ and define $k_{\text{ⵟ}}(\mathbf{x},\mathbf{y}) = \dfrac{\langle \mathbf{x},\mathbf{y}\rangle^2}{\|\mathbf{x}-\mathbf{y}\|^2+\varepsilon}$ for $\mathbf{x},\mathbf{y}\in\mathbb R^d$. Then for every compact set $K\subset\mathbb R^d$, the kernel $k_{\text{ⵟ}}$ is symmetric, continuous, and positive definite on $K$. Consequently, $k_{\text{ⵟ}}$ is a Mercer kernel on $K$.
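
A numerical sanity check can make the Mercer property concrete, although it is not a proof. The sketch below assumes the kernel has the form $\langle \mathbf{x},\mathbf{y}\rangle^2 / (\|\mathbf{x}-\mathbf{y}\|^2+\varepsilon)$, builds a Gram matrix on random points from a bounded box, and checks that it is symmetric and that the quadratic form $\mathbf{c}^\top G\,\mathbf{c}$ stays non-negative for random coefficient vectors. The function names, the value of `eps`, and the sample sizes are illustrative choices.

```python
import random


def yat_kernel(x, y, eps=0.1):
    """Assumed form: <x, y>^2 / (||x - y||^2 + eps)."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return dot * dot / (sq_dist + eps)


def gram_matrix(points, eps=0.1):
    """Gram matrix G[i][j] = k(points[i], points[j])."""
    return [[yat_kernel(p, q, eps) for q in points] for p in points]


def quadratic_form(gram, coeffs):
    """Compute c^T G c for a coefficient vector c."""
    n = len(coeffs)
    return sum(coeffs[i] * gram[i][j] * coeffs[j]
               for i in range(n) for j in range(n))


rng = random.Random(0)
# Eight random points in the compact box [-1, 1]^3.
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
gram = gram_matrix(points)

# Smallest quadratic form seen over 200 random coefficient vectors.
min_form = min(
    quadratic_form(gram, [rng.gauss(0.0, 1.0) for _ in points])
    for _ in range(200)
)
print(min_form)
```

A non-negative `min_form` is consistent with positive semi-definiteness on this sample. Random probing can only falsify the property, never establish it, so the theorem's proof remains the authority.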

Figures (10)

  • Figure 1: Comparison of the gradient field and vector field for Dot Product, Euclidean Distance, ⵟ-product, and Cosine Similarity (from left to right). The heatmaps illustrate how the ⵟ-product, unlike traditional similarity measures, creates a potential well around the weight vector $\mathbf{w}$, reflecting both alignment and proximity.
  • Figure 2: Potential well induced by a single ⵟ-neuron. High response occurs near $\mathbf{w}$ when inputs are both aligned and close, in contrast to unbounded linear hyperplanes.
  • Figure 3: (a) ⵟ-product response as a function of angle $\theta$ between $\mathbf{w}$ and $\mathbf{x}$, demonstrating orthogonality sensitivity ($K_{\text{ⵟ}} = 0$ at $\theta = \pi/2$). (b) Self-regulation property: response converges to $\|\mathbf{w}\|^2\cos^2\theta$ as radius $k \to \infty$, ensuring bounded outputs.
  • Figure 4: Comparison of standard Transformer block (left) and Aether block (right). In Aether-GPT2, ⵟ-multi-head attention replaces scaled dot-product attention, and an NMN layer replaces Linear+GeLU, eliminating activation functions and all LayerNorm operations.
  • Figure 5: Decision boundaries in 2D: linear (left) creates unbounded half-spaces; ⵟ-product (right) forms localized regions around prototypes (stars).
  • ...and 5 more figures
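
The two properties named in the Figure 3 caption can be checked numerically. The sketch below assumes the ⵟ-product has the form $\langle \mathbf{w},\mathbf{x}\rangle^2 / (\|\mathbf{w}-\mathbf{x}\|^2+\varepsilon)$ and that "radius $k$" means the input is $\mathbf{x} = k\,\mathbf{u}$ for a unit vector $\mathbf{u}$ at angle $\theta$ to $\mathbf{w}$. The specific vectors, `eps`, and radii are made-up illustration values.

```python
import math


def yat_product(w, x, eps=1e-3):
    """Assumed form: <w, x>^2 / (||w - x||^2 + eps)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return dot * dot / (sq_dist + eps)


w = [2.0, 0.0]
theta = math.pi / 3
unit = [math.cos(theta), math.sin(theta)]  # unit vector at angle theta to w

# (a) Orthogonality: an input perpendicular to w has zero inner product.
orthogonal_response = yat_product(w, [0.0, 5.0])

# (b) Self-regulation: scale the input radius k and compare with the
# limit ||w||^2 * cos^2(theta) stated in the caption.
limit = sum(wi * wi for wi in w) * math.cos(theta) ** 2
radii = [1.0, 10.0, 100.0, 1e6]
responses = [yat_product(w, [k * u for u in unit]) for k in radii]

for k, r in zip(radii, responses):
    print(f"k={k:g}  response={r:.6f}  limit={limit:.6f}")
```

With these values the response need not approach the limit monotonically, since at moderate radii the input can sit close to $\mathbf{w}$ and the small denominator inflates the output. Only the large-radius behaviour is being checked here.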

Theorems & Definitions (68)

  • Theorem 1: Mercer property of the ⵟ-product kernel
  • Theorem 2: Minimal Similarity and Statistical Orthogonality
  • Theorem 3: Maximal (Singular) Similarity
  • Corollary 1: Distributional Identity and KL
  • Theorem 4: Universal approximation with ⵟ-kernel
  • Proposition 1: Natural Self-Regulation
  • Proposition 2: Gradient Decay for Outliers
  • Theorem 5: Gradient Direction
  • Theorem 6: RKHS Existence
  • Remark 1: Generalization and Implicit Class Separation
  • ...and 58 more