No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation

Taha Bouhsine

Abstract

We introduce the yat-product (ⵟ-product), a kernel operator combining quadratic alignment with inverse-square proximity. We prove it is a Mercer kernel, analytic, Lipschitz on bounded domains, and self-regularizing, admitting a unique RKHS embedding. Neural Matter Networks (NMNs) use the yat-product as the sole non-linearity, replacing conventional linear-activation-normalization blocks with a single geometrically-grounded operation. This architectural simplification preserves universal approximation while shifting normalization into the kernel itself via the denominator, rather than relying on separate normalization layers. Empirically, NMN-based classifiers match linear baselines on MNIST while exhibiting bounded prototype evolution and superposition robustness. In language modeling, Aether-GPT2, which uses yat-based attention and MLP blocks, achieves lower validation loss than GPT-2 at a comparable parameter budget. Our framework unifies kernel learning, gradient stability, and information geometry, establishing NMNs as a principled alternative to conventional neural architectures.
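A minimal sketch of what "yat-product as the sole non-linearity" can look like, assuming each unit computes $(\mathbf{w}_j^\top\mathbf{x})^2/(\|\mathbf{w}_j-\mathbf{x}\|^2+\varepsilon)$ between the input and its own prototype $\mathbf{w}_j$, with no bias, activation, or normalization layer. The function names `yat` and `nmn_layer`, the value $\varepsilon=10^{-3}$, the prototype $\mathbf{w}=(1,-1)$ and the threshold 0.1 are illustrative assumptions, not the paper's implementation. The usage part shows one way a single unit can separate XOR: the two class-0 points are orthogonal to $\mathbf{w}$ and score exactly zero, while both class-1 points score positively.

```python
from collections.abc import Sequence


def yat(w: Sequence[float], x: Sequence[float], eps: float = 1e-3) -> float:
    """yat-product: squared alignment (w.x)^2 over squared distance ||w - x||^2 + eps."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return dot * dot / (sq_dist + eps)


def nmn_layer(
    prototypes: Sequence[Sequence[float]], x: Sequence[float], eps: float = 1e-3
) -> list[float]:
    """Hypothetical NMN layer: one yat-product per prototype, no activation or norm layer."""
    return [yat(w, x, eps) for w in prototypes]


# XOR with a single unit; prototype and threshold are hand-picked for illustration.
XOR = {(0.0, 0.0): 0, (1.0, 1.0): 0, (0.0, 1.0): 1, (1.0, 0.0): 1}
PROTOTYPES = [(1.0, -1.0)]
THRESHOLD = 0.1

if __name__ == "__main__":
    for x, label in XOR.items():
        score = nmn_layer(PROTOTYPES, x)[0]
        pred = int(score > THRESHOLD)
        print(f"x={x} label={label} score={score:.4f} pred={pred}")
```

Because the unit responds to squared alignment, the class-1 points $(0,1)$ and $(1,0)$ both clear the threshold even though their dot products with $\mathbf{w}$ have opposite signs.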

Paper Structure

This paper contains 87 sections, 40 theorems, 75 equations, 10 figures, and 7 tables.

Introduction
Methodology: A Framework for Geometry-Aware Computation
Neural Matter Network (NMN) Layers
Yat-Multi-Head Attention
Architectural Implementation
Results and Discussion
XOR Separability with a Single Unit
Decision Boundaries and Localization
MNIST Classification
Bounded Prototype Evolution
Superposition Robustness
Territorial Structure
Extreme Classification Benchmark
Language Modeling: Aether-GPT2
Architectural Simplification
...and 72 more sections

Key Result

Theorem 1 (Mercer property of the ⵟ-product kernel)

Let $\varepsilon>0$ and define $k_{\text{ⵟ}}(\mathbf{x},\mathbf{y}) = \dfrac{(\mathbf{x}^\top\mathbf{y})^2}{\|\mathbf{x}-\mathbf{y}\|^2+\varepsilon}$ for $\mathbf{x},\mathbf{y}\in\mathbb{R}^d$. Then for every compact set $K\subset\mathbb{R}^d$, the kernel $k_{\text{ⵟ}}$ is symmetric, continuous, and positive definite on $K$. Consequently, $k_{\text{ⵟ}}$ is a Mercer kernel on $K$.
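A numerical sanity check of Theorem 1 (not a proof), assuming the kernel $k(\mathbf{x},\mathbf{y})=(\mathbf{x}^\top\mathbf{y})^2/(\|\mathbf{x}-\mathbf{y}\|^2+\varepsilon)$. One standard argument, not necessarily the paper's, writes $k$ as the product of the positive semidefinite polynomial kernel $(\mathbf{x}^\top\mathbf{y})^2$ and the positive definite inverse multiquadric $1/(\|\mathbf{x}-\mathbf{y}\|^2+\varepsilon)$, so the Schur product theorem keeps the product positive definite on distinct nonzero points. The sketch builds a Gram matrix and runs a Cholesky factorization, which succeeds with strictly positive pivots exactly when a symmetric matrix is positive definite. A point at the origin makes its whole row zero, so the Gram matrix is then only positive semidefinite. Helper names, the random seed, and the $\varepsilon$ values are illustrative choices.

```python
import math
import random


def yat_kernel(x, y, eps=1.0):
    """k(x, y) = (x.y)^2 / (||x - y||^2 + eps), the assumed Theorem 1 kernel."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return dot * dot / (sq_dist + eps)


def gram(points, eps=1.0):
    """Gram matrix K[i][j] = k(points[i], points[j])."""
    return [[yat_kernel(p, q, eps) for q in points] for p in points]


def is_symmetric(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


def is_positive_definite(m):
    """Cholesky test: a symmetric matrix is PD iff every pivot is strictly positive."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


if __name__ == "__main__":
    toy = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(gram(toy))  # [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 4.0]]
    rng = random.Random(0)
    pts = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(6)]
    g = gram(pts, eps=0.5)
    print(is_symmetric(g), is_positive_definite(g))
```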

Figures (10)

Figure 1: Comparison of the gradient field and vector field for Dot Product, Euclidean Distance, ⵟ-product, and Cosine Similarity (from left to right). The heatmaps illustrate how the ⵟ-product, unlike traditional similarity measures, creates a potential well around the weight vector $\mathbf{w}$, reflecting both alignment and proximity.
Figure 2: Potential well induced by a single ⵟ-neuron. High response occurs near $\mathbf{w}$ when inputs are both aligned and close, in contrast to unbounded linear hyperplanes.
Figure 3: (a) ⵟ-product response as a function of the angle $\theta$ between $\mathbf{w}$ and $\mathbf{x}$, demonstrating orthogonality sensitivity ($K_{\text{ⵟ}} = 0$ at $\theta = \pi/2$). (b) Self-regulation property: the response converges to $\|\mathbf{w}\|^2\cos^2\theta$ as the radius $k \to \infty$, ensuring bounded outputs.
Figure 4: Comparison of the standard Transformer block (left) and the Aether block (right). In Aether-GPT2, ⵟ-multi-head attention replaces scaled dot-product attention, and an NMN layer replaces Linear+GELU, eliminating activation functions and all LayerNorm operations.
Figure 5: Decision boundaries in 2D: linear (left) creates unbounded half-spaces; ⵟ-product (right) forms localized regions around prototypes (stars).
...and 5 more figures
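The two properties in Figure 3 can be checked numerically. This sketch assumes the form $(\mathbf{w}^\top\mathbf{x})^2/(\|\mathbf{w}-\mathbf{x}\|^2+\varepsilon)$, places $\mathbf{w}$ on the first axis of $\mathbb{R}^2$, and sweeps inputs of norm $k$ at a fixed angle $\theta$; the helper names and $\varepsilon=10^{-3}$ are our own choices. For large $k$ the numerator grows like $k^2\|\mathbf{w}\|^2\cos^2\theta$ and the denominator like $k^2$, so the response approaches $\|\mathbf{w}\|^2\cos^2\theta$, while an input orthogonal to $\mathbf{w}$ scores exactly zero at any radius.

```python
import math


def yat(w, x, eps=1e-3):
    """yat-product (w.x)^2 / (||w - x||^2 + eps)."""
    dot = sum(a * b for a, b in zip(w, x))
    sq_dist = sum((a - b) ** 2 for a, b in zip(w, x))
    return dot * dot / (sq_dist + eps)


def far_field_limit(w_norm, theta):
    """Large-radius response ||w||^2 cos^2(theta), as in Figure 3(b)."""
    return w_norm ** 2 * math.cos(theta) ** 2


def response_at_radius(w_norm, theta, k, eps=1e-3):
    """Response to an input of norm k at angle theta from w = (w_norm, 0)."""
    w = (w_norm, 0.0)
    x = (k * math.cos(theta), k * math.sin(theta))
    return yat(w, x, eps)


if __name__ == "__main__":
    w_norm = 2.0
    for theta in (0.0, math.pi / 4, math.pi / 3):
        limit = far_field_limit(w_norm, theta)
        for k in (10.0, 100.0, 1000.0, 10000.0):
            r = response_at_radius(w_norm, theta, k)
            print(f"theta={theta:.3f} k={k:>7.0f} response={r:.5f} limit={limit:.5f}")
    # An input exactly orthogonal to w gives zero response at any radius.
    print(yat((w_norm, 0.0), (0.0, 5.0)))  # 0.0
```

At $\theta=\pi/4$ with $\|\mathbf{w}\|=2$, the gap to the limit of 2 shrinks roughly like $1/k$, which is one way to read the abstract's claim that the denominator acts as built-in normalization for far-away inputs. Near $\mathbf{x}=\mathbf{w}$ the response is instead $\|\mathbf{w}\|^4/\varepsilon$, so the overall bound depends on $\varepsilon$.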

Theorems & Definitions (68)

Theorem 1: Mercer property of the ⵟ-product kernel
Theorem 2: Minimal Similarity and Statistical Orthogonality
Theorem 3: Maximal (Singular) Similarity
Corollary 1: Distributional Identity and KL
Theorem 4: Universal approximation with ⵟ-kernel
Proposition 1: Natural Self-Regulation
Proposition 2: Gradient Decay for Outliers
Theorem 5: Gradient Direction
Theorem 6: RKHS Existence
Remark 1: Generalization and Implicit Class Separation
...and 58 more
