PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference

Burc Gokden

TL;DR

This work introduces PLDR-LLM, a foundational model whose deductive outputs are captured by a tensor trio $(\mathbf{A}_{LM}, \mathbf{A}_{P}, \mathbf{G}_{LM})$ that defines the attention mechanism. It demonstrates that a learned, input-invariant tensor operator $\mathbf{G}_{LM}$ can replace the deep PLGA network at inference, which enables straightforward caching strategies while keeping the inductive outputs nearly identical, up to small perturbations. The study provides extensive ablations comparing learnable, predefined, and random tensor operators, shows that SDPA is the special case in which $\mathbf{G}_{LM}$ is the identity, and reports competitive zero-shot benchmarks alongside substantial inference-time speedups from KV-cache and G-cache. The results imply a fundamental training–inference asymmetry and suggest that the learned singularity of the deductive outputs yields a robust, generalizable operator that can serve as a cache-friendly backbone for future language models, with practical implications for efficient inference in large-scale deployments.

Abstract

We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $\mathbf{G}_{LM}$ to replace, at inference, the deep neural network of power law graph attention (PLGA) that generates the deductive outputs. We demonstrate that a cache for $\mathbf{G}_{LM}$ (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizability of the deductive outputs hold at very high fidelity: after caching, the deductive outputs have the same RMSE and determinant values up to 15 decimal places, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have loss and accuracy characteristics distinct from those of models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator. They also show that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM in which $\mathbf{G}_{LM}$ is predefined as the identity. The observed invariance introduces a novel asymmetry between the training and inference phases when caching is used. We outline the common characteristics observed in the deductive outputs under the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.
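To make two of the claims above concrete, the following is a minimal sketch rather than the released implementation. It assumes the tensor operator enters the attention scores as a bilinear form, $\mathrm{softmax}(Q\,G\,K^{T}/\sqrt{d})\,V$; the exact PLGA formulation is not given in this section, so that form, the function names, and the `GCache` class are all hypothetical. Under that assumption, setting $G$ to the identity reduces the computation to SDPA, and an input-invariant $G$ only has to be inferred once.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def sdpa(q, k, v):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def bilinear_attention(q, k, v, g):
    """ASSUMED form, not the paper's exact PLGA: softmax(Q G K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(matmul(q, g), transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


class GCache:
    """Hypothetical cache: run the expensive operator network once, then reuse G."""

    def __init__(self, infer_g):
        self._infer_g = infer_g  # stand-in for the deep network that produces G
        self._g = None
        self.calls = 0           # how many times the network was actually run

    def get(self, *args):
        if self._g is None:
            self._g = self._infer_g(*args)
            self.calls += 1
        return self._g


# Toy inputs: 3 tokens, head dimension 2.
q = [[0.1, 0.4], [0.3, -0.2], [0.5, 0.9]]
k = [[0.7, 0.1], [-0.3, 0.6], [0.2, 0.2]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Stand-in "network" that returns the identity operator.
cache = GCache(lambda n: identity(n))
out_first = bilinear_attention(q, k, v, cache.get(2))
out_second = bilinear_attention(q, k, v, cache.get(2))  # served from the cache
out_sdpa = sdpa(q, k, v)
```

Reusing a cached operator is only valid to the extent that the operator does not depend on the input, which is the invariance property the abstract reports; the sketch takes that property as given rather than demonstrating it.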
