Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models

O. V. Usatenko; S. S. Melnyk; G. M. Pritula

TL;DR

The paper explores a theoretically feasible approximation of LLM dynamics using N-order additive Markov chains. In such models the conditional probability of the next token is decomposed into a superposition of contributions from multiple historical depths, which reduces the combinatorial explosion typically associated with high-order Markov processes.

Abstract

Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where both token embeddings and their hidden representations create complex dependencies that are not easily reduced to classical Markov structures. In this paper, we explore a theoretically feasible approximation of LLM dynamics using N-order additive Markov chains. Such models allow the conditional probability of the next token to be decomposed into a superposition of contributions from multiple historical depths, reducing the combinatorial explosion typically associated with high-order Markov processes. The main result of the work is a correspondence between an additive multi-step chain and a chain with a step-wise memory function. This equivalence allows the concept of information temperature to be introduced not only for step-wise but also for additive N-order Markov chains.
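The additive decomposition described in the abstract can be illustrated with a minimal sketch for a binary chain. It assumes the conditional probability takes the form $P(a_i = 1 \mid a_{i-N}, \dots, a_{i-1}) = \overline{a} + \sum_{r=1}^{N} F(r)\,(a_{i-r} - \overline{a})$, which is the usual form for additive binary chains but is not spelled out in this excerpt. The linear memory function $F(r) = F_0 (1 - r/N)$ is also an assumption, inferred from the ergodicity condition quoted in the Figure 2 caption. The function names are hypothetical.

```python
import numpy as np

def memory_function(N, F0):
    """Assumed linear memory function F(r) = F0 * (1 - r/N) for r = 1..N."""
    r = np.arange(1, N + 1)
    return F0 * (1.0 - r / N)

def generate_additive_chain(length, F, a_bar=0.5, seed=0):
    """Sample a binary sequence from the assumed additive N-order chain.

    P(a_i = 1 | past) = a_bar + sum_r F(r) * (a_{i-r} - a_bar).
    The caller must choose F so that this value stays inside [0, 1].
    """
    rng = np.random.default_rng(seed)
    N = len(F)
    a = np.empty(length + N, dtype=np.int8)
    a[:N] = rng.random(N) < a_bar          # uncorrelated seed window
    for i in range(N, length + N):
        past = a[i - N:i][::-1]            # past[r-1] holds a_{i-r}
        p_one = a_bar + np.dot(F, past - a_bar)
        a[i] = rng.random() < p_one
    return a[N:]                           # drop the seed window

def correlation(a, r_max):
    """Empirical correlation K(r) = <(a_i - mean)(a_{i+r} - mean)> for r = 0..r_max."""
    d = a.astype(float) - a.mean()
    return np.array([np.mean(d[:len(d) - r] * d[r:]) for r in range(r_max + 1)])

if __name__ == "__main__":
    F = memory_function(N=10, F0=0.15)     # parameter values quoted in the Figure 1 caption
    seq = generate_additive_chain(100_000, F, a_bar=0.5, seed=1)
    print(correlation(seq, r_max=10))
```

The point of the sketch is the parameter count: the additive form needs only the $N$ values of $F(r)$, whereas a general binary $N$-order chain needs one conditional probability for each of the $2^N$ possible histories.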

Paper Structure (17 sections, 40 equations, 3 figures)

Introduction
Background: Basic definitions and concepts
Symbolic $N$-order Markov chains and their models
Additive $N$-order Markov chain
Chapman-Kolmogorov equation
Markov and two-sided random chains equivalence
The source entropy
Macroscopic parameters of the Markov chains
Equivalence between additive and step-wise Markov chains models
Basic results on information temperature
Temperature from equivalence of the Ising and Markov chains
Temperature from the entropy
Ansatz
The temperature of the additive chain
Numerical simulations
...and 2 more sections

Figures (3)

Figure 1: The correlation function $K(r)$ of the additive Markov chain constructed using the memory function $F(r)$, Eq. \ref{eqmf} (shown in the inset), with memory length $N=10$ and parameters $\overline{a}=1/2$ and $F_0=0.15$. The solid line represents the numerical solution of Eq. \ref{KorrBin}. The dots represent values computed from definition \ref{KorrDef} on a numerically generated sequence with CPDF \ref{CondPr_power}.
Figure 2: The dependence of the inverse temperature $\tau^{-1}$, defined by Eqs. \ref{All_tau} and \ref{Mu2}, for the additive Markov chains with CPDF Eq. \ref{CondPr_power1} and memory function \ref{eqmf} for $N=5,\,8,\,20$ (the corresponding lines are marked in the legend). The values of the parameter $F_0$ at which the inverse temperature goes asymptotically to infinity are determined by condition \ref{def_ergod}, i.e., $|F_0| \sum_{r=1}^N \left(1 - \dfrac{r}{N}\right)=1$.
Figure 3: The lower curve is the dependence of the conditional entropies defined by Eqs. \ref{entro_block} and \ref{ShennEntr} for the additive Markov chains with CPDF Eq. \ref{CondPr_power1} and memory function \ref{eqmf} with $N=10$ and $F_0 = 0.15$. The calculated parameter $\mu = 0.345$, defined by Eq. \ref{Mu2}, gives the entropy of the step-wise chain represented by the upper curve.
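
The limiting values of $F_0$ mentioned in the Figure 2 caption can be worked out by evaluating the sum in the quoted condition. This is a short sketch that uses only that condition; the symbol $F_0^{\mathrm{c}}$ is introduced here for illustration and does not appear in the source.

$$
\sum_{r=1}^{N}\left(1-\frac{r}{N}\right) = N - \frac{N+1}{2} = \frac{N-1}{2}
\quad\Longrightarrow\quad
|F_0^{\mathrm{c}}| = \frac{2}{N-1}.
$$

For the three memory lengths shown in Figure 2 this gives $|F_0^{\mathrm{c}}| = 1/2$ for $N=5$, $2/7 \approx 0.286$ for $N=8$, and $2/19 \approx 0.105$ for $N=20$. For the parameters of Figures 1 and 3 ($N=10$, $F_0 = 0.15$) the same sum gives $0.15 \cdot 4.5 = 0.675 < 1$, so that chain lies inside the limit.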
