Analyzing constrained LLM through PDFA-learning

Matías Carrasco; Franz Mayr; Sergio Yovine; Johny Kidd; Martín Iturbide; Juan Pedro da Silva; Alejo Garat

TL;DR

The paper defines a congruence that copes with the null next-symbol probabilities that arise when the output of a language model is constrained during text generation, and develops an algorithm for efficiently learning the quotient with respect to this congruence.

Abstract

We define a congruence that copes with the null next-symbol probabilities that arise when the output of a language model is constrained by some means during text generation. We develop an algorithm for efficiently learning the quotient with respect to this congruence and evaluate it on case studies that analyze statistical properties of LLMs.
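To make the notion of null next-symbol probabilities concrete, here is a minimal Python sketch of one way a constraint can produce them. It assumes top-k truncation with renormalization as the constraining mechanism; the helper name `top_k_constrain` and the toy distribution are hypothetical and are not taken from the paper.

```python
def top_k_constrain(probs, k):
    """Keep the k most probable symbols, zero out the rest, renormalize.

    Hypothetical helper: one assumed way of constraining a next-symbol
    distribution, not the paper's definition.
    """
    # Symbols ranked by probability, highest first (ties keep input order).
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[s] for s in kept)
    # Symbols outside the top k receive probability exactly 0.
    return {s: (probs[s] / total if s in kept else 0.0) for s in probs}


# Toy next-symbol distribution over a three-symbol alphabet (assumed values).
next_symbol = {"a": 0.5, "b": 0.3, "c": 0.2}
constrained = top_k_constrain(next_symbol, 2)

# Symbols that can no longer be generated after the constraint is applied.
null_symbols = [s for s, p in constrained.items() if p == 0.0]
```

In this toy case the symbol `c` ends up with probability zero, so any comparison of next-symbol distributions has to handle symbols that one distribution allows and another forbids.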

Paper Structure (19 sections, 7 theorems, 19 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Language models
Sampling
Learning algorithm
Performance experiments
Analyzing large language models
Guiding generation
Learning
Tokenizers
Case study 1
Case study 2
Conclusions
Acknowledgements
Proof of Proposition \ref{prop:indist_new}
Proof of Proposition \ref{prop:congruence_stasim}
...and 4 more sections

Key Result

Proposition 2.1

For all $u,v\in\Sigma^\ast.\ u \equiv v$ if and only if
Proof. See Appendix (proof:prop_indist_new).

Figures (9)

Figure 1: PDFA $\mathcal{A}$ (left) and $\mathcal{B}$ (right) over $\Sigma = \{a, b\}$ with $q_{\mathrm{in}} = q_0$.
Figure 2: Difference between $\equiv^\bullet_{E}$ and $\equiv_{E}$
Figure 3: Running time curves: (left) As function of $\theta$ (right) As function of $n$
Figure 4: Synchronization: (left) $\mathcal{L}$ (center) $\mathcal{G}$ (right) $\mathcal{B} = \mathsf{samptop}_{2}(\mathcal{L}\times\mathcal{G})$
Figure 5: Distributions of floats and the lengths of their representing strings (digit sampling).
...and 4 more figures

Theorems & Definitions (13)

Proposition 2.1
Proposition 2.2
Proposition 2.3
Corollary 2.1
Proposition 2.4
Proposition 3.1
proof
proof
proof
proof
...and 3 more
