Markov Constraint as Large Language Model Surrogate

Alexandre Bonlarron; Jean-Charles Régin

TL;DR

This paper introduces the NgramMarkov constraint, a constraint programming (CP) surrogate for Large Language Models (LLMs) that uses a set of n-grams with probabilities provided by an LLM to guide constrained text generation. By replacing direct LLM calls with a log-probability bound across a sequence and employing multiple filtering strategies (Instant, Final, Gliding, and Look-ahead), the approach reduces combinatorial explosion and enables efficient 4-gram and 5-gram generation. Empirical results show substantial pruning of candidate sentences, practical n-gram scoring times of around 10 ms on a lightweight French GPT-2 setup, and quality trends that align with perplexity, though some good solutions may be pruned with higher-order n-grams. The work bridges CP and modern LLMs, offering a scalable method for context-aware, constrained text generation with potential for interactive creativity tools.

Abstract

This paper presents NgramMarkov, a variant of Markov constraints dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequences of n words) associated with probabilities given by a large language model (LLM), and it limits the product of the probabilities of the n-grams of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator that incorporates the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences over a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued in a similar way to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
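
The bound on the product of n-gram probabilities can be restated as a bound on a sum of log-probabilities, which is the form a propagator can accumulate step by step. The block below is a sketch of that equivalence, not the paper's own notation: the symbols $g_i$, $m$, and $\tau$ are assumptions introduced here, as is the reading of the bound as a lower bound on the product and the sign convention (negative log-probabilities, so that a larger sum means a less likely sentence).

```latex
% Assumed notation: g_1, ..., g_m are the n-grams of a sentence,
% P(g_i) is the probability the LLM assigns to g_i,
% and \tau \in (0, 1] is a lower bound on the product.
\prod_{i=1}^{m} P(g_i) \;\ge\; \tau
\quad\Longleftrightarrow\quad
\sum_{i=1}^{m} -\log P(g_i) \;\le\; T,
\qquad T = -\log \tau .
```

Under this convention every term of the sum is non-negative, so a partial sum that already exceeds $T$ can never be repaired by later n-grams.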

Paper Structure

This paper contains 31 sections, 12 equations, 3 figures, 2 tables.

Introduction
Preliminaries
LM Constraints
Language Model
N-gram Model
Large Language Model
Method
Filtering Criteria of N-grams
Transition Filtering: Instant Threshold
Path Filtering: Final Threshold
Prefix-based Filtering: Gliding Threshold
Look-a-head Filtering
Results
Experimental Conditions
Application:
...and 16 more sections

Figures (3)

Figure 1: A Markov process in a simplified case with several 3-grams. Each state is a 3-gram (e.g., $n_2=w_2w_3w_4$) and each transition between states is a word (e.g., $w_5$). The figure highlights each n-gram filtering criterion, based on LogProb: (1) in red, the instant threshold (renamed $T_i$ to avoid confusion with the final threshold $T$), applied to each possible word transition; (2) in blue, the final threshold, which only checks that the total sum of LogProb is under $T$; (3) in green, the gliding threshold, which applies the final threshold at each transition step by checking whether the partial sum at step $k$ falls under $k \cdot \frac{T}{|X|}$. Finally, the dashed lines indicate where the constraint checks are called.
Figure 2: Distribution of 4-grams (a) and 5-grams (b) extracted from French books, paired with log-probabilities computed by a lightweight French GPT model$^3$, together with their associated quantile-quantile (QQ) plots, (c) and (d) respectively.
Figure 3: Comparison of the perplexity (PPL) distributions of sentences generated with 5-grams: gliding threshold (blue) for $\lambda=1$, look-ahead threshold (green) for $\lambda=1$, and vanilla model (red).
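
The three threshold checks described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's propagator: the function names are hypothetical, each transition is scored by a non-negative cost (assumed here to be a negative log-probability), and `length` stands for $|X|$, the number of transitions in the sequence.

```python
from typing import Sequence


def instant_ok(cost: float, t_instant: float) -> bool:
    """Instant threshold T_i: accept a single transition if its cost is low enough."""
    return cost <= t_instant


def final_ok(costs: Sequence[float], t_final: float) -> bool:
    """Final threshold T: only the total cost of the complete sequence is checked."""
    return sum(costs) <= t_final


def gliding_ok(costs: Sequence[float], t_final: float, length: int) -> bool:
    """Gliding threshold: the partial sum at step k must stay under k * T / |X|."""
    partial = 0.0
    for k, cost in enumerate(costs, start=1):
        partial += cost
        if partial > k * t_final / length:
            return False  # the prefix already spends more than its share of the budget
    return True


# Hypothetical costs (assumed negative log-probabilities) for two 3-step sequences.
balanced = [1.0, 3.0, 1.0]
front_loaded = [3.0, 1.0, 1.0]
```

With `t_final = 6.0` and `length = 3`, both sequences pass `final_ok` because each totals 5.0, but only `balanced` passes `gliding_ok`: the first step of `front_loaded` costs 3.0, which exceeds the per-step budget of 2.0. This matches the idea that the gliding threshold rejects unbalanced sequences that the final threshold alone would accept.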

Theorems & Definitions (1)

Definition 1
