What's the Magic Word? A Control Theory of LLM Prompting

Aman Bhargava, Cameron Witkowski, Shi-Zhuo Looi, Matt Thomson

TL;DR

The paper reframes prompting as a control problem for LLMs, formalizing the model as a discrete-time dynamical system and introducing the reachable output set as a core concept for controllability. A self-attention-specific bound is derived, tying the reachability of a desired output to the singular values of the attention projection matrices and the length of the control input, $k$, with a computable condition for unreachability. Empirically, the authors demonstrate strong reachability for short prompts across multiple models (e.g., the "ground truth" next token is reachable in ~97% of cases with $k\leq 10$; the top 75 most likely tokens are reachable ~85–89% of the time), while also showing that even unlikely tokens can be steered into higher-probability outputs with short prompts. The work provides a foundational, theory-informed lens on how input sequences influence LLMs, pointing to practical directions for more reliable and controllable prompt engineering and opening questions about scaling laws and cross-model behavior.

Abstract

Prompt engineering is crucial for deploying LLMs but is poorly understood mathematically. We formalize LLM systems as a class of discrete stochastic dynamical systems to explore prompt engineering through the lens of control theory. We offer a mathematical analysis of the limitations on the controllability of self-attention as a function of the singular values of the parameter matrices. We present complementary empirical results on the controllability of a panel of LLMs, including Falcon-7b, Llama-7b, and Falcon-40b. Given initial state $\mathbf x_0$ from Wikitext and prompts of length $k \leq 10$ tokens, we find that the "correct" next token is reachable at least 97% of the time, and that the top 75 most likely next tokens are reachable at least 85% of the time. Intriguingly, short prompt sequences can dramatically alter the likelihood of specific outputs, even making the least likely tokens become the most likely ones. This control-theoretic analysis of LLMs demonstrates the significant and poorly understood role of input sequences in steering output probabilities, offering a foundational perspective for enhancing language model system capabilities.
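
To make the notion of reachability concrete, here is a minimal sketch of a reachability check on a toy model. Everything in it is an assumption for illustration: `make_toy_lm` is a hypothetical mean-pooling "language model" rather than any of the LLMs studied, the control prompt $\mathbf{u}$ is assumed to be prepended to $\mathbf{x}_0$, and exhaustive search stands in for whatever prompt-optimization procedure the authors actually use.

```python
import itertools
import numpy as np


def make_toy_lm(vocab_size=6, dim=4, seed=0):
    """Build a hypothetical toy 'LM' mapping a token sequence to next-token logits."""
    rng = np.random.default_rng(seed)
    embed = rng.normal(size=(vocab_size, dim))    # toy token embeddings
    unembed = rng.normal(size=(dim, vocab_size))  # toy output projection

    def logits(tokens):
        # Mean-pool the token embeddings, then project to vocabulary logits.
        pooled = embed[list(tokens)].mean(axis=0)
        return pooled @ unembed

    return logits


def find_control_prompt(logits_fn, x0, target, max_k, vocab_size):
    """Search for a prompt u with len(u) <= max_k such that the most likely
    next token after u + x0 is `target`.

    Returns the first (shortest) such prompt as a tuple, or None if the target
    is not reachable within max_k control tokens. x0 must be non-empty.
    """
    x0 = tuple(x0)
    for k in range(max_k + 1):
        # Enumerate every prompt of exactly k tokens (k = 0 is the empty prompt).
        for u in itertools.product(range(vocab_size), repeat=k):
            if int(np.argmax(logits_fn(u + x0))) == target:
                return u
    return None
```

A call such as `find_control_prompt(make_toy_lm(), (1, 2, 3), target=4, max_k=3, vocab_size=6)` returns either a witness prompt or `None`. Exhaustive search costs on the order of `vocab_size ** max_k` model evaluations, so it is only feasible for toy vocabularies like this one.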

Paper Structure (34 sections, 3 theorems, 25 equations, 10 figures, 3 algorithms)


Key Result

Theorem 4.2

Consider a self-attention layer with input $\mathbf{X} \in \mathbb{R}^{m \times d}$ and control input $\mathbf{U} \in \mathbb{R}^{k \times d}$, where $m$ is the number of imposed tokens, $k$ is the number of control tokens, and $d$ is the token embedding dimension. Let $\mathbf{Y}^* \in \mathbb{R}^{\ldots}$ […] where $\sigma_v$, $\sigma_q$, and $\sigma_{\rm key}$ being the maximum singular values of the value, q[…]
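
The theorem statement is cut off in this excerpt, so the bound itself is not reproduced here. What can be sketched is how the quantities it names, the maximum singular values of the attention projection matrices, might be computed. The helper names `max_singular_value` and `attention_sigmas` and the argument names `w_q`, `w_k`, `w_v` are hypothetical, and any matrices passed in would be stand-ins rather than weights from the models in the paper.

```python
import numpy as np


def max_singular_value(weight):
    """Largest singular value (spectral norm) of a 2-D weight matrix."""
    return float(np.linalg.norm(weight, ord=2))


def attention_sigmas(w_q, w_k, w_v):
    """Collect the maximum singular value of each projection matrix.

    The dictionary keys mirror the symbols sigma_q, sigma_key, and sigma_v
    used in the theorem statement.
    """
    return {
        "sigma_q": max_singular_value(w_q),
        "sigma_key": max_singular_value(w_k),
        "sigma_v": max_singular_value(w_v),
    }
```

The maximum singular value is a natural ingredient for a bound of this kind because it caps how much a linear map can stretch any vector: $\|\mathbf{W}\mathbf{x}\| \leq \sigma_{\max}(\mathbf{W})\,\|\mathbf{x}\|$.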

Figures (10)

  • Figure 1: Illustration of the control-theoretic approach to LLM prompt engineering. Left: the LLM system diagram mapping an initial state $\mathbf{x}_0$ to a system output $\mathbf{y}$ under the influence of a control input $\mathbf{u}$ (all token sequences). Right: sketch of the reachable output sets $R_y^k(\mathbf{x}_0)$ for varying control input lengths $k$.
  • Figure 2: Visualization of Theorem 4.2: $\mathbf{Y}^*$ and components of $\mathbf{Y}_u, \mathbf{Y}_x$. If $\|\mathbf{Y}_{x,\perp}^{\max,i} \|$ exceeds $k\gamma$, then no prompt of length $\leq k$ can steer the self-attention to output $\mathbf{Y}^*$ given imposed $\mathbf{X}_0$ and constraints on $\|\mathbf{U}^i\| \leq M_u$.
  • Figure 3: Top Left: $k$-$\epsilon$ values on initial state $\mathbf{x}_0$ and target output token $y^*$ from Wikitext. 97.16% of the instances were solved with a prompt of length $k\leq 10$. Top Right: $k$-$\epsilon$ values reaching the top 75 most likely outputs $y^*$ for each $\mathbf{x}_0$ from Wikitext. The top 75 targets were reachable at least 89.39% of the time with a prompt of length $k\leq 10$. Bottom Left: Prior likelihood rank of target token $y^*$ versus required prompt length to elicit $y^*$. Target tokens were sampled uniformly from the least to most likely token given $\mathbf{x}_0$ sampled from Wikitext.
  • Figure 4: Log-spaced main results of $k$-$\log(\epsilon)$ controllability. Interestingly, the relationship between $k$ and $\log(\epsilon)$ appears roughly linear for each question length in the regime studied. Top left: $k$-$\log(\epsilon)$ values for Falcon-7b. With $k=10$ control tokens, 97.16% of the target outputs were reachable. Top right: $k$-$\log(\epsilon)$ values for Llama-7b. With $k=10$ control tokens, 98.64% of the target outputs were reachable. Bottom right: $k$-$\log(\epsilon)$ values for Falcon-40b. With $k=10$ control tokens, 97.00% of the target outputs were reachable.
  • Figure 5: Required prompt length $k$ versus base loss on the target output $\mathcal{L} = -\log P_{LM}(y | \mathbf x_0)$ on "ground truth" Wikitext target outputs $y$ directly following $\mathbf x_0$. Top left: Falcon-7b. Top right: Llama-7b. Bottom right: Falcon-40b. While there does appear to be an "exclusion zone" in the top left-hand corner where a high prompt length is never associated with a base loss below a given threshold, base loss appears to be a poor predictor of required prompt length.
  • ...and 5 more figures
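
Two quantities recur in the captions above: the $k$-$\epsilon$ values and the base loss $\mathcal{L} = -\log P_{LM}(y | \mathbf x_0)$. The sketch below shows one way they could be computed. It assumes $\epsilon$ at a given $k$ is the fraction of instances not solved by any prompt of length $\leq k$; under that reading, the 97.16% solved at $k \leq 10$ corresponds to $\epsilon \approx 0.0284$. The function names and the input format (a list of minimal prompt lengths, with `None` for unsolved instances) are hypothetical.

```python
import numpy as np


def epsilon_curve(required_lengths, max_k):
    """Fraction of instances NOT solved with a prompt of length <= k,
    for each k in 0..max_k.

    required_lengths: minimal prompt length found per instance, or None if
    no successful prompt was found for that instance.
    """
    n = len(required_lengths)
    curve = []
    for k in range(max_k + 1):
        solved = sum(1 for r in required_lengths if r is not None and r <= k)
        curve.append(1.0 - solved / n)
    return curve


def base_loss(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target])
```

For example, with made-up lengths `[0, 1, 1, 3, None]`, `[round(e, 2) for e in epsilon_curve([0, 1, 1, 3, None], 3)]` gives `[0.8, 0.4, 0.4, 0.2]`. The curve is non-increasing in $k$ by construction, since any instance solved at length $k$ also counts as solved at every larger length.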

Theorems & Definitions (19)

  • Definition 3.1: LLM System with Control Input
  • Definition 3.2: LLM Output Reachability
  • Definition 3.3: LLM Reachable Output Set
  • Definition 3.4: LLM Output Controllability
  • Definition 3.5: $k$-$\epsilon$ Controllability
  • Definition 4.1: Self-Attention
  • Theorem 4.2: Self-Attention Control Theorem, proved in the appendix
  • Remark 4.3
  • Definition A.1: System
  • Definition A.2: State Reachability
  • ...and 9 more