What's the Magic Word? A Control Theory of LLM Prompting
Aman Bhargava, Cameron Witkowski, Shi-Zhuo Looi, Matt Thomson
TL;DR
The paper reframes prompting as a control problem for LLMs, formalizing the model as a discrete-time dynamical system and introducing the reachable output set as the core concept for controllability. It derives a self-attention-specific bound that ties the reachability of a desired output to the singular values of the attention projection matrices and the length $k$ of the control input, yielding a computable condition for unreachability. Empirically, the authors demonstrate strong reachability for short prompts across multiple models (e.g., the "ground truth" next token is reachable in ~97% of cases with $k \leq 10$; the top 75 most likely next tokens are reachable ~85–89% of the time), and show that even unlikely tokens can be steered to higher probability with short prompts. The work provides a foundational, theory-informed lens on how input sequences influence LLM outputs, pointing to practical directions for more reliable and controllable prompt engineering and raising open questions about scaling laws and cross-model behavior.
Abstract
Prompt engineering is crucial for deploying LLMs but is poorly understood mathematically. We formalize LLM systems as a class of discrete stochastic dynamical systems to explore prompt engineering through the lens of control theory. We offer a mathematical analysis of the limitations on the controllability of self-attention as a function of the singular values of the parameter matrices. We present complementary empirical results on the controllability of a panel of LLMs, including Falcon-7b, Llama-7b, and Falcon-40b. Given initial state $\mathbf x_0$ from Wikitext and prompts of length $k \leq 10$ tokens, we find that the "correct" next token is reachable at least 97% of the time, and that the top 75 most likely next tokens are reachable at least 85% of the time. Intriguingly, short prompt sequences can dramatically alter the likelihood of specific outputs, even making the least likely tokens become the most likely ones. This control-theoretic analysis of LLMs demonstrates the significant and poorly understood role of input sequences in steering output probabilities, offering a foundational perspective for enhancing language model system capabilities.
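To make the notion of reachability concrete, here is a minimal Python sketch of what "token y is reachable from initial state x_0 with a prompt of at most k tokens" means. It is an illustration under stated assumptions, not the authors' method: the functions `toy_sum_token` and `toy_max_token` are hypothetical stand-ins for an LLM's greedy next-token map, the 5-token vocabulary is invented, and the exhaustive search in `reachable_set` is only feasible at toy scale (the section above does not describe the prompt search procedure used in the experiments).

```python
from itertools import product


def reachable_set(next_token, x0, k, vocab):
    """Map each reachable output y to the first prompt u found with len(u) <= k
    such that next_token(u + x0) == y (the prompt u is prepended to the state x0)."""
    reached = {}
    for length in range(k + 1):  # length 0 is the empty prompt
        for u in product(vocab, repeat=length):
            y = next_token(u + tuple(x0))
            reached.setdefault(y, u)  # lengths ascend, so this keeps a shortest prompt
    return reached


# Hypothetical stand-ins for an LLM's greedy (argmax) next-token map.
def toy_sum_token(tokens):
    # Position-weighted sum modulo a 5-token vocabulary.
    return sum((i + 1) * t for i, t in enumerate(tokens)) % 5


def toy_max_token(tokens):
    # Output is the largest token seen, so outputs below max(x0) are unreachable.
    return max(tokens)


vocab = range(5)
print(reachable_set(toy_sum_token, (1, 2), 1, vocab))
print(reachable_set(toy_max_token, (3,), 2, vocab))
```

In the first toy system every output is reachable with a single prompt token, while in the second the reachable set from x_0 = (3,) is a strict subset of the vocabulary no matter how long the prompt is. Measuring how often a target token falls inside the reachable set, over many initial states, is the kind of statistic the abstract reports.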
