Provably Learning from Modern Language Models via Low Logit Rank
Noah Golowich, Allen Liu, Abhishek Shetty
TL;DR
The paper develops a theory of learning for modern language models, building on the empirical observation that their logit matrices have approximately low rank. It introduces a formal framework for approximate low logit rank and gives a polynomial-time learning algorithm that uses logit queries to recover a model close to the target in total variation distance. The technical approach has two parts: (i) adaptive selection of futures via the elliptical potential lemma, to cope with high-dimensional logit rows, and (ii) sampling via a linear-programming-based representation that bounds coefficient growth and yields an efficient sampler. The authors present the result as the first end-to-end provable learning guarantee for a generative model that plausibly captures key aspects of modern language models, and it opens avenues for further integration with practical training, online data, and interpretability. The work also connects to broader themes in learning theory, such as Kushilevitz-Mansour-style results for Boolean functions, viewed under the low logit rank paradigm.
Abstract
While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty; 2025) has proposed a simple and potentially tractable abstraction for them through the observation that empirically, these language models all seem to have approximately low logit rank. Roughly, this means that a matrix formed by the model's log probabilities of various tokens conditioned on certain sequences of tokens is well approximated by a low rank matrix. In this paper, our focus is on understanding how this structure can be exploited algorithmically for obtaining provable learning guarantees. Since low logit rank models can encode hard-to-learn distributions such as noisy parities, we study a query learning model with logit queries that reflects the access model for common APIs. Our main result is an efficient algorithm for learning any approximately low logit rank model from queries. We emphasize that our structural assumption closely reflects the behavior that is empirically observed in modern language models. Thus, our result gives what we believe is the first end-to-end learning guarantee for a generative model that plausibly captures modern language models.
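To make the phrase "low logit rank" concrete, the following is a minimal toy sketch in Python, not the paper's construction. It assumes a synthetic "model" whose raw logits factor through a rank-3 product, indexes rows by conditioning sequences (histories) and columns by (future, token) pairs, and mean-centers the log probabilities over tokens as one convenient normalization. All names and sizes (`toy_centered_logit_matrix`, `relative_tail_energy`, `n_hist`, `n_fut`, `vocab`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def toy_centered_logit_matrix(n_hist=40, n_fut=30, vocab=5, rank=3, seed=0):
    """Build a synthetic matrix of mean-centered log probabilities.

    Rows are indexed by histories, columns by (future, token) pairs.
    The raw logits are an assumed rank-`rank` product; this is a toy
    stand-in for a language model, not a real one.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_hist, rank))          # one embedding per history
    b = rng.normal(size=(rank, n_fut * vocab))   # one column per (future, token)
    raw = (a @ b).reshape(n_hist, n_fut, vocab)
    logp = log_softmax(raw)                      # log Pr[token | history, future]
    # Centering over tokens removes the per-(history, future) normalizer,
    # which is a linear operation and so cannot increase the rank.
    centered = logp - logp.mean(axis=-1, keepdims=True)
    return centered.reshape(n_hist, n_fut * vocab)


def relative_tail_energy(m, r):
    """Relative Frobenius error of the best rank-r approximation of m."""
    s = np.linalg.svd(m, compute_uv=False)
    return float(np.sqrt((s[r:] ** 2).sum() / (s ** 2).sum()))


M = toy_centered_logit_matrix()
err_at_rank = relative_tail_energy(M, 3)     # should be at floating-point noise level
err_below_rank = relative_tail_energy(M, 2)  # should be clearly nonzero
print(M.shape, err_at_rank, err_below_rank)
```

In this toy setting the matrix is exactly rank 3 after centering, so the rank-3 truncation error is at the level of floating-point noise while the rank-2 error is not. The empirical claim in the abstract is the approximate analogue: for real language models the corresponding error is small but not zero at modest rank.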
