What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
Liyi Zhang, Michael Y. Li, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths
TL;DR
This work proposes a Bayes-optimal account of what the embeddings learned by autoregressive language models should represent, tying them to predictive sufficient statistics. It formalizes three canonical data-generating settings (exchangeable data, latent state models, and discrete hypothesis spaces) and shows, both analytically and through probing experiments, that transformers encode the corresponding latent quantities (sufficient statistics, posteriors over states, and topic mixtures). In synthetic and natural-corpus experiments (including Gaussian-Gamma, Beta-Bernoulli, HMM-LDA, and LDA-based topic models on 20NG and WikiText-103), simple probes recover these quantities from the embeddings, and the results hold out of distribution and without token memorization. The findings offer a principled lens for interpretability and suggest directions for LLM training and evaluation, notably in representing uncertainty and latent generative factors. Overall, the paper connects Bayesian inference with deep autoregressive models and highlights predictive sufficiency as a guiding principle for what embeddings should contain.
Abstract
Autoregressive language models have demonstrated a remarkable ability to extract latent structure from text. The embeddings from large language models have been shown to capture aspects of the syntax and semantics of language. But what should embeddings represent? We connect the autoregressive prediction objective to the idea of constructing predictive sufficient statistics to summarize the information contained in a sequence of observations, and use this connection to identify three settings where the optimal content of embeddings can be identified: independent identically distributed data, where the embedding should capture the sufficient statistics of the data; latent state models, where the embedding should encode the posterior distribution over states given the data; and discrete hypothesis spaces, where the embedding should reflect the posterior distribution over hypotheses given the data. We then conduct empirical probing studies to show that transformers encode these three kinds of latent generating distributions, and that they perform well in out-of-distribution cases and without token memorization in these settings.
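To make the first setting concrete, the following is a minimal sketch of a predictive sufficient statistic in a Beta-Bernoulli model. It is an illustration under stated assumptions, not the authors' code: the function names `sufficient_stats` and `posterior_predictive` and the uniform Beta(1, 1) prior are hypothetical choices made here. It shows that two sequences with the same counts yield the same next-token prediction, so the counts are all an optimal predictor needs to retain.

```python
from fractions import Fraction


def sufficient_stats(seq):
    """Return (number of ones, sequence length) for a binary sequence.

    Hypothetical helper: for exchangeable Bernoulli data these two counts
    summarize everything relevant to predicting the next observation.
    """
    return (sum(seq), len(seq))


def posterior_predictive(seq, alpha=1, beta=1):
    """P(next = 1 | seq) under a Beta(alpha, beta) prior on the Bernoulli rate.

    The Beta(1, 1) default is an assumption made for this illustration.
    """
    ones, n = sufficient_stats(seq)
    return Fraction(alpha + ones, alpha + beta + n)


# Two different orderings with the same counts.
a = [1, 1, 0, 1, 0]
b = [0, 1, 0, 1, 1]

print(sufficient_stats(a) == sufficient_stats(b))          # True
print(posterior_predictive(a) == posterior_predictive(b))  # True
print(posterior_predictive(a))                             # 4/7
```

In this toy case the prediction depends on the sequence only through the pair of counts, which is the sense in which an embedding that supports optimal prediction would need to encode the sufficient statistics rather than the individual tokens.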
