What's in a prompt? Language models encode literary style in prompt embeddings
Raphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher Earls
TL;DR
This study probes how prompts are encoded in large language models and finds that abstract properties such as literary style and authorship are embedded in deep transformer representations rather than in the initial word embeddings. By tracking the last-token embeddings of short passages across layers and training SVM and MLP classifiers on them, the authors show that authorship can be inferred with high accuracy as context length $N$ and layer depth $L$ increase, and that stylistic signals reside in a small subspace aligned with the largest principal components. The findings demonstrate that intangible prompt features are compressed and organized within the model's latent geometry, while similarities between languages indicate shared stylistic directions. These insights have implications for authorship attribution, literary analysis, and, more broadly, the interpretability of how prompts shape world models in LLMs.
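The classification step described above can be illustrated with a toy version. This is a minimal sketch under stated assumptions, not the authors' pipeline. The embeddings are synthetic stand-ins: isotropic noise plus one planted "style" direction that separates two authors. The classifier is a bias-free linear SVM trained with Pegasos-style subgradient descent instead of a library SVM, and the MLP classifier is omitted. The helper names (`fit_pca`, `project`, `train_linear_svm`, `synthetic_embeddings`) are hypothetical. With a real model, each row of `X` would instead be a last-token vector. For example, with a Hugging Face `transformers` model called with `output_hidden_states=True` on unpadded inputs, this would be `outputs.hidden_states[layer][:, -1, :]`.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions (as rows) of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Coordinates of the rows of X in the retained principal subspace."""
    return (X - mean) @ components.T

def train_linear_svm(Z, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss, with no bias term (features are already centered).
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates_margin = y[i] * (Z[i] @ w) < 1  # uses w before the update
            w *= 1.0 - eta * lam                     # shrinkage from the L2 penalty
            if violates_margin:
                w += eta * y[i] * Z[i]               # hinge-loss subgradient step
    return w

def synthetic_embeddings(n_per_author, d, rng, direction):
    """Hypothetical stand-in for last-token embeddings of two authors:
    isotropic noise plus a +/-3 offset along one shared 'style' direction."""
    y = np.repeat([1, -1], n_per_author)
    X = rng.normal(size=(2 * n_per_author, d)) + 3.0 * y[:, None] * direction
    return X, y

rng = np.random.default_rng(42)
d = 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

X_train, y_train = synthetic_embeddings(100, d, rng, direction)
X_test, y_test = synthetic_embeddings(100, d, rng, direction)

# Keep only the two largest principal components, then classify.
mean, components = fit_pca(X_train, k=2)
w = train_linear_svm(project(X_train, mean, components), y_train)
pred = np.sign(project(X_test, mean, components) @ w)
accuracy = np.mean(pred == y_test)
print(f"held-out accuracy with 2 principal components: {accuracy:.2f}")
```

Restricting the classifier to the top two principal components mirrors, in toy form, the claim that stylistic signal concentrates in the leading components. In this synthetic setup the planted direction carries most of the variance by construction. On real embeddings, the number of components needed is an empirical question.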
Abstract
Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies have analyzed how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We use literary pieces to show that information about intangible, rather than factual, aspects of the prompt is contained in deep representations. We observe that short excerpts (10 to 100 tokens) from different novels separate in the latent space independently of the next-token prediction they converge toward. Ensembles of excerpts from books by the same author are much more entangled than ensembles from books by different authors, suggesting that embeddings encode stylistic features. This geometry of style may have applications for authorship attribution and literary analysis, but, more importantly, it reveals the sophistication of the information processing and compression accomplished by language models.
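One hypothetical way to quantify the entanglement claim is a Fisher-style separation score. The score is the distance between two ensembles' centroids divided by their root-mean-square within-ensemble spread, so lower values mean more entangled ensembles. This score is an illustrative assumption, not the measure used in the paper. The data below are synthetic, built so that each excerpt embedding is an author vector plus a small book-specific offset plus per-excerpt noise. The helpers `separation` and `book` are hypothetical names.

```python
import numpy as np

def separation(Xa, Xb):
    """Fisher-style separation between two ensembles of embeddings:
    centroid distance divided by the RMS within-ensemble spread.
    Lower values mean the two ensembles are more entangled."""
    gap = np.linalg.norm(Xa.mean(axis=0) - Xb.mean(axis=0))
    # Total variance (trace of the covariance) of each ensemble.
    spread_a = Xa.var(axis=0).sum()
    spread_b = Xb.var(axis=0).sum()
    return gap / np.sqrt(0.5 * (spread_a + spread_b))

# Synthetic stand-ins (assumption): excerpt embedding =
# author vector + book-specific offset + per-excerpt noise.
rng = np.random.default_rng(1)
d, n = 32, 200
author_1, author_2 = rng.normal(0.0, 1.0, size=(2, d))

def book(author_vec):
    """Hypothetical ensemble of n excerpt embeddings from one book."""
    offset = rng.normal(0.0, 0.3, size=d)
    return author_vec + offset + rng.normal(0.0, 1.0, size=(n, d))

book_a, book_b = book(author_1), book(author_1)  # two books, same author
book_c = book(author_2)                          # a book by another author

same_author = separation(book_a, book_b)
cross_author = separation(book_a, book_c)
print(f"same author: {same_author:.2f}, different authors: {cross_author:.2f}")
```

Because same-author books share the author vector, their centroids differ only by the small book offsets, so the same-author score comes out lower than the cross-author score in this construction.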
