What's in a prompt? Language models encode literary style in prompt embeddings

Raphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher Earls

TL;DR

This study probes how prompts are encoded in large language models, revealing that abstract properties like literary style and authorship become embedded in deep transformer representations rather than in initial word embeddings. By tracking the last-token embeddings across layers for short passages and applying SVM and MLP classifiers, the authors show that authorship can be inferred with high accuracy as context length $N$ and layer depth $L$ increase, and that stylistic signals reside in a small subspace aligned with the largest principal components. The findings demonstrate that intangible prompt features are compressed and organized within the model's latent geometry, with cross-language similarities indicating shared stylistic directions across languages. These insights have implications for authorship attribution, literary analysis, and the broader interpretability of how prompts shape world models in LLMs.

Abstract

Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We use literary pieces to show that information about intangible, rather than factual, aspects of the prompt is contained in deep representations. We observe that short excerpts (10-100 tokens) from different novels separate in the latent space independently of the next-token prediction they converge towards. Ensembles from books by the same author are much more entangled than ensembles from different authors, suggesting that embeddings encode stylistic features. This geometry of style may have applications for authorship attribution and literary analysis, but most importantly reveals the sophistication of information processing and compression accomplished by language models.
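
To make the probing idea concrete, here is a minimal sketch of a linear probe separating two ensembles of embeddings. It is an illustration under stated assumptions, not the authors' pipeline: the embeddings are synthetic Gaussian clouds standing in for last-token representations of two authors, and a ridge least-squares classifier is swapped in for the SVM and MLP probes so that the example depends only on NumPy. All function names and parameters (`make_ensembles`, `shift`, `ridge`, and so on) are hypothetical.

```python
import numpy as np


def make_ensembles(n_per_class=200, dim=64, shift=1.0, seed=0):
    """Synthetic stand-ins for last-token embeddings of two authors (not real data)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # Two unit-variance clouds displaced by +/- shift along one shared direction.
    a = rng.normal(size=(n_per_class, dim)) + shift * direction
    b = rng.normal(size=(n_per_class, dim)) - shift * direction
    X = np.vstack([a, b])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y


def add_bias(X):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear_probe(X, y, ridge=1e-2):
    """Ridge least-squares classifier on +/-1 labels (assumption: not the paper's SVM)."""
    Xb = add_bias(X)
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def probe_accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    return float(np.mean(np.sign(add_bias(X) @ w) == y))


def train_test_accuracy(X, y, test_fraction=0.25, seed=0):
    """Fit on a random training split and report held-out accuracy."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(test_fraction * len(y))
    test, train = order[:n_test], order[n_test:]
    w = fit_linear_probe(X[train], y[train])
    return probe_accuracy(w, X[test], y[test])


if __name__ == "__main__":
    for shift in (0.0, 0.5, 3.0):
        X, y = make_ensembles(shift=shift)
        print(f"shift={shift}: held-out accuracy={train_test_accuracy(X, y):.2f}")
```

In this toy setting, `shift` plays the role of how far apart the two ensembles sit in latent space: at zero the probe should perform near chance, and accuracy should rise as the clouds separate. With real data, `X` would hold one last-token embedding per excerpt at a chosen layer.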

Paper Structure

This paper contains 29 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: After semantic embedding of the prompt, each vector represents a single word. As the prompt passes through transformer layers, the attention mechanism funnels more and more information about preceding tokens into the last embedding, turning it into a 'chimera' vector that encodes bits of information from all the others.
  • Figure 2: (A) Ensembles of short excerpts ($N=64$ tokens) from GE and VW separate in the latent space as embeddings travel through successive transformer layers. (B) Linear classifier accuracy (%) for distinguishing GE vs. VW ensembles as a function of the number of tokens in the prompt, $N$, and the number of transformer layers crossed, $L$.
  • Figure 3: (A) Accuracy (%) of an MLP probe for distinguishing passages from 13 different books ($N = 128, L = 16$). See Tab. \ref{tab:authors} for the list of authors and novels. Cyan squares mark novels by the same author. Notably, confusion increases between books by the same author, even though they deal with different topics. (B) Probe confusion between books by the same author (intra) and between books by different authors (extra).
  • Figure 4: Dimensionality of stylistic features. (A) Probe accuracy (%) for classifying GE vs. VW ensembles projected onto PCA subspaces spanned by $\{\vec{u}_k, \dots, \vec{u}_{k+n-1}\}$, where $\vec{u}_k$ is the $k$-th principal component and $n$ is the subspace dimension. (B) Intrinsic dimension (ID) of embedding ensembles as a function of context length $N$. ID is calculated using the TwoNN method described in \cite{valeriani2023geometryhiddenrepresentationslarge}.
  • Figure 5: Map of style: low-dimensional visualization of the high-dimensional geometry across books and authors. Text chunks ($N=128, L=16$) are embedded with UMAP from their 32-dimensional activations, extracted at the penultimate layer of the MLP classifier of Fig. \ref{fig:classifier}. We note the substantial overlap between excerpts from the same author, e.g., Austen (JA1, JA2, JA3) or Woolf (VW1, VW2). More comments in Appendix \ref{app:analysis}.
  • ...and 2 more figures
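
The two quantities named in the Figure 4 caption can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: `pca_window_projection` projects data onto the window of principal components $\{\vec{u}_k, \dots, \vec{u}_{k+n-1}\}$, and `twonn_intrinsic_dimension` uses the maximum-likelihood form of the TwoNN estimator, $d = N / \sum_i \log(r_{2,i}/r_{1,i})$, which is an assumption here; the cited method may instead fit the empirical distribution of the ratios. Both function names are hypothetical.

```python
import numpy as np


def pca_window_projection(X, k, n):
    """Project centered rows of X onto principal components u_k..u_{k+n-1} (k is 1-indexed)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[k - 1 : k - 1 + n]
    return Xc @ basis.T


def twonn_intrinsic_dimension(X):
    """Maximum-likelihood TwoNN estimate from first/second nearest-neighbor distances."""
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances via the Gram matrix; clip rounding noise below zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbors
    nearest = np.partition(d2, 1, axis=1)[:, :2]  # two smallest per row, in order
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])
    keep = r1 > 0  # drop exact duplicates, for which the ratio is undefined
    mu = r2[keep] / r1[keep]
    return float(keep.sum() / np.sum(np.log(mu)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: a 2-dimensional sheet linearly embedded in 10 ambient dimensions.
    sheet = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 10))
    print("estimated intrinsic dimension:", twonn_intrinsic_dimension(sheet))
    print("projection shape:", pca_window_projection(sheet, k=1, n=2).shape)
```

To mimic panel (A), one would train a probe on `pca_window_projection(X, k, n)` for a range of `k` and `n`; to mimic panel (B), one would call `twonn_intrinsic_dimension` on the embedding ensemble at each context length $N$.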