Looking for the Inner Music: Probing LLMs' Understanding of Literary Style

Rebecca M. M. Hicke, David Mimno

TL;DR

This work investigates whether large language models can identify author and genre from very short literary passages, using two corpora (authorship: 81 novels by 27 authors; genre: 30 novels across five genres). It compares two model families (Flan-T5 variants and Llama-3-8b) against strong baselines and then probes the models with ablations and internal analyses (Shapley values, cross-attention, contextual embeddings) to characterize signals of style. The findings show that authorial style is easier to define than genre-level style, with pronoun usage and word order playing significant roles in both tasks, while stop words are not uniformly decisive. Memorization contributes to the performance of some models, notably Llama-3-8b, especially for popular authors, whereas others (e.g., Flan-T5-xl) rely more on representations learned during fine-tuning and on generalizable stylistic cues. Overall, the paper advances a data-driven, interpretable view of literary style signals in LLMs and highlights differences in how models capture author versus genre style, with implications for stylometry and AI-assisted literary analysis.

Abstract

Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rely more on memorization, while others benefit more from training to learn author/genre characteristics. We then use three methods to probe one high-performing LLM for features that define style. These include direct syntactic ablations to input text as well as two methods that look at model internals. We find that authorial style is easier to define than genre-level style and is more impacted by minor syntactic decisions and contextual word usage. However, some traits like pronoun usage and word order prove significant for defining both kinds of literary style.

Paper Structure

This paper contains 53 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: The size of each dataset used in the experiment by number of samples.
  • Figure 2: The overall accuracy of each model for (left) authorship attribution and (right) genre identification. Accuracy is separated into results for samples from novels included in training and samples from novels withheld from training. Results of a single run are reported, and error bars represent the standard error bootstrapped over 1000 iterations. The y-axis is sorted by each model's performance on samples from in-training novels.
  • Figure 3: The prompt used to probe llama-3-8b and flan-t5-xl for memorization of studied texts.
  • Figure 4: The prompt used to probe llama-3-8b and flan-t5-xl for internal representations of genre.
  • Figure 5: Confusion matrices of the responses of the prompted (left) and fine-tuned (right) llama-3-8b models for genre identification. Correct labels are represented by rows and model outputs are columns; labels produced outside of the correct set are ignored. The rows sum up to $\sim$100%.
  • ...and 8 more figures
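
The Figure 2 caption reports error bars as a standard error bootstrapped over 1000 iterations. The following is a minimal sketch of how such an estimate could be computed for a single run, assuming per-sample correctness is available as a list of 0/1 values. The function names, the fixed seed, and the example outcomes are hypothetical illustrations and are not taken from the paper.

```python
import random
import statistics


def accuracy(correct):
    """Fraction of samples classified correctly (correct is a list of 0/1)."""
    return sum(correct) / len(correct)


def bootstrap_standard_error(correct, n_iterations=1000, seed=0):
    """Estimate the standard error of accuracy by resampling with replacement.

    Each iteration draws len(correct) outcomes with replacement and records
    the accuracy of that resample; the standard deviation of those
    accuracies is the bootstrapped standard error.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n = len(correct)
    resampled_accuracies = []
    for _ in range(n_iterations):
        resample = rng.choices(correct, k=n)
        resampled_accuracies.append(sum(resample) / n)
    return statistics.stdev(resampled_accuracies)


# Hypothetical outcomes for 100 test samples: 80 correct, 20 incorrect.
outcomes = [1] * 80 + [0] * 20
acc = accuracy(outcomes)
se = bootstrap_standard_error(outcomes, n_iterations=1000)
print(f"accuracy = {acc:.2f}, bootstrapped standard error = {se:.3f}")
```

Resampling the per-sample outcomes rather than retraining the model matches the caption's note that only a single run is reported; the error bar then reflects sampling variability in the test set, not variability across training runs.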