
Computing the Formal and Institutional Boundaries of Contemporary Genre and Literary Fiction

Natasha Johnson

TL;DR

This study asks whether contemporary genre classifications rest primarily on formal features or on social institutions, and how author gender modulates these boundaries. Using Piper's CONLIT corpus, it combines Welch's ANOVA, logistic regression with gender interactions, and distance-based analyses of stylistic and semantic representations (raw unigrams and embeddings) to compare genre fiction (science fiction, mystery, romance) with literary fiction. The findings reveal robust formal markers distinguishing the categories. They also show that female authorship can narrow or blur the path to literary prestige and alter how textual features predict literary classification. Together, the work advances empirical understanding of genre boundaries, with implications for classification, publishing, and gender dynamics in twenty-first-century Anglophone fiction.
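
Welch's ANOVA, the first method named above, tests whether group means differ without assuming equal variances across groups. This matters when subsets such as literary novels and mystery novels differ in size and spread. The following is a minimal pure-Python sketch of the standard Welch statistic. The function name `welch_anova` and the toy numbers are illustrative assumptions, not CONLIT data or the paper's code.

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Welch's one-way ANOVA for k groups with unequal variances.

    Each group is a sequence of values for one feature (e.g. one value per book).
    Returns (F, df1, df2); compare F against an F(df1, df2) distribution.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Weights are inverse squared standard errors: n_i / s_i^2 (sample variance).
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    W = sum(w)
    m_w = sum(w_i * m_i for w_i, m_i in zip(w, m)) / W  # weighted grand mean
    A = sum(w_i * (m_i - m_w) ** 2 for w_i, m_i in zip(w, m)) / (k - 1)
    tmp = sum((1 - w_i / W) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    B = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    F = A / B
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    return F, df1, df2


# Toy values (hypothetical, not from CONLIT): one feature measured per book.
literary = [4.6, 4.8, 4.7, 4.9, 4.5]
mystery = [4.2, 4.3, 4.1, 4.4, 4.3]
romance = [4.0, 4.1, 4.2, 3.9, 4.1, 4.0]
F, df1, df2 = welch_anova(literary, mystery, romance)
print(f"F({df1}, {df2:.1f}) = {F:.2f}")
```

If SciPy is installed, `scipy.stats.f.sf(F, df1, df2)` converts the statistic into a p-value. With only two groups, Welch's F equals the square of Welch's t statistic, which makes a convenient sanity check.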

Abstract

Though the concept of genre has been discussed for millennia, the relatively recent emergence of genre fiction has added a new layer to this conversation. While traditional perspectives on genre have emphasized form, contemporary scholarship invokes both formal and institutional characteristics in its taxonomy of genre, genre fiction, and literary fiction. This project uses computational methods to test how well genre holds up as a formal designation rather than an institutional one. Drawing on Andrew Piper's CONLIT dataset of contemporary literature, we assemble a corpus of literary and genre fiction, with the latter containing romance, mystery, and science fiction novels. We use Welch's ANOVA to compare the distributions of narrative features by author gender within each genre, and between genre and literary fiction. We then use logistic regression to model the effect of each feature on literary classification and to measure how author gender moderates these effects. Finally, we analyze stylistic and semantic vector representations of our genre categories to assess the relative importance of form and content in literary classification. We find statistically significant formal markers of each literary category and show that female authorship narrows and blurs the target for achieving literary status.
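
The abstract describes a logistic regression in which author gender moderates each feature's effect. One standard way to encode this is assumed here rather than taken from the paper: an interaction term, $\operatorname{logit} P(\text{literary}) = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 x g$, where $g = 1$ for female authors. Under this encoding the feature's log-odds slope is $\beta_1$ for male authors and $\beta_1 + \beta_3$ for female authors. The sketch below illustrates how the interaction reshapes the probability curve, the kind of relationship Figure 1 plots. The function names and coefficient values are hypothetical placeholders, not the paper's estimates.

```python
import math


def predicted_prob(x, female, b0, b_x, b_g, b_xg):
    """P(literary) under a logistic model with a gender interaction term.

    logit(p) = b0 + b_x*x + b_g*g + b_xg*(x*g), with g = 1 for female authors.
    """
    g = 1.0 if female else 0.0
    z = b0 + b_x * x + b_g * g + b_xg * x * g
    return 1.0 / (1.0 + math.exp(-z))


def gender_slopes(b_x, b_xg):
    """Log-odds slope of the feature for male (g=0) and female (g=1) authors."""
    return b_x, b_x + b_xg


# Placeholder coefficients for illustration only (not fitted to any data).
coefs = dict(b0=-20.0, b_x=4.5, b_g=6.0, b_xg=-1.5)
for wl in (4.0, 4.4, 4.8):  # hypothetical average word lengths
    p_m = predicted_prob(wl, False, **coefs)
    p_f = predicted_prob(wl, True, **coefs)
    print(f"avg word length {wl}: P(literary) male={p_m:.2f}, female={p_f:.2f}")
print("slopes (male, female):", gender_slopes(coefs["b_x"], coefs["b_xg"]))
```

A negative interaction coefficient in this toy setup flattens the female curve, so the same increase in the feature buys a smaller gain in predicted literary status. In practice the coefficients would come from a fitted model, such as a logit with a `feature * gender` term.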

Paper Structure

This paper contains 32 sections, 8 equations, 11 figures, and 20 tables.

Figures (11)

  • Figure 1: Predicted probability of literary classification as a function of feature usage, for features with a significant effect ($p<0.05$) on the model. The only significant interaction effect ($p<0.05$) is the one between author gender and average word length.
  • Figure 2: PCA projections of books based on raw-unigram (left) and static-embedding vector representations (right). Each point corresponds to a book. Color indicates genre category.
  • Figure 3: Character histograms for genre fiction by author gender
  • Figure 4: Narrative structure histograms for genre fiction by author gender
  • Figure 5: Language histograms for genre fiction by author gender
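
The TL;DR mentions distance-based analyses of raw-unigram representations, and Figure 2 projects such vectors with PCA. As a hedged illustration, the sketch below measures how far apart genre categories sit in raw-unigram space. It computes the cosine distance between genre centroids built from per-book relative word frequencies. Cosine distance, relative-frequency normalization, and centroid comparison are all assumptions, since this section does not specify the paper's distance measure or preprocessing. The token lists are toy stand-ins, not CONLIT books.

```python
import math
from collections import Counter


def relative_freqs(tokens):
    """Raw-unigram profile of one book: token -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def centroid(profiles):
    """Mean unigram profile over books (words absent from a book count as 0)."""
    acc = Counter()
    for p in profiles:
        acc.update(p)  # adds float frequencies word by word
    return {w: v / len(profiles) for w, v in acc.items()}


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (nu * nv)


# Toy token lists standing in for books (hypothetical, not CONLIT data).
books = {
    "literary": [["the", "light", "of", "memory"], ["a", "memory", "of", "the", "sea"]],
    "romance": [["her", "heart", "raced"], ["his", "heart", "and", "her", "smile"]],
    "mystery": [["the", "detective", "found", "the", "body"], ["a", "body", "in", "the", "lake"]],
}
cents = {g: centroid([relative_freqs(b) for b in bs]) for g, bs in books.items()}
for g1 in cents:
    for g2 in cents:
        if g1 < g2:  # print each unordered pair once
            print(g1, g2, round(cosine_distance(cents[g1], cents[g2]), 3))
```

The same `cosine_distance` function works unchanged on dense embedding vectors stored as dicts. Real pipelines would typically restrict the vocabulary, for example to the most frequent words, before comparing.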