Computing the Formal and Institutional Boundaries of Contemporary Genre and Literary Fiction
Natasha Johnson
TL;DR
This study asks whether contemporary genre categories are defined primarily by formal features or by social institutions, and how author gender modulates their boundaries. Using Piper's CONLIT corpus, it combines Welch's ANOVA, logistic regression with gender interactions, and distance-based analyses of stylistic and semantic representations (raw unigrams and embeddings) to compare genre fiction (science fiction, mystery, romance) with literary fiction. The findings reveal statistically significant formal markers for each category, while also showing that female authorship narrows and blurs the target for achieving literary status and alters how features map to classification. Together, the work advances the empirical understanding of genre boundaries, with implications for classification, publishing, and gender dynamics in twenty-first-century Anglophone fiction.
Abstract
Though the concept of genre has been a subject of discussion for millennia, the relatively recent emergence of genre fiction has added a new layer to this ongoing conversation. While more traditional perspectives on genre have emphasized form, contemporary scholarship has invoked both formal and institutional characteristics in its taxonomy of genre, genre fiction, and literary fiction. This project uses computational methods to test how well genre holds up as a formal designation rather than an institutional one. Drawing on Andrew Piper's CONLIT dataset of Contemporary Literature, we assemble a corpus of literary and genre fiction, with the latter category containing romance, mystery, and science fiction novels. We use Welch's ANOVA to compare the distribution of narrative features by author gender within each genre, and between genre fiction and literary fiction. Then, we use logistic regression to model the effect that each feature has on literary classification and to measure how author gender moderates these effects. Finally, we analyze stylistic and semantic vector representations of our genre categories to understand the importance of form and content in literary classification. This project finds statistically significant formal markers of each literary category and illustrates how female authorship narrows and blurs the target for achieving literary status.
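As a rough illustration of the first step named above, Welch's ANOVA compares group means without assuming equal variances across groups. The sketch below is a minimal standard-library implementation of the textbook statistic, not the project's own code. The function name `welch_anova`, the dictionary `feature_by_category`, and the toy feature values are all hypothetical and are not drawn from CONLIT. It returns only the F statistic and its degrees of freedom; obtaining a p-value would additionally require an F-distribution CDF from a statistics library.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    `groups` is a list of numeric samples, one list per group.
    Each group needs at least two values and a nonzero variance.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")

    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variance (n - 1)

    # Precision weights: larger, less variable groups count for more.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    # Weighted between-group variability.
    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)

    # Correction term that also determines the denominator df.
    lam = sum(
        (1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical toy values for a single narrative feature (NOT real data).
feature_by_category = {
    "literary": [0.42, 0.51, 0.47, 0.55, 0.49],
    "romance": [0.61, 0.58, 0.66, 0.70],
    "mystery": [0.50, 0.62, 0.45, 0.57, 0.53, 0.60],
}

f_stat, df1, df2 = welch_anova(list(feature_by_category.values()))
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.2f}")
```

With exactly two groups, this statistic reduces to the square of Welch's t statistic, which offers a convenient sanity check on any implementation.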
