
Unsigned Play by Milan Kundera? An Authorship Attribution Study

Lenka Jungmannová, Petr Plecháč

TL;DR

The paper addresses the contested authorship of Juro Jánošík, a play first staged under the name of Karel Steigerwald, using stylometric authorship attribution. The authors assemble a nine-document corpus covering Kundera's plays, Steigerwald's works, and a radio piece, correct OCR errors, lemmatize the texts, and split them into 2,000-word samples. Features based on words, lemmata, and characters are analyzed with hierarchical clustering based on cosine delta and with SVM classification, across 16 configurations and 1,000 cross-validation rounds. Jánošík is attributed to Kundera in every run, and robustness checks give no indication that the result is an artifact of genre or training-data differences. The study thus strengthens the case for Kundera's authorship and points to the possibility of further undisclosed works published under pseudonyms. Data and code are publicly available for replication.
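The sampling and distance steps mentioned in the TL;DR can be illustrated with a minimal sketch. It is not the authors' code: the function names, the parameter `n`, the use of the population standard deviation, and the choice to drop a short trailing sample are all assumptions made for illustration. Cosine delta is taken here in its usual sense, the cosine distance between z-scored relative frequencies of the most frequent types.

```python
import math
from collections import Counter


def chunk(tokens, size=2000):
    """Split tokens into consecutive, non-overlapping samples of `size` tokens.
    Assumption: a trailing remainder shorter than `size` is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def most_frequent_types(samples, n):
    """Return the n most frequent types over all samples (ties broken alphabetically)."""
    totals = Counter()
    for sample in samples:
        totals.update(sample)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


def zscore_profiles(samples, vocab):
    """Relative frequency of each vocab item per sample, z-scored per feature."""
    rel = []
    for sample in samples:
        counts = Counter(sample)
        rel.append([counts[t] / len(sample) for t in vocab])
    cols = list(zip(*rel))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant feature gets std 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / sd for x, m, sd in zip(row, means, stds)] for row in rel]


def cosine_delta(u, v):
    """Cosine distance (1 - cosine similarity) between two z-scored profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat an all-zero profile as maximally uninformative
    return 1.0 - dot / (norm_u * norm_v)
```

In use, one would tokenize (or lemmatize) each text, call `chunk` on it, pick a vocabulary with `most_frequent_types`, and compare any two samples with `cosine_delta` on their rows from `zscore_profiles`.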

Abstract

In addition to being a widely recognised novelist, Milan Kundera has also authored three pieces for theatre: The Owners of the Keys (Majitelé klíčů, 1961), The Blunder (Ptákovina, 1967), and Jacques and his Master (Jakub a jeho pán, 1971). In recent years, however, the hypothesis has been raised that Kundera is the true author of a fourth play: Juro Jánošík, first performed in a 1974 production under the name of Karel Steigerwald, who was Kundera's student at the time. In this study, we make use of supervised machine learning to settle the question of authorship attribution in the case of Juro Jánošík, with results strongly supporting the hypothesis of Kundera's authorship.

Paper Structure

This paper contains 5 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Dendrograms based on different feature sets and different levels of Most Frequent Types (cosine distance, complete linkage)
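
The caption names cosine distance with complete linkage. As a rough illustration of what that combination does, the sketch below builds the merge sequence that a dendrogram would draw. It is a naive implementation written for readability, not the authors' pipeline. The toy vectors and labels are hypothetical and stand in for feature profiles of text samples.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def complete_linkage(vectors, labels):
    """Agglomerative clustering with complete linkage.

    Returns the merges in order as (left_labels, right_labels, distance),
    where the distance between two clusters is the largest pairwise
    cosine distance between their members."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]),
                       d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical toy profiles: two samples per "author", A and B.
toy_labels = ["A1", "A2", "B1", "B2"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
merges = complete_linkage(toy_vectors, toy_labels)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 4))
```

Samples that merge at a small distance sit on neighbouring branches of the dendrogram, while the height of the final merge reflects how far apart the two most dissimilar members of the top-level clusters are.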