Unsigned Play by Milan Kundera? An Authorship Attribution Study
Lenka Jungmannová, Petr Plecháč
TL;DR
The paper tackles the contested authorship of Juro Jánošík, a play long attributed to Karel Steigerwald, using stylometric authorship attribution. It builds a nine-document corpus spanning Kundera's plays, Steigerwald's works, and a radio piece, corrects OCR errors, lemmatizes the texts, and splits them into 2,000-word samples. Word-, lemma-, and character-based features are analyzed with hierarchical clustering under cosine delta and with a support vector machine (SVM), across 16 configurations and 1,000 cross-validation rounds. Jánošík is consistently attributed to Kundera in every run; robustness checks support the finding and show no evidence of misattribution caused by differences in genre or training data. The study strengthens the claim that Kundera wrote Jánošík, raises the possibility that other works of his appeared under other names, and makes its data and code publicly available for replication.
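As a rough illustration of the cosine delta measure mentioned above, the Python sketch below splits texts into 2,000-token samples, z-scores the relative frequencies of the most frequent words, and assigns each disputed sample to the candidate author whose average profile is nearest by cosine delta (1 minus the cosine similarity of the z-score vectors). This is not the authors' published code. The function names, the `n_features` cut-off, discarding a trailing remainder shorter than one sample, and the nearest-centroid decision rule are all assumptions. In particular, nearest-centroid is a simplified stand-in for the SVM and hierarchical clustering used in the study.

```python
import math
from collections import Counter


def chunk(tokens, size=2000):
    """Split tokens into consecutive, non-overlapping samples of `size` tokens.

    A trailing remainder shorter than `size` is discarded (an assumption;
    the study's exact sampling procedure may differ).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def relative_freqs(sample, vocab):
    """Relative frequency of each vocabulary item in one sample."""
    counts = Counter(sample)
    return [counts[w] / len(sample) for w in vocab]


def fit_zscore(rows):
    """Per-feature mean and population standard deviation over training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return means, sds


def apply_zscore(row, means, sds):
    """Standardise one row; features with zero spread in training become 0."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]


def cosine_delta(a, b):
    """Cosine delta: 1 minus the cosine similarity of two z-score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def attribute(training, disputed_tokens, n_features=100, size=2000):
    """Assign each disputed sample to the author with the nearest centroid.

    `training` maps a (hypothetical) author label to a list of tokens.
    Nearest-centroid assignment is a simplified stand-in for an SVM.
    """
    samples = [(author, s) for author, toks in training.items() for s in chunk(toks, size)]
    # Feature set: the n most frequent tokens across all training samples.
    totals = Counter(tok for _, s in samples for tok in s)
    vocab = [w for w, _ in totals.most_common(n_features)]

    # z-score parameters are fitted on the training samples only.
    rows = [relative_freqs(s, vocab) for _, s in samples]
    means, sds = fit_zscore(rows)
    zrows = [apply_zscore(r, means, sds) for r in rows]

    # Average z-score profile per candidate author.
    centroids = {}
    for author in training:
        vecs = [z for (a, _), z in zip(samples, zrows) if a == author]
        centroids[author] = [sum(col) / len(vecs) for col in zip(*vecs)]

    results = []
    for s in chunk(disputed_tokens, size):
        z = apply_zscore(relative_freqs(s, vocab), means, sds)
        results.append(min(centroids, key=lambda a: cosine_delta(z, centroids[a])))
    return results


if __name__ == "__main__":
    # Toy data with hypothetical authors "A" and "B"; not the study's corpus.
    training = {
        "A": ("the cat sat on the mat and the cat ran " * 400).split(),
        "B": ("a dog lay by a log but a dog slept " * 400).split(),
    }
    disputed = ("the cat sat on the mat and the cat ran " * 200).split()
    print(attribute(training, disputed, n_features=10))  # ['A']
```

On real data the inputs would be the lemmatized or tokenized play texts, and a single decision rule on one feature set would be far weaker evidence than the study's 16 configurations with repeated cross-validation.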
Abstract
In addition to being a widely recognised novelist, Milan Kundera has also authored three pieces for theatre: The Owners of the Keys (Majitelé klíčů, 1961), The Blunder (Ptákovina, 1967), and Jacques and his Master (Jakub a jeho pán, 1971). In recent years, however, the hypothesis has been raised that Kundera is the true author of a fourth play: Juro Jánošík, first performed in a 1974 production under the name of Karel Steigerwald, who was Kundera's student at the time. In this study, we make use of supervised machine learning to settle the question of authorship attribution in the case of Juro Jánošík, with results strongly supporting the hypothesis of Kundera's authorship.
