GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models

Michal Chudoba, Rudolf Rosa

TL;DR

The paper tackles the challenge of generating Czech poetry with controlled rhyme and meter by fine-tuning a Czech GPT-2 model on a large Czech verse corpus. It demonstrates that treating text as syllables or characters, rather than relying on subword units, improves adherence to formal constraints, and introduces a Forced Generation mechanism to enforce meter and rhyme at inference. Three input formats are explored to inject strophe and verse parameters, and four tokenization strategies are evaluated, with UNICODE/Forced Generation often delivering the best formal-quality results. Validators trained on Czech language models quantify rhyme, meter, and year accuracy, highlighting strong performance on rhyme and meter but limited reliability on year prediction, while showing the practical viability of form-focused poetry generation in morphologically rich languages.

Abstract

High-quality automated poetry generation systems are currently only available for a small subset of languages. We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model. We also find that appropriate tokenization is crucial, showing that tokenization methods based on syllables or individual characters instead of subwords prove superior in generating poetic strophes. We further enhance the results by introducing Forced generation, adding explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups, showing that our proposed approach achieves high accuracies in rhyming and metric aspects of formal quality of the generated poems.

Paper Structure (42 sections, 10 figures, 9 tables)

Figures (10)

  • Figure 1: An ABAB strophe with meter and rhythm annotation: 'x' = unstressed syllable, 'X' = stressed syllable. (English translation: Your ship is on the high seas, with a furrow in it like silver, she plunges her prow into the blue waves, and its side foaming into the rapids.)
  • Figure 2: Rhyme and meter presence
  • Figure 3: Year regions presence
  • Figure 4: Example of a strophe using the BASIC model input format.
  • Figure 5: Example of a strophe using the VERSE_PAR model input format with verse parameters.
  • ...and 5 more figures