Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

Triet Minh Huynh, Quan Le Bao

TL;DR

This work tackles Vietnamese poetry generation with large language models using a prompt-driven approach that gives control over genre and sentiment. It compares GPT-3 variants and BLOOM, applying 8-bit quantization and LoRA to fit computational constraints, and introduces two generation pipelines: text-to-poem and poem-to-poem, the latter enabling cross-language poem translation. A custom scoring system based on length, tone, and rhyme filters high-quality poems from a large Vietnamese corpus, with luc bat as the focal genre. The strongest result, Babbage fine-tuned on the full luc bat dataset, achieves a score of $0.805$ for luc bat and demonstrates the viability of cross-language poem-to-poem translation, suggesting that controllable Vietnamese poetry generation with LLMs can scale.

Abstract

Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, which makes the process intuitive and gives finer control over content. Our best-performing model, the GPT-3 Babbage variant, achieves a score of 0.8 on a custom evaluation metric tailored to the "luc bat" genre of Vietnamese poetry. We also explore paraphrasing poems into normal text prompts and obtain a relatively high score of 0.781 in the "luc bat" genre. This experiment shows the potential for cross-language poem-to-poem translation, with translated poems as the inputs, while maintaining complete control over the generated content.
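
To make the idea of a genre-specific score concrete, here is a minimal Python sketch of a length check for luc bat, whose lines alternate between six and eight syllables. It is not the authors' scoring formula: the function name `luc_bat_length_score` and the "fraction of matching lines" definition are assumptions made for illustration, and the tone and rhyme components are left out entirely. It also assumes each whitespace-separated token is one syllable, which holds for standard Vietnamese spelling but miscounts stray punctuation tokens.

```python
def luc_bat_length_score(poem: str) -> float:
    """Hypothetical length score: share of lines matching the 6/8 pattern.

    Assumption: one whitespace-separated token equals one syllable.
    """
    # Split each non-empty line into syllable tokens.
    lines = [line.split() for line in poem.splitlines() if line.strip()]
    if not lines:
        return 0.0

    hits = 0
    for index, syllables in enumerate(lines):
        # Lines 1, 3, 5, ... expect 6 syllables; lines 2, 4, 6, ... expect 8.
        expected = 6 if index % 2 == 0 else 8
        if len(syllables) == expected:
            hits += 1
    return hits / len(lines)


if __name__ == "__main__":
    couplet = (
        "Trăm năm trong cõi người ta,\n"
        "Chữ tài chữ mệnh khéo là ghét nhau."
    )
    print(luc_bat_length_score(couplet))
```

A score of this kind could serve either as a filter over a corpus or as one term of a combined metric; how the paper weights length against tone and rhyme is not stated in this summary.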

Paper Structure (14 sections, 9 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: The text-to-poem pipeline
  • Figure 2: The poem-to-poem pipeline
  • Figure 3: Result comparison graph