PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs

Zhan Qu, Shuzhou Yuan, Michael Färber

TL;DR

This work studies the generation of Songci under Cipai templates, which impose strict structural, tonal, and rhyme rules. It introduces PoeTone, a pipeline comprising a Cipai constraint resource, a thematic canonical corpus, and a multi-faceted evaluation protocol that combines formal conformity scoring, automated quality assessment, human judgment, and probing tasks. The authors benchmark 18 LLMs under five prompting strategies and propose a Generate-Critic architecture in which automated rule-based feedback guides fine-tuning, improving formal conformity by up to 5.88% for open-source models. The study characterizes the strengths and limitations of current LLMs on culturally significant, formally constrained text and offers a scalable approach for aligning models with symbolic, rule-based objectives in structured domains.

Abstract

This paper presents a systematic investigation into the constrained generation capabilities of large language models (LLMs) in producing Songci, a classical Chinese poetry form characterized by strict structural, tonal, and rhyme constraints defined by Cipai templates. We first develop a comprehensive, multi-faceted evaluation framework that includes: (i) a formal conformity score, (ii) automated quality assessment using LLMs, (iii) human evaluation, and (iv) classification-based probing tasks. Using this framework, we evaluate the generative performance of 18 LLMs, including 3 proprietary models and 15 open-source models across 4 families, under five prompting strategies: zero-shot, one-shot, completion-based, instruction-based, and chain-of-thought. Finally, we propose a Generate-Critic architecture in which the evaluation framework functions as an automated critic. Leveraging the critic's feedback as a scoring function for best-of-N selection, we fine-tune 3 lightweight open-source LLMs via supervised fine-tuning (SFT), resulting in improvements of up to 5.88% in formal conformity. Our findings offer new insights into the generative strengths and limitations of LLMs in producing culturally significant and formally constrained literary texts.
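To make the best-of-N step concrete, the following is a minimal sketch of a rule-based critic used as a scoring function for candidate selection. It is not the paper's implementation: `ToyTemplate`, `toy_conformity`, `rhyme_of`, and `best_of_n` are hypothetical names, the score only checks line count, line lengths, and a shared end rhyme (tonal patterns are omitted), and Latin letters stand in for Chinese characters.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToyTemplate:
    """Hypothetical, heavily simplified stand-in for a Cipai template."""
    line_lengths: tuple[int, ...]  # required number of characters per line
    rhyme_lines: tuple[int, ...]   # indices of lines that must share a rhyme


def toy_conformity(poem: Sequence[str], template: ToyTemplate,
                   rhyme_of: Callable[[str], str]) -> float:
    """Return the fraction of satisfied checks in [0, 1].

    Illustrative only; the paper's formal conformity score is not
    reproduced here.
    """
    # A wrong number of lines fails outright in this toy version.
    if len(poem) != len(template.line_lengths):
        return 0.0
    # One check per line: does it have the required length?
    checks = [len(line) == n for line, n in zip(poem, template.line_lengths)]
    # Rhyme class of the last character of each (non-empty) rhyming line.
    rhymes = [rhyme_of(poem[i][-1]) for i in template.rhyme_lines if poem[i]]
    # Each later rhyming line is compared against the first one.
    checks += [r == rhymes[0] for r in rhymes[1:]]
    return sum(checks) / len(checks) if checks else 0.0


def best_of_n(candidates: Sequence[Sequence[str]],
              score: Callable[[Sequence[str]], float]) -> Sequence[str]:
    """Pick the candidate with the highest critic score."""
    return max(candidates, key=score)


def rhyme_of(ch: str) -> str:
    # Assumed toy rhyme table: 'a' and 'b' rhyme, 'c' does not.
    return {"a": "A", "b": "A", "c": "B"}.get(ch, "?")


template = ToyTemplate(line_lengths=(3, 3), rhyme_lines=(0, 1))
candidates = [
    ["xxa", "xc"],   # wrong length and broken rhyme
    ["xxa", "xxc"],  # correct lengths, broken rhyme
    ["xxa", "xxb"],  # satisfies every check
]
best = best_of_n(candidates, lambda p: toy_conformity(p, template, rhyme_of))
```

In a setup like the one the abstract describes, the selected candidates would then serve as training examples for supervised fine-tuning; how the authors construct that data is not specified in this summary.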

Paper Structure

This paper contains 24 sections, 7 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: The multi-layered challenge of generating high-quality Songci, highlighting the formal constraints of the Cipai as the primary bottleneck.
  • Figure 2: An overview of our research pipeline, from framework development and benchmarking to model enhancement.
  • Figure 3: Expected formal improvements in generated Songci before and after fine-tuning.