
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing

Tuhin Chakrabarty, Vishakh Padmakumar, He He

TL;DR

CoPoet investigates instruction-tuning as a vehicle for collaborative poetry writing, training a poetry-focused model on a large corpus of instruction–line pairs and evaluating its ability to follow complex, compositional constraints. The study demonstrates strong automatic and human evaluation performance for large finetuned models (notably T5-11B-poem), and shows that humans can effectively collaborate with the system to produce diverse, coherent poems across topics. In a user study, experts using CoPoet produced poems that were preferred by third-party evaluators compared with solo-written poems, with substantial model contribution in many cases. The work highlights the potential of natural-language instructions as a robust interface for human–AI co-creative writing and provides insights into data design, model scaling, and interface considerations for collaborative poetry tasks.

Abstract

Recent work in training large language models (LLMs) to follow natural language instructions has opened up exciting opportunities for natural language interface design. Building on the prior success of LLMs in the realm of computer-assisted creativity, we aim to study if LLMs can improve the quality of user-generated content through collaboration. We present CoPoet, a collaborative poetry writing system. In contrast to auto-completing a user's text, CoPoet is controlled by user instructions that specify the attributes of the desired text, such as "Write a sentence about 'love'" or "Write a sentence ending in 'fly'". The core component of our system is a language model fine-tuned on a diverse collection of instructions for poetry writing. Our model is not only competitive with publicly available LLMs trained on instructions (InstructGPT), but is also capable of satisfying unseen compositional instructions. A study with 15 qualified crowdworkers shows that users successfully write poems with CoPoet on diverse topics ranging from Monarchy to Climate change. Further, the collaboratively written poems are preferred by third-party evaluators over those written without the system.

Paper Structure

This paper contains 37 sections, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: A collaborative poem entitled 'Decadence', written with CoPoet assistance. Green text was written directly by the human, who interacts with CoPoet using instructions. CoPoet offers multiple suggestions which the user can accept or reject. The user wrote a four line poem before indicating completion of the task.
  • Figure 2: Automatic evaluation of models on the KIKA, KIUA and Compositional test sets. The $y$ axis is the percentage of instructions that each model successfully satisfies, as determined by the criteria in the success-conditions table (tab:success_conditions). We report results on T5-11B-poem, T5-3B-poem and T0-3B-poem along with the baselines: zero-shot T0pp (Sanh et al., 2021) and zero-shot (ZS)/few-shot (FS) InstructGPT (da-vinci) (Ouyang et al., 2022). Each bar shows the average success rate over 5 model inferences along with the standard deviation. On average, T5-11B-poem achieves the highest success rate, and InstructGPT is a strong few-shot baseline that obtains comparable results on KIUA.
  • Figure 3: CoPoet user study. We study if users can effectively collaborate with CoPoet to write poems (RQ1) and whether writing with CoPoet produces better poems compared to solo-writers (RQ2).
  • Figure 4: Proportions of the types of instructions used by experts in the poetry writing task.
  • Figure 5: Content overlap between the sentences of an individual poem and the corresponding model suggestions, calculated using ROUGE-L recall. The Y axis shows the percentage of poems (out of 50), while the X axis shows the amount of CoPoet contribution in terms of ROUGE-L.
  • ...and 6 more figures
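
The Figure 5 caption measures CoPoet's contribution with ROUGE-L recall. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes ROUGE-L recall as the length of the longest common subsequence (LCS) of tokens divided by the number of reference tokens. The function names, the lowercased whitespace tokenization, and the choice of which sentence plays the reference role are all assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """ROUGE-L recall: LCS length divided by the reference length.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref:
        return 0.0
    return lcs_length(ref, cand) / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the function.
    print(rouge_l_recall("the cat sat on the mat", "the cat on the mat"))
```

A score near 1 means almost every reference token appears, in order, in the other sentence; a score near 0 means the two share little content.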