Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning

Jinlong Liu, Mohammed Bahja, Venelin Kovatchev, Mark Lee

TL;DR

The paper tackles controlling classic-author voice in long-form story generation, proposing a GRPO-based framework guided by an AV-derived style reward and supplemented by content and completeness signals. A robust reward design and a controlled data pipeline enable effective style alignment while mitigating narrative drift, demonstrated through a Twain/Huckleberry Finn case study with an 8B model achieving strong AV-style metrics. Key contributions include a novel reward-model construction, a high-quality style-controlled dataset via masking/refill, and an empirical analysis of model choices for agentic training, highlighting the viability of moderate-size models for stylistic control. The work suggests promising directions for scalable, evaluator-backed stylistic generation, albeit with acknowledged challenges in long-range coherence and broader generalization across authors.

Abstract

Recent advances in large language models (LLMs) show impressive performance in open-ended story generation, but fine-grained stylistic control remains limited. Existing methods often rely on shallow cues (e.g., names or topics) to simulate authorial style, without robust evaluation. In this work, we present a training framework for style-conditioned story generation using Group Relative Policy Optimization (GRPO) and a custom multi-reward setup. The style reward is derived from a fine-tuned sentence transformer using authorship verification (AV) signals, combined with content and completeness scores to stabilize long-form narrative generation. We conduct experiments using fiction by Mark Twain, a prominent 19th-century American author, with The Adventures of Huckleberry Finn serving as the reference style exemplar. Our 8B model outperforms larger baselines such as GPT-4o and Claude Sonnet 4 in AV-style metrics, achieving a style score of 0.628 and competitive content quality. Results demonstrate the feasibility of agentic stylistic generation with moderate model size and task-specific training. While the output is clearly style-aligned, narrative completeness remains a challenge, indicating future work is needed to better model global coherence and story resolution.
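
To make the multi-reward idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the style score is a cosine similarity between an embedding of the generated text and an embedding of reference text, that the three signals are combined by a weighted sum, and that GRPO advantages are computed by normalizing rewards within a group of sampled completions. The function names, the weights (0.5, 0.3, 0.2), the toy embeddings, and the use of population standard deviation are all illustrative assumptions that do not come from the paper.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (stand-in for an AV style score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_reward(style, content, completeness, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three reward signals; the weights are hypothetical."""
    w_style, w_content, w_complete = weights
    return w_style * style + w_content * content + w_complete * completeness


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of four sampled stories for one prompt.
# Embeddings and content/completeness scores are made-up placeholders.
reference_embedding = [0.9, 0.1, 0.4]
samples = [
    {"embedding": [0.8, 0.2, 0.5], "content": 0.7, "completeness": 0.4},
    {"embedding": [0.1, 0.9, 0.2], "content": 0.8, "completeness": 0.6},
    {"embedding": [0.9, 0.1, 0.3], "content": 0.6, "completeness": 0.5},
    {"embedding": [0.4, 0.4, 0.4], "content": 0.5, "completeness": 0.3},
]

rewards = [
    combined_reward(
        cosine_similarity(s["embedding"], reference_embedding),
        s["content"],
        s["completeness"],
    )
    for s in samples
]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f}  advantage={advantage:+.3f}")
```

Because advantages are computed relative to the group, a completion is reinforced only when it scores above its siblings for the same prompt. This is why adding content and completeness terms can steer the policy away from outputs that are stylistically close but narratively weak.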

Paper Structure

This paper contains 12 sections, 1 equation, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Subjects Distributions
  • Figure 2: Similarity‑score distributions for $1\,500$‑word chunks. Grey = cross‑author, blue = same‑author. Smaller outlier markers avoid visual clutter while preserving distribution detail.
  • Figure 3: Reward and KL values during training. Plot (a) shows that lower $\beta$ settings lead to higher reward scores, but this comes at the cost of training stability. Plot (b) illustrates that lower $\beta$ values result in larger KL divergence spikes, indicating unstable updates. The target behaviour is a smooth and consistent KL increase from 0 to 1, which reflects controlled policy adaptation.
  • Figure 4: Similarity score comparison for MPNet and MiniLM-L6 embedding models showing pre-trained vs fine-tuned performance.
  • Figure 5: Similarity score comparison for MiniLM-L12 and StyleDistance embedding models showing pre-trained vs fine-tuned performance.
  • ...and 2 more figures