Gatsby Without the 'E': Crafting Lipograms with LLMs

Rohan Balasubramanian, Nitish Gokulakrishnan, Syeda Jannatus Saba, Steven Skiena

TL;DR

The paper investigates generating fully 'e'-free lipograms of The Great Gatsby using modern LLMs, evaluating a spectrum of constraint-aware generation techniques. It pairs baselines (e.g., E-Removal, Synonym Replacement) with more advanced methods such as constrained beam search, multi-candidate selection, and paraphrase-finetuned models, all aimed at preserving semantic fidelity under strict alphabetic bans. Across the evaluations, every method achieved 100% constraint adherence. Semantic fidelity degrades roughly linearly with constraint strength and drops steeply beyond modest exclusion levels; letter exclusion of up to 3.6% preserves meaning reasonably well. The findings highlight both the feasibility and the challenges of long-form constrained generation. They offer insights into model choice (e.g., Llama3) and into how fine-tuning, decoding strategies, and post-processing affect readability and grammar, and they provide a benchmark for constrained text generation in NLP.

Abstract

Lipograms are a unique form of constrained writing in which all occurrences of a particular letter are excluded from the text, typified by the novel Gadsby, which daringly avoids all usage of the letter 'e'. In this study, we explore the power of modern large language models (LLMs) by transforming F. Scott Fitzgerald's novel The Great Gatsby into a fully 'e'-less text. We experimented with a range of techniques, from baseline methods like synonym replacement to sophisticated generative models enhanced with beam search and named entity analysis. We show that excluding up to 3.6% of the most common letters (up to the letter 'u') had minimal impact on the text's meaning, although translation fidelity rapidly and predictably decays with stronger lipogram constraints. Our work highlights the surprising flexibility of English under strict constraints, revealing just how adaptable and creative language can be.
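The lipogram constraint itself is simple to state in code. The following is a minimal Python sketch, not the authors' implementation: the function names `is_lipogram` and `banned_letter_share` are hypothetical, and measuring constraint strength as the share of alphabetic characters that belong to the banned set is an assumption about how a "percentage of characters avoided" could be computed.

```python
def is_lipogram(text: str, banned: str) -> bool:
    """Return True if no banned letter occurs in text (case-insensitive)."""
    banned_set = {ch.lower() for ch in banned}
    return not any(ch.lower() in banned_set for ch in text)


def banned_letter_share(text: str, banned: str) -> float:
    """Percentage of alphabetic characters in text that are banned letters.

    Assumption: constraint strength is measured against the source text,
    counting letters only and ignoring case.
    """
    banned_set = {ch.lower() for ch in banned}
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if ch in banned_set)
    return 100.0 * hits / len(letters)
```

Under these assumptions, a checker like `is_lipogram` can verify adherence for any output, while `banned_letter_share` applied to the source text gives one way to quantify how restrictive a chosen set of banned letters is.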

Paper Structure

This paper contains 19 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Cosine similarity distributions for lipogram translations. Better models have right-shifted distributions.
  • Figure 2: Relationship between mean cosine similarity and percentage of characters avoided (log scale) in generated texts.
  • Figure 3: Grammar mistake distribution for lipogram models. Better models have left-shifted distributions.
  • Figure 4: Relationship between mean cosine similarity and percentage of characters avoided (linear scale) in generated texts under different constraints, illustrating a consistent linear relationship between quality and strength of constraint.
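The figures report fidelity as mean cosine similarity. As a rough illustration of that metric only, here is a small Python sketch. It assumes each original passage and its lipogram translation have already been mapped to equal-length embedding vectors; the embedding model is not specified here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mean_cosine_similarity(pairs) -> float:
    """Average similarity over (original, translation) embedding pairs."""
    sims = [cosine_similarity(u, v) for u, v in pairs]
    if not sims:
        raise ValueError("need at least one pair")
    return sum(sims) / len(sims)
```

A value near 1 means the translation's embedding points in almost the same direction as the original's, which is why right-shifted distributions indicate better models.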