ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline

Morris Alper, Moran Yanuka, Raja Giryes, Gašper Beguš

TL;DR

This work introduces ConlangCrafter, an LLM-driven, multi-hop pipeline for end-to-end constructed language creation that decomposes design into phonology, grammar, and lexicon within a memory-based language sketch. By integrating randomness injection and a self-refinement loop, the method encourages typological diversity while maintaining internal consistency, and it supports constructive translation to expand the language sketch as needed. The authors evaluate the approach with a scalable automatic judge framework and manual expert validation, showing increased typological diversity and improved consistency relative to a baseline, though fully consistent translations remain challenging. They also discuss practical applications in world-building, linguistic experimentation, and potential extensions to low-resource contexts, along with ethical considerations and limitations. The work offers a scalable computational creativity tool for conlanging with measurable diversity and alignment to linguistic typology frameworks.

Abstract

Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages - phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method leverages LLMs' metalinguistic reasoning capabilities, injecting randomness to encourage diversity and leveraging self-refinement feedback to encourage consistency in the emerging language description. We evaluate ConlangCrafter on metrics measuring consistency and typological diversity, demonstrating its ability to produce coherent and varied conlangs without human linguistic expertise.
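The staged design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stage list is taken from the abstract, but the function names (`build_sketch`, `echo_llm`), the prompt strings, and the way the random number is placed into the prompt are all assumptions made for illustration. The `llm` argument is a hypothetical callable standing in for a real model call.

```python
import random
from typing import Callable, Dict, List

# Stage names follow the abstract's list of modular stages (translation omitted).
STAGES: List[str] = ["phonology", "morphology", "syntax", "lexicon"]


def build_sketch(
    llm: Callable[[str], str], seed: int, refine_rounds: int = 1
) -> Dict[str, str]:
    """Generate each stage in turn, conditioning on everything produced so far."""
    rng = random.Random(seed)
    sketch: Dict[str, str] = {}
    for stage in STAGES:
        # Randomness injection (assumed form): a number drawn outside the
        # model is written into the prompt to push outputs apart.
        nonce = rng.randint(0, 999)
        context = "\n".join(f"{name}: {text}" for name, text in sketch.items())
        draft = llm(f"[generate:{stage}] random={nonce}\n{context}")
        # Self-refinement (assumed form): critique the draft, then revise it.
        for _ in range(refine_rounds):
            critique = llm(f"[critique:{stage}]\n{context}\n{draft}")
            draft = llm(f"[revise:{stage}]\n{critique}\n{draft}")
        # The accepted draft becomes part of the state seen by later stages.
        sketch[stage] = draft
    return sketch


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call: returns the first line of the prompt."""
    return prompt.splitlines()[0]


sketch = build_sketch(echo_llm, seed=0)
```

Swapping `echo_llm` for a function that calls an actual model is the only change needed to try the loop with real generations. The sketch dictionary plays the role of the stateful language description that later stages condition on.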

Paper Structure

This paper contains 19 sections, 2 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: ConlangCrafter outputs, showing sample sentences and glosses in diverse conlangs generated with our agentic LLM pipeline. Each language is designed to be internally consistent, while being typologically unique in its phonology and morpho-syntax.
  • Figure 2: Our Method. ConlangCrafter constructs languages with a multi-hop, stateful pipeline, by decomposing them into linguistic layers (phonology, grammar, lexicon) and generating each in turn with an LLM. We leverage the LLM's meta-linguistic understanding of language typology while injecting randomness from a random number generator (RNG) to encourage linguistic diversity, and we use self-refinement to enhance internal consistency. ConlangCrafter conditions on this language sketch to translate and gloss new sentences; new lexical items and grammar points that were previously underspecified can be dynamically added back to the language sketch. Prompts above are abridged and some intermediate steps are omitted.
  • Figure 3: t-SNE visualization of typological diversity. Each point represents a generated language, where spatial proximity reflects typological similarity based on pairwise Hamming distances over WALS features. The dispersed distribution of ConlangCrafter-generated languages highlights their high typological diversity relative to the baseline method.
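
The pairwise Hamming distance mentioned in the Figure 3 caption can be illustrated with a small sketch. The three languages and their feature values below are invented for illustration and are not WALS data or results from the paper. The helper names (`hamming`, `pairwise_distances`) are likewise assumptions. Each language is represented as a list of categorical feature values in a fixed order, and the distance between two languages is the number of features on which they differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: List[str], b: List[str]) -> int:
    """Count the positions at which two equal-length feature lists differ."""
    if len(a) != len(b):
        raise ValueError("feature lists must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(
    langs: Dict[str, List[str]],
) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of languages."""
    return {
        (p, q): hamming(langs[p], langs[q])
        for p, q in combinations(langs, 2)
    }


# Invented feature values (word order, adposition type, tone), not real data.
languages = {
    "lang_a": ["SOV", "postpositions", "tonal"],
    "lang_b": ["SOV", "prepositions", "non-tonal"],
    "lang_c": ["VSO", "prepositions", "non-tonal"],
}

distances = pairwise_distances(languages)
mean_distance = sum(distances.values()) / len(distances)
```

A larger mean pairwise distance over a set of generated languages indicates that they differ on more features, which is the sense of "typological diversity" the caption describes. A distance table of this kind is also the sort of input from which a two-dimensional projection such as t-SNE could be computed.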