ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline
Morris Alper, Moran Yanuka, Raja Giryes, Gašper Beguš
TL;DR
This work introduces ConlangCrafter, an LLM-driven, multi-hop pipeline for end-to-end constructed language creation that decomposes design into phonology, grammar, and lexicon within a memory-based language sketch. By integrating randomness injection and a self-refinement loop, the method encourages typological diversity while maintaining internal consistency, and it supports constructive translation to expand the language sketch as needed. The authors evaluate the approach with a scalable automatic judge framework and manual expert validation, showing increased typological diversity and improved consistency relative to a baseline, though fully consistent translations remain challenging. They also discuss practical applications in world-building and linguistic experimentation, potential extensions to low-resource contexts, and ethical considerations and limitations. The work offers a scalable computational creativity tool for conlanging with measurable diversity and alignment with linguistic typology frameworks.
Abstract
Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages: phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method draws on LLMs' metalinguistic reasoning capabilities, injecting randomness to encourage diversity and using self-refinement feedback to encourage consistency in the emerging language description. We evaluate ConlangCrafter on metrics measuring consistency and typological diversity, demonstrating its ability to produce coherent and varied conlangs without human linguistic expertise.
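The staged structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names come from the abstract, but the prompts, the `FEATURE_POOL` of typological hints, the `build_sketch` function, the "reply OK" critique convention, and the `stub_llm` stand-in are all hypothetical, chosen only so the control flow (random hint per stage, then a bounded critique-and-revise loop, with earlier stages fed forward as context) runs offline. The translation stage is omitted for brevity.

```python
import random
from typing import Callable, Dict

# Stage names follow the abstract; everything else below is an assumption.
STAGES = ["phonology", "morphology", "syntax", "lexicon"]

# Hypothetical pool of typological hints used for randomness injection.
FEATURE_POOL = {
    "phonology": ["small vowel inventory", "tone contrasts", "ejective consonants"],
    "morphology": ["agglutinative", "isolating", "fusional"],
    "syntax": ["SOV word order", "VSO word order", "free word order"],
    "lexicon": ["heavy compounding", "many ideophones", "short roots"],
}

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def build_sketch(llm: LLM, rng: random.Random, max_refine: int = 2) -> Dict[str, str]:
    """Build a language sketch stage by stage (one 'hop' per stage)."""
    sketch: Dict[str, str] = {}
    for stage in STAGES:
        # Randomness injection: sample a hint to steer this stage.
        hint = rng.choice(FEATURE_POOL[stage])
        # Earlier stages are passed along so later ones can stay consistent.
        context = "\n".join(f"{name}: {text}" for name, text in sketch.items())
        draft = llm(f"Design the {stage}. Constraint: {hint}.\nSketch so far:\n{context}")
        # Self-refinement: critique the draft and revise, a bounded number of times.
        for _ in range(max_refine):
            critique = llm(f"List inconsistencies in this {stage} draft, or reply OK.\n{draft}")
            if critique.strip() == "OK":
                break
            draft = llm(f"Revise the {stage} draft to fix: {critique}\n{draft}")
        sketch[stage] = draft
    return sketch


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model so the sketch runs without an API."""
    if prompt.startswith("List inconsistencies"):
        return "OK"
    return f"[draft for: {prompt.splitlines()[0]}]"


if __name__ == "__main__":
    for stage, text in build_sketch(stub_llm, random.Random(0)).items():
        print(stage, "->", text)
```

Passing a seeded `random.Random` makes a run reproducible, while different seeds yield different hint combinations; replacing `stub_llm` with a call to an actual model is left to the reader.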
