A Recursive Encoding for Cuneiform Signs
Daniel M. Stelzer
TL;DR
This work tackles the tedious task of looking up cuneiform signs by introducing Kadaru, a recursive, tree-based encoding in which each sign is a tree whose leaves are drawn from five basic strokes and whose internal nodes combine their children using one of three composition types. Alongside the encoding, it contributes rendering and normalization algorithms, an "encompassing search" algorithm that lets users locate a sign from partial or damaged components, and a public prototype interface and codebase that demonstrate the approach. The approach aims to make lookup faster and more resilient to damage, and it offers a compact, expressive intermediate representation that generalizes across eras and dialects and opens a path toward machine-learning integration.
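To make the shape of the encoding concrete, here is a minimal Python sketch of a tree with stroke leaves and composition nodes, plus one plausible normalization step. The names Stroke, Comp, BESIDE, ABOVE and OVER, and the particular five strokes listed, are hypothetical placeholders rather than Kadaru's actual inventory. The normalize function also assumes that each composition type is associative, which is an illustration of what normalization could mean, not the paper's algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Union


class Stroke(Enum):
    # Hypothetical labels for the five basic strokes; the paper's own
    # inventory and names may differ.
    HORIZONTAL = "h"
    VERTICAL = "v"
    WINKELHAKEN = "w"
    DIAG_DOWN = "d"
    DIAG_UP = "u"


class Comp(Enum):
    # Hypothetical names for the three composition types.
    BESIDE = "beside"  # left-to-right juxtaposition
    ABOVE = "above"    # top-to-bottom stacking
    OVER = "over"      # superimposition


@dataclass(frozen=True)
class Node:
    """An internal node: one composition type applied to ordered children."""
    comp: Comp
    children: tuple[Sign, ...]


# A sign is either a single stroke (leaf) or a composition (internal node).
Sign = Union[Stroke, Node]


def normalize(sign: Sign) -> Sign:
    """Flatten nested compositions of the same type, assuming each type is
    associative: BESIDE(a, BESIDE(b, c)) becomes BESIDE(a, b, c).
    Child order is preserved, and single-child nodes collapse to the child."""
    if isinstance(sign, Stroke):
        return sign
    flat: list[Sign] = []
    for child in sign.children:
        child = normalize(child)
        # A normalized child of the same type is spliced into this node.
        if isinstance(child, Node) and child.comp is sign.comp:
            flat.extend(child.children)
        else:
            flat.append(child)
    if len(flat) == 1:
        return flat[0]
    return Node(sign.comp, tuple(flat))


if __name__ == "__main__":
    h, v = Stroke.HORIZONTAL, Stroke.VERTICAL
    left = Node(Comp.BESIDE, (Node(Comp.BESIDE, (h, v)), v))
    right = Node(Comp.BESIDE, (h, Node(Comp.BESIDE, (v, v))))
    print(normalize(left) == normalize(right))  # True
```

Both example trees normalize to BESIDE(h, v, v). Because the dataclass is frozen and compares by value, trees that differ only in how same-type compositions are nested become equal after normalization, which keeps any later equality-based search simple.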
Abstract
One of the most significant problems in cuneiform pedagogy is looking up unknown signs, which often involves a tedious page-by-page search through a sign list. This paper proposes a new "recursive encoding" for signs, which represents the arrangement of strokes in a form a computer can process. A series of algorithms built on this encoding lets students look up signs by any distinctive component, and also provides new ways to render signs and tablets electronically.
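Looking up a sign by one of its components can be sketched as subtree matching over such trees. The Python below is an illustrative stand-in, not the paper's encompassing search: the stroke codes, composition names, and the functions subtrees, contains and lookup are hypothetical. Trees are assumed to be normalized already (nested compositions of the same type flattened), so a component may match either a whole subtree or a contiguous run of children inside a node of the same composition type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Union

# Hypothetical stroke codes: "h" horizontal, "v" vertical, "w" Winkelhaken,
# "d"/"u" downward/upward diagonal. Composition codes are also hypothetical:
# "beside", "above", "over".
Stroke = str


@dataclass(frozen=True)
class Node:
    comp: str
    children: tuple[Sign, ...]


Sign = Union[Stroke, Node]


def subtrees(sign: Sign) -> Iterator[Sign]:
    """Yield every subtree of a sign, root first."""
    yield sign
    if isinstance(sign, Node):
        for child in sign.children:
            yield from subtrees(child)


def contains(sign: Sign, part: Sign) -> bool:
    """True if `part` occurs in `sign`, either as a whole subtree or as a
    contiguous run of children inside a node of the same composition type.
    Both trees are assumed to be normalized (same-type nesting flattened)."""
    for sub in subtrees(sign):
        if sub == part:
            return True
        if (isinstance(sub, Node) and isinstance(part, Node)
                and sub.comp == part.comp):
            n = len(part.children)
            # Slide a window of n children across this node.
            for i in range(len(sub.children) - n + 1):
                if sub.children[i:i + n] == part.children:
                    return True
    return False


def lookup(sign_list: dict[str, Sign], part: Sign) -> list[str]:
    """Return the names of all signs that contain the given component."""
    return [name for name, sign in sign_list.items() if contains(sign, part)]


if __name__ == "__main__":
    # A made-up three-entry sign list; names and encodings are illustrative only.
    signs = {
        "A": Node("beside", ("h", "v", "v")),
        "B": Node("above", ("w", Node("beside", ("v", "v")))),
        "C": Node("over", ("h", "v")),
    }
    print(lookup(signs, Node("beside", ("v", "v"))))  # ['A', 'B']
    print(lookup(signs, "w"))  # ['B']
```

The run-matching step is what lets a query for two verticals side by side find sign A, even though after flattening that pair is not a subtree of A on its own. Handling damaged or uncertain strokes would need a looser comparison than the exact equality used here.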
