DHoTT: A Temporal Extension of Homotopy Type Theory for Semantic Drift
Iman Poernomo
TL;DR
The paper addresses semantic drift in evolving conversational texts and proposes Dynamic HoTT (DHoTT), a fibrewise Kan-semantic framework augmented with a cross-time calculus to reason about meaning across turns. It combines a measurement pipeline (embeddings on $S^{d-1}$, Čech nerves, and Kan fibrant replacement) with a small transferrable calculus (Carry and Rupture) and an append-only ledger to track provenance across time, all interpreted in a simplicial-presheaf model $[\mathbb{T}^{\mathrm{op}},\mathbf{sSet}]$. Key contributions include fibrewise semantics that preserve HoTT at each time slice, a conservative cross-time extension that records semantic continuity and breakage, and a worked toy example showing how carries, ruptures, and ledgers encode drift and healing in LLM-driven dialogues. The framework provides a principled, geometrically grounded kernel for auditing semantic drift in embedding-based systems, with direct implications for LLM analysis, prompt engineering, and retrieval-augmented generation. Overall, DHoTT bridges dynamic linguistics, distributional semantics, and homotopy-theoretic semantics to turn embedding traces into proofs and provenance-aware diagnostics of meaning evolution.
Abstract
Dynamic HoTT (DHoTT) is a conservative extension of Homotopy Type Theory designed for evolving texts in conversational AI. In a chat system, a large language model (LLM) is queried with a growing prefix: at turn tau the input is C(tau), the concatenation of all previous prompts and replies, and the new answer extends C(tau+1). We study the logical semantics of these time indexed texts and how their meanings drift or break over time, linking Homotopy Type Theory with distributional semantics and topological data analysis on embedding spaces. For each turn we embed all tokens seen so far using a frozen encoder and map them to the unit sphere, build a good cover by spherical caps, and form the Cech nerve. A Kan fibrant replacement yields a Kan complex ET(tau), the Evolving Text at time tau, where identity types are path spaces and dependent types support ordinary HoTT transport. Time is treated functorially so that all HoTT rules interpret fibrewise and standard substitution and conservativity properties hold. On top of this fibrewise HoTT layer we add a small cross time calculus for semantic change. A carry records when a later use has a certified path back into an earlier fibre under an admissibility policy. A rupture is a positive sigma type storing finite, policy checked failed attempts together with an open horn tag. An append only ledger accumulates carries and ruptures so that earlier failures are not erased when later alignments appear. We illustrate the construction on small, replicable examples from real human LLM dialogues using a concrete DeBERTa based embedding pipeline. This provides a principled, geometrically grounded core for analysing and auditing semantic drift in large language models and other embedding based systems.
