Table of Contents
Fetching ...

DHoTT: A Temporal Extension of Homotopy Type Theory for Semantic Drift

Iman Poernomo

TL;DR

The paper addresses semantic drift in evolving conversational texts and proposes Dynamic HoTT (DHoTT), a fibrewise Kan-semantic framework augmented with a cross-time calculus to reason about meaning across turns. It combines a measurement pipeline (embeddings on $S^{d-1}$, Čech nerves, and Kan fibrant replacement) with a small transferrable calculus (Carry and Rupture) and an append-only ledger to track provenance across time, all interpreted in a simplicial-presheaf model $[\mathbb{T}^{\mathrm{op}},\mathbf{sSet}]$. Key contributions include fibrewise semantics that preserve HoTT at each time slice, a conservative cross-time extension that records semantic continuity and breakage, and a worked toy example showing how carries, ruptures, and ledgers encode drift and healing in LLM-driven dialogues. The framework provides a principled, geometrically grounded kernel for auditing semantic drift in embedding-based systems, with direct implications for LLM analysis, prompt engineering, and retrieval-augmented generation. Overall, DHoTT bridges dynamic linguistics, distributional semantics, and homotopy-theoretic semantics to turn embedding traces into proofs and provenance-aware diagnostics of meaning evolution.

Abstract

Dynamic HoTT (DHoTT) is a conservative extension of Homotopy Type Theory designed for evolving texts in conversational AI. In a chat system, a large language model (LLM) is queried with a growing prefix: at turn tau the input is C(tau), the concatenation of all previous prompts and replies, and the new answer extends C(tau+1). We study the logical semantics of these time indexed texts and how their meanings drift or break over time, linking Homotopy Type Theory with distributional semantics and topological data analysis on embedding spaces. For each turn we embed all tokens seen so far using a frozen encoder and map them to the unit sphere, build a good cover by spherical caps, and form the Cech nerve. A Kan fibrant replacement yields a Kan complex ET(tau), the Evolving Text at time tau, where identity types are path spaces and dependent types support ordinary HoTT transport. Time is treated functorially so that all HoTT rules interpret fibrewise and standard substitution and conservativity properties hold. On top of this fibrewise HoTT layer we add a small cross time calculus for semantic change. A carry records when a later use has a certified path back into an earlier fibre under an admissibility policy. A rupture is a positive sigma type storing finite, policy checked failed attempts together with an open horn tag. An append only ledger accumulates carries and ruptures so that earlier failures are not erased when later alignments appear. We illustrate the construction on small, replicable examples from real human LLM dialogues using a concrete DeBERTa based embedding pipeline. This provides a principled, geometrically grounded core for analysing and auditing semantic drift in large language models and other embedding based systems.

DHoTT: A Temporal Extension of Homotopy Type Theory for Semantic Drift

TL;DR

The paper addresses semantic drift in evolving conversational texts and proposes Dynamic HoTT (DHoTT), a fibrewise Kan-semantic framework augmented with a cross-time calculus to reason about meaning across turns. It combines a measurement pipeline (embeddings on , Čech nerves, and Kan fibrant replacement) with a small transferrable calculus (Carry and Rupture) and an append-only ledger to track provenance across time, all interpreted in a simplicial-presheaf model . Key contributions include fibrewise semantics that preserve HoTT at each time slice, a conservative cross-time extension that records semantic continuity and breakage, and a worked toy example showing how carries, ruptures, and ledgers encode drift and healing in LLM-driven dialogues. The framework provides a principled, geometrically grounded kernel for auditing semantic drift in embedding-based systems, with direct implications for LLM analysis, prompt engineering, and retrieval-augmented generation. Overall, DHoTT bridges dynamic linguistics, distributional semantics, and homotopy-theoretic semantics to turn embedding traces into proofs and provenance-aware diagnostics of meaning evolution.

Abstract

Dynamic HoTT (DHoTT) is a conservative extension of Homotopy Type Theory designed for evolving texts in conversational AI. In a chat system, a large language model (LLM) is queried with a growing prefix: at turn tau the input is C(tau), the concatenation of all previous prompts and replies, and the new answer extends C(tau+1). We study the logical semantics of these time indexed texts and how their meanings drift or break over time, linking Homotopy Type Theory with distributional semantics and topological data analysis on embedding spaces. For each turn we embed all tokens seen so far using a frozen encoder and map them to the unit sphere, build a good cover by spherical caps, and form the Cech nerve. A Kan fibrant replacement yields a Kan complex ET(tau), the Evolving Text at time tau, where identity types are path spaces and dependent types support ordinary HoTT transport. Time is treated functorially so that all HoTT rules interpret fibrewise and standard substitution and conservativity properties hold. On top of this fibrewise HoTT layer we add a small cross time calculus for semantic change. A carry records when a later use has a certified path back into an earlier fibre under an admissibility policy. A rupture is a positive sigma type storing finite, policy checked failed attempts together with an open horn tag. An append only ledger accumulates carries and ruptures so that earlier failures are not erased when later alignments appear. We illustrate the construction on small, replicable examples from real human LLM dialogues using a concrete DeBERTa based embedding pipeline. This provides a principled, geometrically grounded core for analysing and auditing semantic drift in large language models and other embedding based systems.

Paper Structure

This paper contains 78 sections, 11 theorems, 47 equations, 3 figures, 1 table.

Key Result

Lemma 2.1

Let $U=\{B_j\}_{j\in J}$ be a locally finite family of geodesic caps on the round sphere $S^{d-1}$ with angular metric, where Then every nonempty finite intersection $\bigcap_{r=0}^k B_{j_r}$ is geodesically convex and hence contractible. In particular, $U$ is a good cover of $\bigcup_j B_j$.

Figures (3)

  • Figure 1: Global view of the Evolving Text bases $ET(\tau_i)$ for the four time steps in Table \ref{['tab:bank-cat-flow']}. Points are the PCA-projected embeddings in $P_{\tau_i}$, solid edges are Čech adjacencies (overlapping caps), and dashed edges are short composites introduced by the fibrant replacement $Ex^\infty$. The three tracked tokens "bank", "cat", and "flow" appear as larger, coloured points.
  • Figure 2: Local Čech stars around the tracked tokens in each slice $ET(\tau_i)$. In each panel we restrict to the $1$-hop neighbourhood of the tracked occurrences of "bank", "cat", and "flow" and show the induced $1$-skeleton. The tracked tokens are drawn as large coloured points; smaller grey points are neighbouring tokens that share caps with them and therefore support their local distributional semantics (for example, at $\tau_2$ the neighbourhood of "bank" contains "river" and "flow", whereas by $\tau_4$ it is dominated by tokens such as "examples", "pipeline", and "proofs").
  • Figure 3: Pairwise overlays of the tracked trajectories $a_{\mathsf{bank}}, a_{\mathsf{cat}}, a_{\mathsf{flow}}$ in the global PCA coordinate system. Each panel shows the positions of the three tracked words at two times (circles for the earlier slice, triangles for the later slice), with dotted line segments joining corresponding tokens only when a $\mathsf{Adm}$-admissible path exists in the earlier ET$(\tau)$. Under the strict policy used in this run (edges restricted to the shortest 20% and paths to a single hop), "cat" and "bank" carries across all three transitions and "flow" ruptures only on $\tau_2\to\tau_3$ but heals on the next step.

Theorems & Definitions (26)

  • Remark : Carry and Rupture at a glance
  • Remark : Notation guide
  • Remark : Measurement precedes reasoning
  • Lemma 2.1: Geodesic caps form a good cover on $S^{d-1}$
  • proof
  • Lemma 3.1: Slices and fibres
  • Example 3.2: Concrete policy
  • Lemma 3.3: Finiteness and decidability of attempts
  • Remark : Measurement vs. reasoning
  • Remark : No adjacency; patience is internal
  • ...and 16 more