Table of Contents
Fetching ...

The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability

Anthony T. Nixon

Abstract

Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDPs. A fixed probe family of finite-state controllers induces a closed-loop Wasserstein pseudometric on observation histories and a probe-exact quotient merging histories that no controller in the family can distinguish. The quotient is canonical, minimal, and unique-a bounded-interaction analogue of the Myhill-Nerode theorem. For clock-aware probes, it is exactly decision-sufficient for objectives that depend only on the agent's observations and actions; for latent-state rewards, we use an observation-Lipschitz approximation bound. The main theorem object is the clock-aware quotient; scalable deterministic-stationary experiments study a tractable coarsening with gap measured on small exact cases and explored empirically at larger scale. We validate theorem-level claims on Tiger and GridWorld. We also report operational case studies on Tiger, GridWorld, and RockSample as exploratory diagnostics of approximation behavior and runtime, not as theorem-facing evidence when no exact cross-family certificate is available; heavier stress tests are archived in the appendix and artifact package.

The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability

Abstract

Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDPs. A fixed probe family of finite-state controllers induces a closed-loop Wasserstein pseudometric on observation histories and a probe-exact quotient merging histories that no controller in the family can distinguish. The quotient is canonical, minimal, and unique-a bounded-interaction analogue of the Myhill-Nerode theorem. For clock-aware probes, it is exactly decision-sufficient for objectives that depend only on the agent's observations and actions; for latent-state rewards, we use an observation-Lipschitz approximation bound. The main theorem object is the clock-aware quotient; scalable deterministic-stationary experiments study a tractable coarsening with gap measured on small exact cases and explored empirically at larger scale. We validate theorem-level claims on Tiger and GridWorld. We also report operational case studies on Tiger, GridWorld, and RockSample as exploratory diagnostics of approximation behavior and runtime, not as theorem-facing evidence when no exact cross-family certificate is available; heavier stress tests are archived in the appendix and artifact package.
Paper Structure (106 sections, 23 theorems, 54 equations, 5 figures, 31 tables, 2 algorithms)

This paper contains 106 sections, 23 theorems, 54 equations, 5 figures, 31 tables, 2 algorithms.

Key Result

Proposition 3.2

$D_T^\Pi$ is a pseudometric on POMDPs sharing $(A,O)$, with $D_T^\Pi(M,N) = 0$ iff $P_M^\pi = P_N^\pi$ for all $\pi \in \Pi$.

Figures (5)

  • Figure 1: (a) Any system with hidden state, an observation channel, and a bounded observer $(m, T, \delta_O)$ admits a canonical quotient---a minimal equivalence over situations the observer cannot distinguish. (b) The POMDP instantiation: observation histories are probed by all bounded FSCs; pairwise $\mathcal{W}_1$ distances (maximised over policies) are clustered; the quotient POMDP preserves all observation laws with certified value loss.
  • Figure 2: Refinement lattice for Tiger ($T{=}4$). Solid arrows: increasing $\varepsilon$ coarsens the quotient (Definition \ref{['def:eps-equiv']}). Dashed arrows: increasing $m$ refines the quotient (Proposition \ref{['prop:probe-class-coarsening']}---larger probe class $\Rightarrow$ finer partition). Data from operational capacity sweep.
  • Figure 3: Tiger POMDP quotient structures for $m=1$, $T=2$. Left: operational exact-for-family quotient (7 classes). Right: $\varepsilon=0.5$ quotient merges $L$ and $R$ (3 classes).
  • Figure 4: Operational runtime scaling with horizon (log scale): monolithic, layered, and empirical calibrated-subset tracks.
  • Figure 5: Layered bound validation (Theorem \ref{['thm:layered-composition']}). Each bar compares the empirical LHS distortion (blue) against the theoretical RHS bound (orange) for eight test cases spanning single-layer reduction, wrapper contraction (deterministic and stochastic), and long-horizon theorem bounds ($T \in \{4, 8, 10\}$). All checks satisfy LHS $\leq$ RHS.

Theorems & Definitions (62)

  • Definition 2.1: Finite POMDP
  • Definition 2.2: Stochastic Finite-State Controller
  • Definition 3.1: Closed-loop Wasserstein pseudometric
  • Proposition 3.2
  • proof
  • Remark 3.3: Relation to bisimulation metrics
  • Definition 3.4: $L_R$-observation-Lipschitz reward
  • Remark 3.5: When is $L_R$ small?
  • Theorem 3.6: Value-function error bound
  • proof
  • ...and 52 more