Selecting Belief-State Approximations in Simulators with Latent States
Nan Jiang
TL;DR
The paper tackles the challenge of resetting simulators with latent states by framing belief-state selection as a conditional distribution-selection problem and proposing two complementary formulations: latent-state-based selection and observation-based selection. It develops a reduction to conditional 2-sample tests and a discriminative, $Y$-only approach with provable TV-distance guarantees under a realizability-like assumption, plus an alternative observable-model viewpoint whose roll-out guarantees depend on whether the Single-Reset or Repeated-Reset method is used. A key insight is that the choice of roll-out method critically affects the guarantees: observation-based selection is robust to latent-variable redundancy, but it can incur horizon-dependent losses under certain roll-out methods. The paper further analyzes distribution shift, data-collection design, and extensions to broader sampling settings, and validates the ideas through a case study on real-system traces. Overall, it provides a principled framework for selecting and leveraging belief-state approximations to enable reliable planning and calibration in latent-state simulators.
Abstract
State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent state given the observable history, a.k.a. the belief state (Silver and Veness, 2010). While exact sampling is often infeasible, many approximate belief-state samplers can be constructed, raising the question of how to select among them using only sampling access to the simulator. In this paper, we show that this problem reduces to a general conditional distribution-selection task and develop a new algorithm and analysis under sampling-only access. Building on this reduction, the belief-state selection problem admits two different formulations: latent-state-based selection, which directly targets the conditional distribution of the latent state, and observation-based selection, which targets the induced distribution over the observation. Interestingly, these formulations differ in how their guarantees interact with the downstream roll-out methods: perhaps surprisingly, observation-based selection may fail under the most natural roll-out method (which we call Single-Reset) but enjoys guarantees under the less conventional alternative (which we call Repeated-Reset). Together with a discussion of issues such as distribution shift and the choice of sampling policies, our paper reveals a rich landscape of algorithmic choices, theoretical nuances, and open questions in this seemingly simple problem.
