Selecting Belief-State Approximations in Simulators with Latent States

Nan Jiang

TL;DR

The paper tackles the challenge of resetting simulators with latent states by framing belief-state selection as a conditional distribution-selection problem and proposing two complementary formulations: latent-state-based selection and observation-based selection. It develops a reduction to conditional two-sample tests and a discriminative, $Y$-only approach with provable TV-distance guarantees under a realizability-like assumption, along with an alternative observable-model viewpoint that yields rollout guarantees under either the Single-Reset or the Repeated-Reset regime. A key insight is that the choice of rollout method critically affects the guarantees: observation-based selection is robust to latent-variable redundancy but can incur horizon-dependent losses under certain rollouts. The paper further analyzes distribution shift, data-collection design, and extensions to broader sampling settings, and validates the ideas through a case study on real-system traces. Overall, it provides a principled framework for selecting and leveraging belief-state approximations to enable reliable planning and calibration in latent-state simulators.

Abstract

State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent state given the observable history, a.k.a. the belief state (Silver and Veness, 2010). While exact sampling is often infeasible, many approximate belief-state samplers can be constructed, raising the question of how to select among them using only sampling access to the simulator. In this paper, we show that this problem reduces to a general conditional distribution-selection task and develop a new algorithm and analysis under sampling-only access. Building on this reduction, the belief-state selection problem admits two different formulations: latent state-based selection, which directly targets the conditional distribution of the latent state, and observation-based selection, which targets the induced distribution over the observation. Interestingly, these formulations differ in how their guarantees interact with the downstream roll-out methods: perhaps surprisingly, observation-based selection may fail under the most natural roll-out method (which we call Single-Reset) but enjoys guarantees under the less conventional alternative (which we call Repeated-Reset). Together with discussion on issues such as distribution shift and the choice of sampling policies, our paper reveals a rich landscape of algorithmic choices, theoretical nuances, and open questions, in this seemingly simple problem.

Paper Structure

This paper contains 34 sections, 9 theorems, 50 equations, 1 figure, 1 table.

Key Result

Theorem 1

Under Assumption asm:nontrivial_F, for $\hat{i}$ identified by Eq. (eq:scheffe), with probability at least $1-\delta$, as long as [...]. Invoked on $X = \tau_t$, $Y = s_t$, with $P^\star$ the distribution under behavior policy $\pi_b$, we have [...], where $\tau_t \sim \Gamma^{\pi_b}$ is a partial trajectory naturally simulated in $\Gamma$ under policy $\pi_b$ without using resetting.
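The theorem refers to a Scheffé-style selection rule whose exact form is not reproduced on this page. As a rough illustration only, the sketch below implements the classical, unconditional Scheffé tournament (minimum-distance variant) over finite-support samples. It is not the paper's conditional, discriminator-based rule, and the function names `empirical_pmf` and `scheffe_select` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def empirical_pmf(samples):
    """Empirical probability mass function of a list of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return {y: c / n for y, c in counts.items()}


def scheffe_select(candidate_samples, target_samples):
    """Return the index of the candidate chosen by a classical Scheffe tournament.

    Assumption: unconditional distributions on a finite set, each known only
    through samples. This is NOT the paper's conditional selection rule.
    """
    pmfs = [empirical_pmf(s) for s in candidate_samples]
    target = empirical_pmf(target_samples)

    def mass(pmf, event):
        return sum(pmf.get(y, 0.0) for y in event)

    # worst[i] = largest disagreement between candidate i and the target
    # over all Scheffe sets that involve candidate i.
    worst = [0.0] * len(pmfs)
    for i, j in combinations(range(len(pmfs)), 2):
        support = set(pmfs[i]) | set(pmfs[j])
        # Scheffe set: outcomes where candidate i puts more mass than candidate j.
        scheffe_set = {y for y in support
                       if pmfs[i].get(y, 0.0) > pmfs[j].get(y, 0.0)}
        t = mass(target, scheffe_set)
        worst[i] = max(worst[i], abs(mass(pmfs[i], scheffe_set) - t))
        worst[j] = max(worst[j], abs(mass(pmfs[j], scheffe_set) - t))
    return min(range(len(pmfs)), key=worst.__getitem__)
```

In the conditional setting of the theorem, the candidates and the target would additionally depend on the conditioning variable $X = \tau_t$, which this sketch ignores.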

Figures (1)

  • Figure 1: Toy example illustrating the difference between Single-Reset and Repeated-Reset. In this binary-observation, action-less system, "X" represents the occurrence of an event. Every time an event happens ("X"), the system samples the interval until the next event from some distribution. The history of interest is "X O O", and the first row shows the real trajectory. $\textbf{b}$ always sets the latent state to $0$, i.e., predicts that the next event will occur immediately.
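The toy system in the caption can be simulated in a few lines. The sketch below rests on assumptions that this page does not spell out: the latent state is taken to be the number of "O" steps remaining before the next event, Single-Reset is read as drawing the latent state once from the belief approximation and then simulating forward, and Repeated-Reset is read as re-drawing the latent state from the approximation before every new observation. The interval distribution and all function names are hypothetical.

```python
import random


def sample_interval(rng):
    # Hypothetical interval distribution: steps from one event to the next.
    return rng.choice([2, 3, 4])


def step(latent, rng):
    """Emit one observation and the next latent state.

    Assumption: `latent` counts the "O" steps left before the next event.
    """
    if latent == 0:
        # An event occurs now; sample the interval until the following event.
        return "X", sample_interval(rng) - 1
    return "O", latent - 1


def b_zero(history):
    # The approximation from the caption: always set the latent state to 0.
    return 0


def single_reset(history, belief, horizon, rng):
    """Reset the latent state once, then let the simulator run on its own."""
    latent = belief(history)
    rollout = []
    for _ in range(horizon):
        obs, latent = step(latent, rng)
        rollout.append(obs)
    return rollout


def repeated_reset(history, belief, horizon, rng):
    """Re-draw the latent state from the belief before every observation."""
    hist = list(history)
    rollout = []
    for _ in range(horizon):
        latent = belief(hist)
        obs, _ = step(latent, rng)  # the simulator's next latent is discarded
        hist.append(obs)
        rollout.append(obs)
    return rollout
```

Under these assumptions, starting from the history "X O O", `b_zero` makes Single-Reset emit an immediate "X" and then follow the simulator's own dynamics, while Repeated-Reset emits "X" at every step, so the two rollout schemes can induce very different observation distributions from the same belief approximation.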

Theorems & Definitions (16)

  • Theorem 1: Sample complexity
  • Corollary 2
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Theorem 5
  • Corollary 6
  • Example 1: $Q_{\textsc{1-Reset}(\Gamma, \textbf{b})}^{\pi_b}$ cannot enjoy the guarantee of Corollary \ref{cor:raro}
  • Theorem 7
  • ...and 6 more