Stochastic Path Planning in Correlated Obstacle Fields

Li Zhou, Elvan Ceyhan

TL;DR

The paper tackles stochastic path planning in environments with spatially correlated obstacles whose statuses are uncertain and revealed only via noisy, costly sensing. It introduces SCOS, a Gaussian Random Field-based framework for belief updates, enabling search-space reduction and principled decision making. A two-stage policy learning approach combines offline optimistic policy iteration with an information-gain bonus and online rollout updates, augmented by distributional reinforcement learning to capture full cost distributions. Theoretical results establish correlation-aware dominance, submodularity of information gain, and convergence under posterior sampling, while extensive simulations show consistent gains over baselines, especially in challenging, high-noise scenarios. The work advances robust, uncertainty-aware navigation in adversarial or clustered-hazard environments and suggests directions for scalability and dynamic/multi-agent extensions.

Abstract

We introduce the Stochastic Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain blockage status, realistically constrained sensors that provide noisy readings, and costly disambiguation. Modeling the spatial correlation with a Gaussian Random Field (GRF), we develop Bayesian belief updates that refine blockage probabilities, and use the posteriors to reduce the search space for efficiency. To find the optimal traversal policy, we propose a novel two-stage learning framework. An offline phase learns a robust base policy via optimistic policy iteration augmented with an information bonus to encourage exploration in informative regions, followed by an online rollout policy with periodic base updates via a Bayesian mechanism for information adaptation. This framework supports both Monte Carlo point estimation and distributional reinforcement learning (RL) to learn full cost distributions, leading to stronger uncertainty quantification. We establish theoretical benefits of correlation-aware updating and convergence under posterior sampling. Comprehensive empirical evaluations across varying obstacle densities and sensor capabilities demonstrate consistent performance gains over baselines. This framework addresses navigation challenges in environments with adversarial interruptions or clustered natural hazards.
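
The GRF-based belief update described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: a zero-mean latent Gaussian field with a squared-exponential kernel, an obstacle counted as blocking when its latent value is positive, and a sensor that returns the latent value plus Gaussian noise. The function names (`rbf_kernel`, `condition_on_reading`, `blockage_prob`) are hypothetical.

```python
import math
import numpy as np

def rbf_kernel(points, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between obstacle centers (assumed kernel)."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def condition_on_reading(mean, cov, idx, reading, noise_var):
    """Gaussian conditioning on one noisy reading y = z[idx] + eps, eps ~ N(0, noise_var)."""
    k = cov[:, idx]                      # covariance of every obstacle with the sensed one
    gain = k / (cov[idx, idx] + noise_var)
    new_mean = mean + gain * (reading - mean[idx])
    new_cov = cov - np.outer(gain, k)
    return new_mean, new_cov

def blockage_prob(mean, cov):
    """P(z_i > 0) per obstacle under the assumed thresholding rule."""
    std = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    return np.array([0.5 * (1.0 + math.erf(m / (s * math.sqrt(2.0))))
                     for m, s in zip(mean, std)])

# Three obstacles: two close together, one far away (illustrative layout).
centers = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
prior_mean = np.zeros(3)
prior_cov = rbf_kernel(centers)
prior = blockage_prob(prior_mean, prior_cov)

# A single noisy reading at obstacle 0 that suggests it is blocking.
post_mean, post_cov = condition_on_reading(
    prior_mean, prior_cov, idx=0, reading=1.5, noise_var=0.25)
posterior = blockage_prob(post_mean, post_cov)
```

In this sketch, sensing obstacle 0 also raises the blockage probability of its near neighbor (obstacle 1) while leaving the distant obstacle 2 essentially at its prior, which is the qualitative effect that spatial correlation is meant to capture.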

Paper Structure

This paper contains 58 sections, 9 theorems, 70 equations, 12 figures, 3 tables, 6 algorithms.

Key Result

Theorem 4.1

Under assumptions ass:finite through ass:gamma, let $J^{\star}_{\mathrm{cor}}$ and $J^{\star}_{\mathrm{ind}}$ be the optimal expected total costs when decisions are based on correlation-aware and coarsened (independent) beliefs, respectively. Then $J^{\star}_{\mathrm{cor}}\le J^{\star}_{\mathrm{ind}}$.
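
A toy calculation may help make the inequality $J^{\star}_{\mathrm{cor}}\le J^{\star}_{\mathrm{ind}}$ concrete. The sketch below is not the paper's model: it assumes a single decision, two binary obstacles A and B with a made-up joint distribution, a noise-free and cost-free observation of A, and made-up traversal and detour costs. The correlation-aware agent conditions its belief about B on A; the independent agent keeps B's marginal.

```python
# Assumed joint distribution P(A=a, B=b); 1 means "blocking". Positively correlated.
joint = {(1, 1): 0.4, (0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1}

# Assumed costs: passing B costs 1 if free and 10 if blocked; the detour costs 4.
COST_FREE, COST_BLOCKED, COST_DETOUR = 1.0, 10.0, 4.0

def prob_a(a):
    """Marginal P(A=a)."""
    return sum(p for (x, _), p in joint.items() if x == a)

def prob_b(b):
    """Marginal P(B=b)."""
    return sum(p for (_, y), p in joint.items() if y == b)

def best_action(p_blocked):
    """Pick the action with the lower expected cost under the agent's belief."""
    traverse = (1.0 - p_blocked) * COST_FREE + p_blocked * COST_BLOCKED
    return "traverse" if traverse < COST_DETOUR else "detour"

def realized_cost(action, b):
    """Cost actually paid given the true status of B."""
    if action == "detour":
        return COST_DETOUR
    return COST_BLOCKED if b else COST_FREE

def expected_cost(belief_about_b):
    """Expected cost under the true joint when the agent acts on belief_about_b(a)."""
    return sum(p * realized_cost(best_action(belief_about_b(a)), b)
               for (a, b), p in joint.items())

# Correlation-aware belief: P(B=1 | A=a). Independent belief: marginal P(B=1).
j_cor = expected_cost(lambda a: joint[(a, 1)] / prob_a(a))
j_ind = expected_cost(lambda a: prob_b(1))
```

With these numbers the independent agent always detours (expected cost 4.0), whereas the correlation-aware agent traverses whenever A is observed to be free and pays about 3.4 in expectation, so `j_cor <= j_ind` holds in this instance.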

Figures (12)

  • Figure 1: SCOS environment with obstacle statuses (left; red and gray disks represent true and false obstacles, respectively) versus the same environment with probability estimates within the sensing range (right).
  • Figure 2: A schematic illustration of transitions in the decision-making process.
  • Figure 3: Two-stage policy learning for SCOS.
  • Figure 4: Example traversal environments containing $N=40$ and $N=20$ obstacles (red and gray disks represent actual threats and false alarms, respectively).
  • Figure 5: Illustration of different policy approaches for a region with 40 potentially blocked disks under various correlation assumptions (disk background color shows the ground truth; red and gray disks represent actual threats and false alarms, respectively).
  • ...and 7 more figures

Theorems & Definitions (27)

  • Theorem 4.1: Correlation-Aware Dominance
  • Proof
  • Remark 4.2
  • Corollary 4.3: Monotonicity with additional observations
  • Proof
  • Proof: Alternative Constructive Proof
  • Remark 4.4: When a Blackwell view applies
  • Remark 4.5: Sequential variants
  • Definition 5.1: Pure exploitation baseline
  • Definition 5.2: Per-obstacle optimistic lower bound
  • ...and 17 more