Solving Stochastic Orienteering Problems with Chance Constraints Using Monte Carlo Tree Search

Stefano Carpin

TL;DR

The paper addresses planning under uncertainty for stochastic orienteering with a probabilistic budget bound. It introduces an online, anytime Monte Carlo Tree Search method (MCTS-SOPCC) that explicitly tracks both path value and the probability of violating the budget constraint, using a UCT-inspired tree policy with failures (UCTF) together with rollout and backup steps informed by sample-based approximations. The approach avoids discretizing time and handles continuous residual budgets, delivering adaptive policies that perform close to the optimal MILP solutions while offering substantial computational savings, particularly on larger graphs. This work advances risk-aware robotic routing by enabling online policy construction that respects chance constraints and scales to realistic problem instances.

Abstract

We present a new Monte Carlo Tree Search (MCTS) algorithm to solve the stochastic orienteering problem with chance constraints, i.e., a version of the problem where travel costs are random, and one is assigned a bound on the tolerable probability of exceeding the budget. The algorithm we present is online and anytime, i.e., it alternates planning and execution, and the quality of the solution it produces increases as the allowed computational time increases. Unlike most earlier MCTS algorithms, for each action available in a state the algorithm maintains estimates of both its value and the probability that its execution will eventually result in a violation of the chance constraint. Then, at action selection time, our proposed solution prunes away trajectories whose estimated failure probability exceeds the tolerated bound. Extensive simulation results show that this approach can quickly produce high-quality solutions and is competitive with the optimal but time-consuming solution.
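
To make the chance constraint concrete, the following minimal sketch estimates by sampling the probability that a fixed path exceeds its budget, then compares that estimate with the tolerated bound. It is not the paper's algorithm. The function names, the sample count, and the noise model (each edge cost equals its length times a uniform factor in [1.0, 1.5]) are illustrative assumptions, since the summary above does not state the travel-cost distribution.

```python
import random


def estimate_failure_probability(edge_lengths, budget, num_samples=10_000, rng=None):
    """Estimate P(total travel cost > budget) for a fixed path by sampling."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    failures = 0
    for _ in range(num_samples):
        # Assumed noise model (hypothetical): cost = length * U(1.0, 1.5).
        total = sum(length * rng.uniform(1.0, 1.5) for length in edge_lengths)
        if total > budget:
            failures += 1
    return failures / num_samples


def satisfies_chance_constraint(edge_lengths, budget, max_failure_prob):
    """True if the estimated failure probability is within the tolerated bound."""
    return estimate_failure_probability(edge_lengths, budget) <= max_failure_prob


if __name__ == "__main__":
    path = [2.0, 3.0, 1.0]  # nominal edge lengths along a candidate path
    print(estimate_failure_probability(path, budget=7.5))
    print(satisfies_chance_constraint(path, budget=7.5, max_failure_prob=0.1))
```

The budget is kept as a continuous quantity throughout, in line with the TL;DR's remark that the method avoids discretizing time.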

Paper Structure

This paper contains 15 sections, 3 theorems, 16 equations, 15 figures, 1 table, 4 algorithms.

Key Result

Theorem 1

In the limit $S \rightarrow \infty$, if the MCTS-SOPCC algorithm returns a feasible node $v_j$, then with probability $1-F[v_j]$ there is a solution to the SOPCC with expected value $Q[v_j]$.

Figures (15)

  • Figure 1: An example of the application of orienteering in viticulture. The left panel shows an aerial view of a commercial vineyard located in California with more than 55,000 vines. Pins are displayed at locations where soil moisture samples must be collected. The right panel shows a zoomed version of the vineyard, illustrating how the pins are placed in the traversable regions between tree rows. Because of the size of the vineyard, a robot tasked with automatically collecting samples cannot visit all locations, and has to select a suitable subset. Not all sampling locations have the same value, and therefore this is a natural instance of the orienteering problem. Note that in this specific problem setting the robot cannot move along straight lines between different locations; because of the irrigation infrastructure, it has to exit at either end of the vineyard before moving to a different tree row.
  • Figure 2: The right side of the figure shows a possible MCTS tree $\mathcal{T}$ associated with the simple graph on the left and rooted in $v_s$ (start vertex). Vertices $v_1$ and $v_2$ are children of $v_s$ because they are directly connected to it. Executing action $v_1$ from $v_s$ means moving from $v_s$ to $v_1$. Vertex $v_3$, not appearing in the tree, cannot be a child of $v_s$ because it is not directly connected to it. Vertex $v_2$ appears as a child of both $v_s$ and $v_1$ because it is connected to both, but it occurs along two different paths starting from the root node $v_s$. All paths in $\mathcal{T}$ from $v_s$ to a leaf encode possible paths in $G$. In this simple example there are two paths, namely $v_s,v_2$ and $v_s,v_1,v_2,v_g$. Note that while the MCTS tree is being built, not all paths must end at the goal vertex $v_g$.
  • Figure 3: Assuming the tree is rooted in $v_s$, the tree policy UCTF repeatedly selects internal vertices in the tree $\mathcal{T}$ until a leaf vertex $v_i$ is reached, and then a child node $v_j$ is added to the tree. The rollout algorithm is then run from $v_j$ to estimate the values $Q[v_j]$ and $F[v_j]$, which are stored in $v_i$ and then propagated back to the root.
  • Figure 4: When backing up the values $F[v_j]$ and $Q[v_j]$ stored at node $v_i$, it is necessary to consider their relationships with the values $F[v_i]$ and $Q[v_i]$ stored with $v_k$.
  • Figure 5: The four test graphs used for our initial benchmarking. In all graphs the start vertex is marked in red and the end vertex is marked in black.
  • ...and 10 more figures
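
The failure-aware selection step described in the abstract and in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the paper's UCTF policy, whose exact formula is not given in this summary. A standard UCB1 score is swapped in for the ranking step, and `ChildStats`, `select_child`, and the `exploration` constant are hypothetical names. Returning `None` when no child is feasible is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    name: str
    q: float     # estimated value Q of taking this action
    f: float     # estimated failure probability F of taking this action
    visits: int  # number of times this child has been selected


def select_child(children, parent_visits, max_failure_prob, exploration=1.0):
    """Prune children whose F exceeds the bound, then rank the rest by UCB1."""
    feasible = [c for c in children if c.f <= max_failure_prob]
    if not feasible:
        return None  # assumption: the caller decides what to do in this case

    def score(child):
        if child.visits == 0:
            return math.inf  # try unvisited children first
        bonus = exploration * math.sqrt(math.log(parent_visits) / child.visits)
        return child.q + bonus

    return max(feasible, key=score)


if __name__ == "__main__":
    children = [
        ChildStats("v1", q=10.0, f=0.40, visits=5),  # best value, too risky
        ChildStats("v2", q=6.0, f=0.05, visits=5),
        ChildStats("v3", q=4.0, f=0.02, visits=5),
    ]
    chosen = select_child(children, parent_visits=15, max_failure_prob=0.1)
    print(chosen.name)
```

In the usage example, `v1` has the highest estimated value but is pruned because its estimated failure probability exceeds the bound, so the choice falls to the best remaining feasible child.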

Theorems & Definitions (7)

  • Definition 1
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof