
Monte Carlo Tree Search with Spectral Expansion for Planning with Dynamical Systems

Benjamin Riviere, John Lathrop, Soon-Jo Chung

TL;DR

This work addresses planning with continuous, high-dimensional dynamical systems by replacing intractable direct discretization with Spectral Expansion Tree Search (SETS), which uses the spectrum of the locally linearized controllability Gramian to construct a compact, energetically bounded discrete representation. By coupling spectral expansion with Monte Carlo Tree Search, SETS achieves real-time, globally convergent planning for deterministic, differentiable MDPs with underactuated and nonlinear dynamics. The authors prove two theorems: (i) a bounded-error discrete representation of continuous MDPs via spectral expansion, and (ii) finite-time convergence of MCTS value estimates under their expansion and exploration scheme. Empirical validation spans quadrotor navigation in a wind field, driver-assisted tracking under actuator degradation, coordinated net capture of space debris, and a glider energy-harvesting task, demonstrating automatic discovery of diverse, near-optimal behaviors and real-time applicability across robotics domains.
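
The spectral expansion idea can be illustrated with a minimal sketch. It assumes a discrete-time linearization x_{k+1} = A x_k + B u_k about the current node and a finite-horizon controllability Gramian over H steps. The function names `finite_horizon_gramian` and `spectral_targets` are hypothetical, and placing child states at plus/minus sqrt(lambda_i) v_i along each Gramian eigenvector (the principal axes of the unit-energy reachable set) is an illustrative assumption, not necessarily the exact construction used by SETS.

```python
import numpy as np


def finite_horizon_gramian(A: np.ndarray, B: np.ndarray, H: int) -> np.ndarray:
    """W_H = sum_{k=0}^{H-1} A^k B B^T (A^T)^k for x_{k+1} = A x_k + B u_k."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(H):
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return W


def spectral_targets(x0: np.ndarray, A: np.ndarray, B: np.ndarray, H: int):
    """Return 2n candidate child states and the Gramian eigenvalues (descending).

    Each target is the free response A^H x0 shifted by +/- sqrt(lambda_i) v_i.
    For every strictly positive lambda_i, reaching that target with the linear
    model takes exactly unit input energy (the assumed energy bound).
    """
    W = finite_horizon_gramian(A, B, H)
    lams, vecs = np.linalg.eigh(W)          # ascending eigenvalues; W is symmetric PSD
    lams = np.clip(lams[::-1], 0.0, None)   # strongest mode first; clip round-off below zero
    vecs = vecs[:, ::-1]
    x_free = np.linalg.matrix_power(A, H) @ x0
    targets = []
    for lam, v in zip(lams, vecs.T):
        step = np.sqrt(lam) * v
        targets.extend([x_free + step, x_free - step])
    return np.array(targets), lams


# Usage: a double integrator (position, velocity) discretized with dt = 0.1 s.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
targets, lams = spectral_targets(np.array([1.0, 0.5]), A, B, H=10)
print(targets.shape)  # (4, 2)
```

Under this sketch's assumptions, the branching factor is 2n regardless of how finely the continuous input space could have been gridded; in a tree search, A and B would be re-linearized at each expanded node before computing that node's targets.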

Abstract

The ability of a robot to plan complex behaviors with real-time computation, rather than adhering to predesigned or offline-learned routines, alleviates the need for specialized algorithms or training for each problem instance. Monte Carlo Tree Search is a powerful planning algorithm that strategically explores simulated future possibilities, but it requires a discrete problem representation that is irreconcilable with the continuous dynamics of the physical world. We present Spectral Expansion Tree Search (SETS), a real-time, tree-based planner that uses the spectrum of the locally linearized system to construct a low-complexity and approximately equivalent discrete representation of the continuous world. We prove that SETS converges to within a bound of the globally optimal solution for continuous, deterministic, and differentiable Markov Decision Processes, a broad class of problems that includes underactuated nonlinear dynamics, non-convex reward functions, and unstructured environments. We experimentally validate SETS on drone, spacecraft, and ground vehicle robots, as well as in one numerical experiment, none of which is directly solvable with existing methods. We show that SETS automatically discovers a diverse set of optimal behaviors and motion trajectories in real time.
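
The tree-search half of the method rests on MCTS balancing exploitation of good branches against exploration of rarely visited ones. The sketch below uses the standard UCB1 (UCT) selection rule as a stand-in: the source says only that SETS has its own expansion and exploration scheme, so `Node`, `ucb1_select`, `backpropagate`, and the exploration constant `c` are illustrative assumptions rather than the SETS implementation. A two-branch root with fixed, deterministic returns plays the role of a deterministic MDP.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    reward: float = 0.0            # fixed return of this branch (toy stand-in for a rollout)
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)


def ucb1_select(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Return an unvisited child if any, else argmax of mean + c * sqrt(ln N / n)."""
    for child in parent.children:
        if child.visits == 0:
            return child
    log_n = math.log(parent.visits)
    return max(
        parent.children,
        key=lambda ch: ch.value_sum / ch.visits + c * math.sqrt(log_n / ch.visits),
    )


def backpropagate(path: list, value: float) -> None:
    """Add one visit and the observed value to every node on the path."""
    for node in path:
        node.visits += 1
        node.value_sum += value


# Usage: the higher-return branch should receive most of the visits.
root = Node(children=[Node(reward=1.0), Node(reward=0.2)])
for _ in range(200):
    child = ucb1_select(root)
    backpropagate([root, child], child.reward)
print([ch.visits for ch in root.children])
```

The exploration bonus shrinks as a branch is revisited, so the weaker branch is still sampled occasionally but the search budget concentrates on the better one.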

Paper Structure

This paper contains 4 sections, 19 theorems, 144 equations, 10 figures, 1 algorithm.

Key Result

Theorem 1

Consider an MDP $\left<X, U, F, R, D, K, \gamma \right>$. For initial state $\mathbf{x}_0$, Spectral Expansion with horizon $H$ creates a discrete representation with a bounded equivalent optimal value function: where $\kappa_{0,1,2,3}$ are problem-specific constants and $\Delta t$ is the time step used to discretize the continuous-time dynamics.

Figures (10)

  • Figure 1: (A) Our method, SETS, is a new tree-based planning algorithm for dynamical systems. The tree's edges (shown in gray) are constructed by tracking the spectral modes of the local linearization (shown in blue) with nonlinear feedback control. (B/C/D/E/F) We demonstrate SETS is widely applicable in robotic domains, spanning ground, aerial, and space domains.
  • Figure 1: Forces in the spacecraft capture problem for two net nodes.
  • Figure 2: (A) SETS enables a drone, circled in blue, to plan trajectories to multiple targets (white) over a fan array and obstacles (orange) in real time. The twelve-dimensional search tree is projected onto the two-dimensional fan surface. The branches are colored by the order of expansion, with yellow indicating later trajectories. (B) The spectrum of the controllability Gramian is shown for flying in still air and flying through a thermal. Each column corresponds to a natural motion of the system, each row corresponds to a dimension of the state, and each cell is colored by the magnitude of its controllability. (C) A top-down view of the final trajectory, where the targets are shown in green, the thermals in orange, and the obstacles in gray. The thermals are shaded by their relative strength. (D) The distance to each target over time. The mission objective is to visit all four targets and is completed after 37 seconds.
  • Figure 2: Proof layout showing how the lemmas connect to yield our main theoretical contributions.
  • Figure 3
  • ...and 5 more figures
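
Panel (B) of Figure 2 describes a state-by-mode map of the controllability Gramian's spectrum. One plausible way to compute such a map is sketched below, assuming each cell is |v_i[j]| * sqrt(lambda_i), that is, how far mode i can move state dimension j with unit input energy under the linearized model. The name `mode_controllability_map` is hypothetical, and the exact scaling used in the figure is not stated in the source.

```python
import numpy as np


def mode_controllability_map(W: np.ndarray) -> np.ndarray:
    """Rows: state dimensions. Columns: Gramian modes, strongest first.

    Cell (j, i) = |v_i[j]| * sqrt(lambda_i), where (lambda_i, v_i) are the
    eigenpairs of the (symmetric, PSD) controllability Gramian W.
    """
    W = 0.5 * (W + W.T)                     # symmetrize against round-off
    lams, vecs = np.linalg.eigh(W)          # ascending eigenvalues
    lams = np.clip(lams[::-1], 0.0, None)   # strongest mode first
    vecs = vecs[:, ::-1]
    return np.abs(vecs) * np.sqrt(lams)[np.newaxis, :]


# Usage with a made-up 3-state Gramian (illustrative numbers only).
W = np.array([[4.0, 0.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
M = mode_controllability_map(W)
print(M.round(3))
```

Sorting by eigenvalue places the most controllable natural motions in the leftmost columns; taking absolute values discards the arbitrary sign of each eigenvector so that only magnitudes are displayed.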

Theorems & Definitions (39)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Proof
  • Lemma 2
  • ...and 29 more