A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control

Louis Claeys, Artur Goldman, Zebang Shen, Niao He

Abstract

High-dimensional stochastic optimal control (SOC) becomes harder with longer planning horizons: existing methods scale linearly in the horizon $T$, with performance often deteriorating exponentially. We overcome these limitations for a subclass of linearly-solvable SOC problems, namely those whose uncontrolled drift is the gradient of a potential. In this setting, the Hamilton-Jacobi-Bellman equation reduces to a linear PDE governed by an operator $\mathcal{L}$. We prove that, under the gradient drift assumption, $\mathcal{L}$ is unitarily equivalent to a Schrödinger operator $\mathcal{S} = -\Delta + \mathcal{V}$ with purely discrete spectrum, allowing the long-horizon control to be efficiently described via the eigensystem of $\mathcal{L}$. This connection provides two key results. First, for a symmetric linear-quadratic regulator (LQR), $\mathcal{S}$ matches the Hamiltonian of a quantum harmonic oscillator, whose closed-form eigensystem yields an analytic solution to the symmetric LQR with \emph{arbitrary} terminal cost. Second, in a more general setting, we learn the eigensystem of $\mathcal{L}$ using neural networks. We identify implicit reweighting issues with existing eigenfunction learning losses that degrade performance in control tasks, and propose a novel loss function to mitigate them. We evaluate our method on several long-horizon benchmarks, achieving an order-of-magnitude improvement in control accuracy compared to state-of-the-art methods, while reducing memory usage and runtime complexity from $\mathcal{O}(Td)$ to $\mathcal{O}(d)$.
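
As a small numerical illustration of the harmonic-oscillator connection described in the abstract, the sketch below discretizes a one-dimensional Schrödinger operator $-\Delta + \mathcal{V}$ and computes its lowest eigenvalues. The potential $\mathcal{V}(x) = x^2$, the grid parameters, and the helper name `schrodinger_eigenvalues` are assumptions made for illustration only; the potential that actually arises from the symmetric LQR is not stated in this excerpt. For this assumed potential the exact eigenvalues are the odd integers $2n + 1$, so the discrete spectrum can be checked directly.

```python
import numpy as np

def schrodinger_eigenvalues(potential, x_max=8.0, n_points=801, k=5):
    """Return the k smallest eigenvalues of S = -d^2/dx^2 + V(x).

    Illustrative helper (not from the paper): the operator is discretized on
    [-x_max, x_max] with Dirichlet boundary conditions using second-order
    central finite differences.
    """
    x = np.linspace(-x_max, x_max, n_points)
    h = x[1] - x[0]
    interior = x[1:-1]                      # boundary values are fixed to zero
    n = interior.size
    main_diag = 2.0 / h**2 + potential(interior)
    off_diag = -np.ones(n - 1) / h**2
    S = np.diag(main_diag) + np.diag(off_diag, 1) + np.diag(off_diag, -1)
    # S is real symmetric, so eigvalsh returns real eigenvalues in ascending order.
    return np.linalg.eigvalsh(S)[:k]

# Assumed potential V(x) = x^2 (quantum harmonic oscillator in these units).
eigs = schrodinger_eigenvalues(lambda x: x**2)
print(np.round(eigs, 3))  # close to the exact values 1, 3, 5, 7, 9
```

The eigenvalues are isolated and grow without bound, which is the one-dimensional picture behind a "purely discrete spectrum". A dense finite-difference matrix is used here only because it is easy to read; it is not how the paper computes or learns eigenfunctions in high dimensions.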

Paper Structure (70 sections, 20 theorems, 96 equations, 9 figures, 4 tables, 2 algorithms)

Key Result

Theorem 1

Let $\mathcal{L}$ be an essentially self-adjoint (an operator is called essentially self-adjoint if its closure is self-adjoint; see reed1980methods and reed1975ii for more details), densely defined operator on $\mathcal{H}$ which admits an orthonormal basis of eigenfunctions $(\phi_i, \lambda_i)_{i\i

Figures (9)

  • Figure 1: Performance degradation as time horizon $T$ increases for different methods (see \ref{app:experiments} for details).
  • Figure 2: Diminishing returns from increasing the number of eigenfunctions for an LQR in $d=20$ dimensions.
  • Figure 3: Learned controls (arrows) and $V_0$ for different eigenfunction losses. Existing methods fail to learn the correct control in regions where $V_0$ is large due to implicit reweighting.
  • Figure 4: Comparison of the different eigenfunction losses (EMA).
  • Figure 5: Average $L^2$ control error (EMA) as a function of iteration (top row) and $L^2$ error as a function of $t\in[0,T]$ (bottom row).
  • ...and 4 more figures

Theorems & Definitions (28)

  • Definition 1
  • Theorem 1: Restatement of Theorem VIII.7 in reed1980methods
  • Theorem 2: Restatement of reed1978iv, Theorem XIII.67, XIII.64, XIII.47
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Remark 2
  • Definition 2
  • Lemma 1
  • Theorem 5
  • ...and 18 more