FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization

Hardik Parwana, Dimitra Panagou

TL;DR

FORESEE tackles online policy optimization under uncertain nonlinear dynamics by combining an Expansion-Compression Unscented Transform (EC-UT) for efficient state-distribution prediction with a differentiable, constraint-aware gradient-descent framework for online policy updates. The EC-UT propagates a finite set of sigma points, expands it to capture state-dependent uncertainty, and compresses it via moment matching to keep the computation scalable. The online optimization uses an SQP-inspired gradient step with constrained updates, falling back on slack variables to restore feasibility when needed, which enables receding-horizon control under probabilistic constraints. On quadrotor and leader-follower benchmarks, FORESEE achieves prediction accuracy competitive with Monte Carlo at a lower computational cost, and it enables online controller tuning that maintains safety and improves performance under constraints.

Abstract

Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually begets numerical or analytical approximations. We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, and use it to solve a class of online policy optimization problems. Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which dictates an increase in the number of sigma points at each time step to represent the resulting distribution; this is what we call the expansion operation. To keep the algorithm scalable, we augment the expansion operation with a compression operation based on moment matching, thereby keeping the number of sigma points constant across predictions over multiple time steps. Its performance is empirically shown to be comparable to Monte Carlo but at a much lower computational cost. Under state and control input constraints, the state prediction is subsequently used in tandem with a proposed variant of constrained gradient-descent for online update of policy parameters in a receding horizon fashion. The framework is implemented as a differentiable computational graph for policy training. We showcase our framework for a quadrotor stabilization task as part of a benchmark comparison in safe-control-gym and for optimizing the parameters of a Control Barrier Function based controller in a leader-follower problem.
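The expansion and compression operations described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard unscented transform with 2n+1 points, so compression matches only the mean and covariance, and the functions `toy_mean` and `toy_cov` are hypothetical state-dependent dynamics invented for this example, as is the choice `kappa=1.0`.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Standard UT: 2n+1 weighted points whose sample mean/covariance equal (mean, cov)."""
    n = mean.shape[0]
    L = np.linalg.cholesky((n + kappa) * cov)           # columns of L are the offsets
    points = np.vstack([mean, mean + L.T, mean - L.T])  # shape (2n+1, n)
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights

def moments(points, weights):
    """Weighted sample mean and covariance of a point set."""
    mean = weights @ points
    centered = points - mean
    cov = (weights[:, None] * centered).T @ centered
    return mean, cov

def expand(points, weights, step_mean, step_cov, kappa=1.0):
    """Expansion: each sigma point spawns its own sigma points for the next state."""
    new_points, new_weights = [], []
    for s, w in zip(points, weights):
        p, v = sigma_points(step_mean(s), step_cov(s), kappa)
        new_points.append(p)
        new_weights.append(w * v)  # overall weight = parent weight * edge weight
    return np.vstack(new_points), np.concatenate(new_weights)

def compress(points, weights, kappa=1.0):
    """Compression: moment-match the expanded set with a fresh set of 2n+1 points."""
    mean, cov = moments(points, weights)
    return sigma_points(mean, cov, kappa)

# Hypothetical toy dynamics (an assumption for illustration only):
# the next-state mean and covariance both depend on the current state.
def toy_mean(x):
    return x + 0.1 * np.array([x[1], -np.sin(x[0])])

def toy_cov(x):
    return (0.01 + 0.05 * x[0] ** 2) * np.eye(2)

points, weights = sigma_points(np.array([1.0, 0.0]), 0.1 * np.eye(2))
for _ in range(10):  # predict 10 steps ahead
    expanded = expand(points, weights, toy_mean, toy_cov)
    points, weights = compress(*expanded)
```

In this sketch the number of points stays at 2n+1 after every step, whereas skipping `compress` would multiply the count by 2n+1 at each step. Matching higher moments, as with the Generalized UT variant mentioned in the figures, would require a different sigma-point generator than the one assumed here.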

Paper Structure (22 sections, 1 theorem, 48 equations, 9 figures, 4 algorithms)

Key Result

Theorem 1

Suppose the sigma points $\mathcal{S}_{t,i}, w_{t,i}, i\in\{1,2,\dots,N\}$ have sample moments equal to the moments of the random variable $x_t\in \mathbb{R}^n$. For each sigma point $\mathcal{S}_{t,i}$, consider the $N'$ new sigma points and weights denoted by $\mathcal{S}_{t,i}^j, w_{t,i}^j$, where $j\in\{1,2.

Figures (9)

  • Figure 1: (a) The red particles (sigma points) $\mathcal{S}_{t,i}$ and their weights $w_{t,i}$ represent the distribution of state $\hat{x}_t$. (b) Since the uncertainty is state-dependent, each red point $\mathcal{S}_{t, i}$ gives rise to the states in the yellow region, each of which is represented by three blue points $\mathcal{S}_{t, i}^j$ and weights $w_{t,i}^j$ in the expansion layer. (c) The overall weight of $\mathcal{S}_{t,i}^j$ is obtained by multiplying the weight of the edge by the weight of the corresponding root node. (d) The 15 points resulting from the expansion layer are then compressed back to five red-blue particles in the compression layer to represent the blue distribution of $x_{t+1}$.
  • Figure 2: The predicted state distribution under different methods for dynamics given by the gamma distribution in \ref{eq::gamma_dynamics}. The sample colors follow the notation in Fig. \ref{fig::gaussian_nonlinear_dynamics}.
  • Figure 3: The predicted state distribution under different methods. Blue samples correspond to the 50,000 MC particles. Yellow and salmon-colored samples are the sigma points generated using successive Expansion-Compression (EC) operations: yellow samples employ UT (\ref{algo::UT}) and salmon samples employ Generalized UT \cite{ebeigbe2021generalized} in their expansion and compression operations. The red ellipse represents the 95% confidence ellipse of the distribution obtained by successive Gaussian approximation.
  • Figure 4: First four moments with different methods: Monte Carlo (MC), successive Gaussian, Expansion-Compression (EC) with UT, and EC with Generalized UT (GenUT). The successive Gaussian approximation is shown only for the first two moments, since its skewness and kurtosis are identically zero and constant, respectively. The shaded areas show two-standard-deviation variations across MC runs.
  • Figure 5: Time taken by each of the methods in a naive Python implementation (without any optimizations): Monte Carlo (MC), Expansion-Compression (EC), and Expansion-only (E). All operations are performed on a single CPU core. The H value denotes the horizon. The time per step for MC and EC is fixed. The time taken per step by the E layer increases with the horizon as the number of particles increases.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Remark 3
  • proof