On the Feedback Law in Stochastic Optimal Nonlinear Control

Mohamed Naveed Gul Mohamed, Suman Chakravorty, Raman Goyal, Ran Wang

TL;DR

Empirical results show that solving the Stochastic Dynamic Programming (DP) problem is highly susceptible to noise, even when tractable, and in practice, the MPC-type feedback law offers superior performance even for stochastic systems.

Abstract

We consider the problem of nonlinear stochastic optimal control. This problem is thought to be fundamentally intractable owing to Bellman's "curse of dimensionality". We present a result that shows that repeatedly solving an open-loop deterministic problem from the current state with progressively shorter horizons, similar to Model Predictive Control (MPC), results in a feedback policy that is $O(\epsilon^4)$ near to the true global stochastic optimal policy, where $\epsilon$ is a perturbation parameter modulating the noise. We also show that the optimal deterministic feedback problem has a perturbation structure such that higher-order terms of the feedback law do not affect lower-order terms and that this structure is lost in the optimal stochastic feedback problem. Consequently, solving the Stochastic Dynamic Programming problem is highly susceptible to noise, even in low dimensional problems, and in practice, the MPC-type feedback law offers superior performance even for high noise levels.
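
The shrinking-horizon replanning described in the abstract can be sketched on a toy problem. The block below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a scalar linear system $x_{k+1} = a x_k + b u_k + \epsilon w_k$ with a quadratic cost, so that the deterministic open-loop problem can be solved exactly by a Riccati recursion. All function names and parameter values are hypothetical.

```python
import random


def riccati_gains(a, b, q, r, qf, horizon):
    """Backward Riccati recursion for the noise-free system x_{k+1} = a*x_k + b*u_k
    with cost sum_k (q*x_k^2 + r*u_k^2) + qf*x_N^2 (assumed toy problem).
    Returns the gains K[k] (u_k = -K[k]*x_k) and the cost-to-go coefficient P_0."""
    p = qf
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        denom = r + b * b * p
        gains[k] = a * b * p / denom
        p = q + a * a * p - (a * b * p) ** 2 / denom
    return gains, p


def open_loop_plan(x0, a, b, q, r, qf, horizon):
    """Solve the deterministic problem from x0 and return the open-loop
    control sequence (no feedback is used when the plan is executed)."""
    gains, _ = riccati_gains(a, b, q, r, qf, horizon)
    x, controls = x0, []
    for k in range(horizon):
        u = -gains[k] * x
        controls.append(u)
        x = a * x + b * u  # noise-free prediction
    return controls


def mpc_shrinking_horizon(x0, a, b, q, r, qf, horizon, eps, rng):
    """At every step, re-solve the deterministic problem from the current
    state over the remaining horizon and apply only the first control."""
    x, cost = x0, 0.0
    for k in range(horizon):
        u = open_loop_plan(x, a, b, q, r, qf, horizon - k)[0]
        cost += q * x * x + r * u * u
        x = a * x + b * u + eps * rng.gauss(0.0, 1.0)  # eps scales the noise
    return cost + qf * x * x


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.0, 0.1, 0.5):
        runs = [mpc_shrinking_horizon(1.0, 1.0, 0.5, 1.0, 0.1, 2.0, 10, eps, rng)
                for _ in range(200)]
        print(eps, sum(runs) / len(runs))
```

In this linear-quadratic special case the replanned control coincides with the LQR feedback law. For a nonlinear system, `open_loop_plan` would be replaced by a numerical open-loop optimal control solver.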

Paper Structure

This paper contains 17 sections, 9 theorems, 61 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Given any sample path, the state perturbation equation given in Eq. (6) can be equivalently characterized as $\delta x_k = \delta x_k^l + e_k$, where $e_k$ is an $O(\epsilon^2)$ function that depends on the entire noise history $\{w_0, w_1, \cdots, w_k\}$ and $\delta x_k^l$ evolves according to the linear closed-loop system. Furthermore, $e_k = e_k^{(2)} + O(\epsilon^3)$, where $e_k^{(2)} = \bar{A}_{k-1} e_{k-1}^{(2)} + $, $e_0^{(2)} =

Figures (5)

  • Figure 1: Performance comparison of the solution to the stochastic HJB equation obtained from a finite-difference scheme and LQR on the linear system. The mean and standard deviation of the cost incurred by the system are calculated from experiments for different values of $\epsilon$. The data shown is from simulating the linear system from the initial condition $x_0 = 1$.
  • Figure 2: Left column: Comparison of expected cost-to-go value from the HJB-FD solution and the LQR Riccati solution at $t = 0.81$. The plot shows that the HJB-FD cost-to-go doesn't match the LQR cost-to-go (which is the optimal) at high noise levels. Right column: Sample trajectories of the linear system at different noise levels under the policy computed by HJB-FD. Since the trajectory could leave the domain in high noise cases, the expected cost-to-go calculated in seeking the stochastic feedback policy will be inaccurate. (The trajectories were generated with the original sampling time of the FD solver, but the data is plotted at a larger sampling interval for the sake of clarity.)
  • Figure 3: Performance comparison of HJB-FD and MPC-SH on the 1-D nonlinear system for different noise levels.
  • Figure 4: (a) Comparison of the expected cost-to-go obtained from the HJB-FD solution and the actual cost incurred by applying the HJB-FD feedback policy on the nonlinear system. The cost-to-go is obtained for the initial condition $x_0 = 1$ and the actual cost is the average cost of 500 simulations. Trajectory samples of the nonlinear system under the MPC-SH policy are shown in (b), and under HJB-FD policy for two different cases of $\epsilon$ are shown in (c) and (d).
  • Figure 5: Performance comparison of T-PFC with MPC-SH in nonlinear robotics systems. Both policies are computed for a specific initial condition and tested on 500 different samples for each value of $\epsilon$ to find the cost statistics. The car-like robot considered is a 4-D system and is governed by the equations $\dot{x} = v \cos\theta$, $\dot{y} = v \sin\theta$, $\dot{\theta} = \frac{v}{L} \tan\phi$, $\dot{\phi} = \omega$, where $v, \omega$ are the control inputs and $L$ is the length of the car. The cart-pole is also a 4-D system and is governed by $(M+m)\ddot{x} - mL\dot{\theta}^2 \sin\theta + mL\ddot{\theta} \cos\theta = F$, $mL^2\ddot{\theta} + mL\ddot{x} \cos\theta + mgL \sin\theta = 0$, where $F$ is the control input, and $M, m, L$ are the mass of the cart, the mass of the pole and the length of the pole, respectively. Process noise was added to the above systems after propagating the dynamics at every time step. The standard deviation of the noise added was the maximum value of the states in the optimal nominal trajectory.
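
The car-like robot model in the Figure 5 caption, with process noise added after the dynamics are propagated, can be sketched as a single simulation step. This is a minimal illustration only: the forward-Euler discretization, the time step `dt`, the per-state noise level `noise_std`, and the function name `car_step` are assumptions made here and are not taken from the paper.

```python
import math
import random


def car_step(state, control, dt, length, noise_std, rng):
    """One simulation step of the car-like robot.

    state   = (x, y, theta, phi): position, heading, steering angle
    control = (v, omega): speed and steering rate
    The dynamics are propagated first (forward Euler, an assumption), and
    zero-mean Gaussian process noise is then added to every state."""
    x, y, theta, phi = state
    v, omega = control
    propagated = [
        x + dt * v * math.cos(theta),               # x_dot = v cos(theta)
        y + dt * v * math.sin(theta),               # y_dot = v sin(theta)
        theta + dt * (v / length) * math.tan(phi),  # theta_dot = (v/L) tan(phi)
        phi + dt * omega,                           # phi_dot = omega
    ]
    return [s + noise_std * rng.gauss(0.0, 1.0) for s in propagated]


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0, 0.0, 0.0, 0.0]
    for _ in range(50):
        state = car_step(state, (1.0, 0.1), 0.1, 1.0, 0.01, rng)
    print(state)
```

Setting `noise_std` to zero recovers the nominal (deterministic) rollout, which makes it easy to compare noisy sample paths against the nominal trajectory.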

Theorems & Definitions (18)

  • Lemma 1
  • Lemma 2
  • Proposition 1
  • Remark 1
  • Remark 2
  • Proposition 2
  • Remark 3
  • Remark 4
  • Proposition 3
  • Definition 1
  • ...and 8 more