A Sequential Quadratic Programming Perspective on Optimal Control

Abhijeet, Suman Chakravorty

Abstract

This paper offers a unified perspective on different approaches to solving optimal control problems through the lens of constrained sequential quadratic programming (SQP). In particular, it clarifies the relationships between Newton's method, iterative LQR (iLQR), and Differential Dynamic Programming (DDP). We show that iLQR is a principled SQP approach, rather than simply an approximation of DDP obtained by neglecting the Hessian terms, and that it can be guaranteed to always produce a cost-descent direction and converge to an optimum; Newton's method and DDP have no similar guarantees, especially far from an optimum. Our empirical evaluations on the pendulum and cart-pole swing-up tasks corroborate the SQP-based analysis proposed in this paper.

Paper Structure

This paper contains 12 sections, 3 theorems, 46 equations, 5 figures, 2 tables.

Key Result

Proposition 1

Consider the modified QP (M-QP). Given $C_{xx} \succ 0$, the solution to the QP, $\delta x$, is always a descent direction for the cost function $c(x)$. Furthermore, the new solution $x = \bar{x} + \alpha \delta x$ can be made feasible, i.e., $h(x) = 0$, by a suitable choice of the line-search parameter ...
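The descent claim can be checked numerically on a generic equality-constrained QP. The sketch below is not the paper's M-QP, whose exact form is not shown in this excerpt. It assumes a positive definite matrix `H` standing in for $C_{xx}$, a cost gradient `g`, and a constraint Jacobian `A` linearized at a feasible point, so the linearized constraint reads $A\,\delta x = 0$. The names `H`, `g`, `A`, and `solve_eq_qp` are all hypothetical. Under these assumptions the KKT conditions give $g^\top \delta x = -\delta x^\top H\, \delta x < 0$ whenever $\delta x \neq 0$.

```python
import numpy as np


def solve_eq_qp(H, g, A):
    """Solve  min_d  g^T d + 0.5 d^T H d   s.t.  A d = 0  via its KKT system.

    H, g, A are illustrative stand-ins (assumptions), not the paper's M-QP data.
    Returns the step d and the Lagrange multipliers.
    """
    n, m = H.shape[0], A.shape[0]
    # KKT system: [H A^T; A 0] [d; lam] = [-g; 0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]


rng = np.random.default_rng(0)
n, m = 6, 2
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)      # positive definite by construction
g = rng.standard_normal(n)       # gradient of the cost at the current point
A = rng.standard_normal((m, n))  # Jacobian of the linearized constraints

dx, lam = solve_eq_qp(H, g, A)
slope = float(g @ dx)            # directional derivative of the cost along dx
curvature = float(dx @ H @ dx)   # strictly positive because H is positive definite
```

Here `slope` comes out negative and equal to `-curvature`, which is the descent property. Restoring feasibility of the nonlinear constraint $h(x) = 0$ after the step is a separate matter that this sketch does not cover.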

Figures (5)

  • Figure 1: The plot of $Q_{uu}$ for the cart-pole and pendulum problems for the first iteration starting with random initial guesses.
  • Figure 2: The plot of learning rate $\alpha$ and cost vs iteration for the pendulum swing-up task starting with random initial guesses. DDP cools down $\alpha$ as the cost predictions are corrupted.
  • Figure 3: The plot of learning rate $\alpha$ and cost vs iteration for the cart-pole problem starting with random initial guesses. DDP cools down $\alpha$ as the cost change predictions are corrupted.
  • Figure 4: A comparison of iLQR, DDP, and the hybrid solution. The hybrid solution depicts iLQR applied from the point where the DDP solution slows down as $\alpha$ reduces drastically (depicted in Figures 2 and 3).
  • Figure 5: Cost vs. number of iterations. With an initial control guess near the optimum, DDP reaches the optimum in fewer iterations.

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Remark 1
  • Remark 2
  • Proposition 3