On Minimax Optimal Dual Control for Fully Actuated Systems

Anders Rantzer

TL;DR

The article addresses robust adaptive control under worst-case model uncertainty by formulating a minimax dynamic game. It derives an exact, closed-form solution to the Bellman equation for fully actuated linear time-invariant systems in which both A and B are uncertain and B is unitary and known up to a sign. The resulting dual controller optimally trades off exploration and exploitation. A key innovation is compressing past data into an augmented state Z, which enables dynamic programming over (x,Z) and yields explicit expressions for the optimal cost as a maximum over the uncertain parameters. The results generalize scalar minimax insights to vector systems and quantify when learning improves performance, with implications for designing adaptive controllers under adversarial disturbances.

Abstract

A multi-variable adaptive controller is derived as the explicit solution to a minimax dynamic game. The minimizing player selects the control action as a function of past state measurements and inputs. The maximizing player selects disturbances and model parameters for the underlying linear time-invariant dynamics. This leads to a Bellman equation that can be solved explicitly for the case with unitary B-matrix known up to a sign and no input penalty. The minimizing policy is a dual controller that optimizes the tradeoff between exploration and exploitation.

Paper Structure

This paper contains 12 sections, 7 theorems, 43 equations, 2 figures.

Key Result

Theorem 1

Given $\alpha>0$, the value of (eqn:infsup) is finite if and only if $\gamma\ge\gamma_*$. For $\gamma=\gamma_*$, the minimal value of (eqn:infsup) is $(\gamma_*^2+1)|x_0|^2/2$ and a minimizing control law eqn:mu_LQ is where maximization is over $j\in\{0,1\}$, $x_{t+1}\in\mathbb{R}^n$ and $A,B\in\mathbb{R}^{n\times n}$ with $AA^\top=\alpha^2I$ and $B=\pm I$, while minimization is over random $u_t\

Figures (2)

  • Figure 1: We want a feedback controller that works for all system parameters within the given bounds. If $\alpha>1$, even stabilization is impossible when restricting to linear time-invariant controllers. Nonlinear adaptive controllers can do much better by estimating $(A,B)$ and using the estimate for control. The purpose of this paper is to optimize such controllers using a dynamic game formulation.
  • Figure 2: Two simulations of synchronizing curves are shown. The upper plot is for $n=10$, while the lower plot is for $n=100$. The unknown $n\times n$ matrix is estimated using an optimized tradeoff between excitation and exploitation. In both plots, there is also a third curve, showing the error in the matrix estimate. The curve illustrates that the matrix convergence slows down after synchronization, since less data is then obtained. It should be noted that synchronization occurs after almost exactly $n$ time steps, when enough data has been collected to estimate all matrix directions.
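
The stabilization claim in the Figure 1 caption can be checked numerically in the simplest setting. The sketch below is an illustration only, not the paper's argument: it assumes the scalar case $n=1$, so that $A=\pm\alpha$ and $B=\pm 1$, and it considers only static gains $u_t=-kx_t$ rather than general linear time-invariant controllers. The function names, the gain grid, and the search range are hypothetical choices made for this example.

```python
def worst_case_radius(k, alpha):
    """Largest closed-loop magnitude |a - b*k| over a in {+alpha, -alpha}
    and b in {+1, -1}, for the static feedback u = -k*x (scalar case)."""
    return max(abs(a - b * k) for a in (alpha, -alpha) for b in (1.0, -1.0))


def best_static_gain(alpha, k_max=5.0, steps=1001):
    """Grid search (an assumption of this sketch) for the gain k that
    minimizes the worst-case closed-loop magnitude."""
    gains = [-k_max + 2.0 * k_max * i / (steps - 1) for i in range(steps)]
    best_k = min(gains, key=lambda k: worst_case_radius(k, alpha))
    return best_k, worst_case_radius(best_k, alpha)


if __name__ == "__main__":
    for alpha in (0.5, 1.5):
        k, radius = best_static_gain(alpha)
        verdict = "stable" if radius < 1.0 else "not stabilized"
        print(f"alpha={alpha}: best k={k:.3f}, worst-case |a - b k|={radius:.3f} ({verdict})")
```

In this scalar setting the worst case over the four sign combinations equals $\alpha+|k|$, so the best fixed gain is $k=0$ and the worst-case closed-loop magnitude is $\alpha$ itself. No fixed gain stabilizes all four models once $\alpha\ge 1$, which is why a controller that first learns the sign of $B$ can do better.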

Theorems & Definitions (7)

  • Theorem 1
  • Proposition 2
  • Theorem 3: Explicit expression for the optimal cost
  • Theorem 4
  • Theorem 5: Lower bound
  • Theorem 6: Upper bound
  • Proposition 7