
Receding-Horizon Policy Gradient for Polytopic Controller Synthesis

Shiva Shakeri, Péter Baranyi, Mehran Mesbahi

Abstract

We propose the Polytopic Receding-Horizon Policy Gradient (P-RHPG) algorithm for synthesizing Parallel Distributed Compensation (PDC) controllers via Tensor Product (TP) model transformation. Standard LMI-based PDC synthesis grows increasingly conservative as model fidelity improves; P-RHPG instead solves a finite-horizon integrated cost via backward-stage decomposition. The key result is that each stage subproblem is a strongly convex quadratic in the vertex gains, a consequence of the linear independence of the HOSVD weighting functions, guaranteeing a unique global minimizer and linear convergence of gradient descent from any initialization. With zero terminal cost, the optimal cost increases monotonically to a finite limit and the gain sequence remains bounded; terminal costs satisfying a mild Lyapunov condition yield non-increasing convergence. Experiments on an aeroelastic wing benchmark confirm convergence to a unique infinite-horizon optimum across all tested terminal cost choices and near-optimal performance relative to the pointwise Riccati lower bound.
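The abstract's core mechanism — each backward stage reduces to a strongly convex quadratic in the vertex gains, so plain gradient descent attains the unique minimizer at a linear rate from any initialization — can be illustrated on a toy quadratic. This is a minimal sketch only: the Hessian `H` and linear term `g` below are random stand-ins, not the paper's stage data.

```python
import numpy as np

# Illustrative stage subproblem: Phi(k) = 0.5 k^T H k - g^T k with H ≻ 0,
# standing in for the strongly convex quadratic in the (vectorized) vertex gains.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)        # symmetric positive definite Hessian
g = rng.standard_normal(n)
k_star = np.linalg.solve(H, g)     # unique global minimizer

eigs = np.linalg.eigvalsh(H)
L, mu = eigs[-1], eigs[0]
kappa = L / mu                     # condition number governing the rate

k = np.zeros(n)                    # arbitrary initialization
step = 1.0 / L
for _ in range(200):
    k = k - step * (H @ k - g)     # gradient step on the quadratic

# Strong convexity gives a per-step error contraction of at least (1 - 1/kappa),
# matching the linear rate quoted in the abstract and plotted in Figure 5.
print(np.linalg.norm(k - k_star))
```

The contraction factor $(1-\kappa^{-1})$ is exactly the dotted theoretical rate shown against the measured gradient norms in Figure 5.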

Paper Structure

This paper contains 17 sections, 7 theorems, 33 equations, 3 figures, 1 table, and 1 algorithm.

Key Result

Lemma C.2

Under Assumption ass:weights (linear independence of the HOSVD weighting functions), the Gram matrix satisfies $\Gamma \succ 0$.
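The content of Lemma C.2 — linearly independent weighting functions yield a positive definite Gram matrix — is easy to verify numerically. The degree-2 Bernstein functions below are illustrative stand-ins for the paper's HOSVD weighting functions, and the quadrature grid is arbitrary.

```python
import numpy as np

# Three linearly independent stand-in weighting functions on [0, 1]
# (degree-2 Bernstein basis), sampled on a fine grid.
p = np.linspace(0.0, 1.0, 2001)
W = np.vstack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

# Gram matrix Gamma_{ij} ≈ ∫ w_i(p) w_j(p) dp via a simple quadrature sum.
dp = p[1] - p[0]
Gamma = (W @ W.T) * dp

# Linear independence of the w_i makes Gamma positive definite:
print(float(np.linalg.eigvalsh(Gamma).min()))  # smallest eigenvalue, positive
```

Since the rows of `W` are linearly independent as vectors, `Gamma` is positive definite exactly, mirroring the lemma's conclusion $\Gamma \succ 0$.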

Figures (3)

  • Figure 3: $J_N^*$ vs. horizon $N$ ($N_{\mathrm{v}}{=}27$, grid $[3,3,3]$) for six terminal cost choices. Left: full view (log scale); right: zoom over $N \geq 50$ (linear scale). $Q_N{=}0$ (blue) rises monotonically; $Q_N{=}P_{\mathrm{are}}$ (red) decreases monotonically. All six are bounded by $J_\infty(\mathbf{K}^{\mathrm{feas}})$ (purple) and converge to $\bar{J}$ (dotted). Squeeze gap at $N{=}1000$: $0.006\%$.
  • Figure 4: Closed-loop trajectories at $V{=}35$ m/s (above flutter). P-RHPG ($[3,2,2]$ grid, $N{=}100$) suppresses both plunge and pitch, settling by $0.78$ s. LMI-PDC ($N_{\mathrm{v}}{=}8$) is frozen-parameter stable but exhibits slow transient decay. Red dashed line: open-loop divergence time ($t{\approx}3.6$ s). Inset: zoom of pitch transient over $[0.5, 1.5]$ s.
  • Figure 5: Gradient norm $\|\nabla\Phi_h^{(k)}\|_F$ vs. iteration $k$ for three representative stages ($[3,2,2]$ grid, $N{=}100$). Dotted lines: theoretical rate $(1-\kappa^{-1})^k$.

Theorems & Definitions (19)

  • Remark B.3: Frozen-parameter approximation
  • Definition C.1: Gram matrix
  • Lemma C.2
  • Proof
  • Theorem C.3: Strong convexity
  • Proof
  • Corollary C.4
  • Proposition C.5: Stage gradient
  • Proof
  • Remark D.1: Computational complexity
  • ...and 9 more