Policy Learning for Perturbance-wise Linear Quadratic Control Problem

Haoran Zhang, Wenhao Zhang, Xianping Wu

TL;DR

The paper addresses finite-horizon LQ control with additive noise under a perturbance-wise framework that unifies the classical model, constraint-embedded affine policies, and a Wasserstein DRO approach. It develops an augmented-affine policy representation and derives an exact policy gradient, proving global convergence under constant stepsizes that are bounded by polynomials of the problem parameters. The work integrates a ROC-like Riccati recursion for both the constrained and the distributionally robust variants, and validates the methods on mean-variance portfolio optimization and a real-data dynamic-tracking task, revealing trade-offs across horizon length, trading costs, state penalties, and estimation windows. The result is a practical, theoretically grounded toolkit for learning robust, constrained LQ controllers from finite data, with potential extensions to richer ambiguity sets and partial-observation settings.

Abstract

We study finite-horizon linear quadratic control with additive noise in a perturbance-wise framework that unifies the classical model, a constraint-embedded affine policy class, and a distributionally robust formulation with a Wasserstein ambiguity set. Based on an augmented affine representation, we model feasibility as an affine perturbation and unknown noise as a distributional perturbation estimated from samples, thereby addressing constrained implementation and model uncertainty in a single scheme. First, we construct an implementable policy gradient method that accommodates nonzero noise means estimated from data. Second, we analyze its convergence under constant stepsizes chosen as simple polynomials of the problem parameters, ensuring global decrease of the value function. Finally, we present numerical studies on mean-variance portfolio allocation and dynamic benchmark tracking with real data, which validate stable convergence and illuminate sensitivity tradeoffs across horizon length, trading cost intensity, state penalty scale, and estimation window.
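To make the setting concrete, the following is a minimal sketch of policy gradient descent on a scalar finite-horizon LQ problem with an affine policy u_t = K_t x_t + k_t and noise with nonzero mean. It is not the paper's algorithm: the system values (a, b, q, r, q_T, x0, mu, var), the function names, and the stepsize are all illustrative assumptions, and the gradient is approximated by central finite differences rather than by the exact gradient formula the paper derives. The cost is evaluated in closed form by propagating the first and second moments of the state.

```python
# Hypothetical scalar example: x_{t+1} = a*x_t + b*u_t + w_t, with E[w_t] = mu, Var[w_t] = var.
# Affine policy u_t = K_t*x_t + k_t; all numeric values below are illustrative assumptions.

def lq_cost(K, k, a=1.0, b=1.0, q=1.0, r=1.0, q_T=1.0, x0=1.0, mu=0.1, var=0.1):
    """Expected cost E[sum_t (q*x_t^2 + r*u_t^2) + q_T*x_T^2], computed from state moments."""
    m, S = x0, x0 * x0              # m = E[x_t], S = E[x_t^2]
    total = 0.0
    for K_t, k_t in zip(K, k):
        # E[u_t^2] = K_t^2*S + 2*K_t*k_t*m + k_t^2
        total += q * S + r * (K_t * K_t * S + 2.0 * K_t * k_t * m + k_t * k_t)
        c = a + b * K_t             # closed-loop multiplier
        d = b * k_t + mu            # closed-loop drift, including the noise mean
        # Both right-hand sides use the old m and S (tuple assignment).
        m, S = c * m + d, c * c * S + 2.0 * c * d * m + d * d + var
    return total + q_T * S


def fd_gradient(theta, T, h=1e-6):
    """Central finite-difference gradient with respect to theta = [K_0..K_{T-1}, k_0..k_{T-1}]."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((lq_cost(up[:T], up[T:]) - lq_cost(dn[:T], dn[T:])) / (2.0 * h))
    return grad


def policy_gradient_descent(T=5, step=0.002, iters=2000):
    """Gradient descent with a constant stepsize, starting from the zero policy."""
    theta = [0.0] * (2 * T)
    costs = [lq_cost(theta[:T], theta[T:])]
    for _ in range(iters):
        g = fd_gradient(theta, T)
        theta = [p - step * g_i for p, g_i in zip(theta, g)]
        costs.append(lq_cost(theta[:T], theta[T:]))
    return theta, costs


if __name__ == "__main__":
    theta, costs = policy_gradient_descent()
    print("initial cost:", costs[0])
    print("final cost:  ", costs[-1])
    print("gains K_t:   ", theta[:5])
    print("offsets k_t: ", theta[5:])
```

The offsets k_t are what let the policy compensate for the nonzero noise mean; a purely linear policy (k_t fixed at zero) cannot do so. The stepsize here is simply chosen small enough for this toy instance, whereas the paper selects it as a polynomial of the problem parameters.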

Paper Structure

This paper contains 19 sections, 17 theorems, 147 equations, 6 figures, 1 table.

Key Result

Theorem 2.3

If the light-tail assumption holds, we have for all $M \geq 1$, $m \neq 2$, and $\varepsilon > 0$, where $c_1$ and $c_2$ are positive constants and $c_2$ depends on the $p$-norm.

Figures (6)

  • Figure 1: Performance of the policy gradient algorithm
  • Figure 2: Comparison for constrained and non-constrained
  • Figure 3: Normalized prices for ETFs and benchmark
  • Figure 4: Empirical features for the tracking model
  • Figure 5: DRO-LQ: policy gradient convergence and gain stabilization
  • ...and 1 more figure

Theorems & Definitions (30)

  • Definition 2.1: Wasserstein metric
  • Theorem 2.3: Measure concentration
  • Theorem 2.4: Finite sample guarantee
  • Theorem 2.5: Asymptotic consistency
  • Lemma 2.7
  • Lemma 3.2: Gradient Representation
  • proof
  • Lemma 3.3
  • proof
  • Lemma 3.4
  • ...and 20 more