Control-Oriented Identification for the Linear Quadratic Regulator: Technical Report

Sean Anderson, João Pedro Hespanha

TL;DR

This work tackles data-driven control under model uncertainty by proposing a control-oriented offline experiment design that optimizes post-experiment controller performance. It develops a gradient-descent framework that uses a pathwise gradient estimator to solve the resulting nonconvex design problem, enabling SGD-based optimization of the experiment inputs. Specializing to the finite-horizon LQR, it derives a weighted Bayesian system identification scheme coupled with certainty-equivalence control and provides gradient expressions to enable efficient optimization. Numerical experiments in a car-string setting show the proposed design outperforms traditional A- and L-optimal designs and a robust dual-control approach, with favorable scaling and practical implications for data-efficient controller design.

Abstract

Data-driven control benefits from rich datasets, but constructing such datasets is challenging when opportunities to gather data are limited. We consider an offline experiment design approach in which we design a control input to collect the data that will most improve the performance of a feedback controller. We show how such a control-oriented approach can be used in a setting with linear dynamics and a quadratic objective and, through the design of a gradient estimator, solve the problem via stochastic gradient descent. We show numerically that our formulation outperforms A- and L-optimal experiment design approaches as well as a robust dual control approach.
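To make the pipeline described above concrete, the following is a minimal JAX sketch of a pathwise gradient estimator driving SGD over the experiment inputs. It is not the authors' implementation. It assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$, a Gaussian prior on $(a, b)$, plain (unweighted) Bayesian linear regression, unit cost weights, a noise-free evaluation of the post-experiment cost, and an input-energy budget `BETA`. All names and numerical values are hypothetical and chosen only for illustration.

```python
import jax
import jax.numpy as jnp

# All constants below are illustrative assumptions, not values from the paper.
SIGMA = 0.1                      # process-noise standard deviation
MU0 = jnp.array([0.9, 0.5])      # prior mean of theta = (a, b)
L0 = 0.2 * jnp.eye(2)            # Cholesky factor of the prior covariance
T_EXP, T_CTRL = 10, 10           # experiment length, control horizon
BETA = 1.0                       # assumed budget: sum(u**2) <= BETA

def posterior_mean(u, theta, xi):
    """Simulate the experiment and return the Bayesian-regression posterior mean."""
    a, b = theta
    def step(x, inp):
        u_t, xi_t = inp
        x_next = a * x + b * u_t + SIGMA * xi_t
        return x_next, (jnp.stack([x, u_t]), x_next)
    _, (Z, y) = jax.lax.scan(step, jnp.zeros(()), (u, xi))
    prior_prec = jnp.linalg.inv(L0 @ L0.T)
    prec = prior_prec + Z.T @ Z / SIGMA**2
    return jnp.linalg.solve(prec, prior_prec @ MU0 + Z.T @ y / SIGMA**2)

def ce_cost(theta_hat, theta, x0=1.0):
    """Cost of the certainty-equivalent LQR gains (from theta_hat) on the true theta."""
    a_h, b_h = theta_hat
    def riccati(P, _):
        # Backward scalar Riccati step with q = r = 1.
        K = a_h * b_h * P / (1.0 + b_h**2 * P)
        return 1.0 + a_h**2 * P - a_h * b_h * P * K, K
    _, K_rev = jax.lax.scan(riccati, jnp.ones(()), None, length=T_CTRL)
    a, b = theta
    def closed_loop(x, K):
        u = -K * x
        return a * x + b * u, x**2 + u**2
    x_T, stage = jax.lax.scan(closed_loop, x0 * jnp.ones(()), K_rev[::-1])
    return stage.sum() + x_T**2

def sample_cost(u, key):
    """One pathwise sample: the randomness is drawn independently of u."""
    k1, k2 = jax.random.split(key)
    theta = MU0 + L0 @ jax.random.normal(k1, (2,))
    xi = jax.random.normal(k2, (T_EXP,))
    return ce_cost(posterior_mean(u, theta, xi), theta)

def project(u):
    """Euclidean projection onto the ball sum(u**2) <= BETA."""
    return u * jnp.minimum(1.0, jnp.sqrt(BETA) / jnp.linalg.norm(u))

@jax.jit
def sgd_step(u, key, lr=1e-2):
    # Average per-sample gradients over a minibatch of 64 draws.
    keys = jax.random.split(key, 64)
    grads = jax.vmap(jax.grad(sample_cost), in_axes=(None, 0))(u, keys)
    return project(u - lr * grads.mean(axis=0))

key = jax.random.PRNGKey(0)
u = project(jnp.ones(T_EXP))
for _ in range(100):
    key, sub = jax.random.split(key)
    u = sgd_step(u, sub)
```

Because the sampled parameters and noise do not depend on `u`, differentiating `sample_cost` through the simulation, the posterior update, and the Riccati recursion gives a per-sample gradient whose minibatch average serves as the stochastic gradient. The projection step keeps the assumed energy budget satisfied after every update.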

Paper Structure (22 sections, 62 equations, 3 figures, 1 algorithm)

Figures (3)

  • Figure 1: Performance of our control-oriented system identification compared with A-optimal experiment design for a system with five states and three inputs and known initial condition $x_0=[0., -4.3, 0., 2.1, 2.5]^T$, as in [levine_optimal_1966]. The value of $\beta$ is varied, and the associated constraint is active in all cases. We include 95% confidence intervals computed from $10^5$ samples.
  • Figure 2: Upper subplot: the number of iterations to convergence for the car-string problem is essentially the same regardless of system size, suggesting that our method scales well. Lower subplot: the average time to compute a single gradient sample in JAX [jax2018github] on an Apple M1 Pro 10-core CPU with 32 GB RAM. The time is dominated by solving the control problem; "overhead" refers to tasks such as automatic differentiation and initial compile time.
  • Figure 3: Performance of our proposed method against RRL as the prior information is varied by reducing $N_{traj}$ from the 500 trajectories used in [umenberger2019] down to 200 in steps of 50. The mean for our method is shown in blue; its post-experiment cost remains below 200 for all priors. For RRL, a few very large samples inflate the mean, so we use arrows to indicate values that lie outside the axis limits.
