Active Learning for Control-Oriented Identification of Nonlinear Systems

Bruce D. Lee; Ingvar Ziemann; George J. Pappas; Nikolai Matni

TL;DR

We address efficient exploration for control of unknown nonlinear dynamical systems within a model-based RL framework. We introduce ALCOI, an active learning algorithm that extends results for linear-in-parameters models to smooth nonlinear dynamics. The analysis reduces the end-to-end excess cost to the identification error, using a Fisher-information-guided exploration design and the delta method, and yields the non-asymptotic bound $\mathsf{excess\,cost} \leq \dfrac{\mathsf{hardness\ of\ control} \times \mathsf{hardness\ of\ identification}}{N}$ up to logarithmic factors, where $N$ is the number of exploration rounds. The paper proves the bound for a broad class of smooth nonlinear dynamics and validates the method in simulation on a 2D nonlinear system and a cartpole swing-up task, where active, control-oriented exploration outperforms both random and approximate A-optimal exploration. The results give a principled framework for designing informative experiments in nonlinear model-based reinforcement learning, centered on the interplay between control hardness and identification hardness.

Abstract

Model-based reinforcement learning is an effective approach for controlling an unknown system. It is based on a longstanding pipeline familiar to the control community in which one performs experiments on the environment to collect a dataset, uses the resulting dataset to identify a model of the system, and finally performs control synthesis using the identified model. As interacting with the system may be costly and time consuming, targeted exploration is crucial for developing an effective control-oriented model with minimal experimentation. Motivated by this challenge, recent work has begun to study finite sample data requirements and sample efficient algorithms for the problem of optimal exploration in model-based reinforcement learning. However, existing theory and algorithms are limited to model classes which are linear in the parameters. Our work instead focuses on models with nonlinear parameter dependencies, and presents the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics. In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors. We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
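The three-stage pipeline described above (experiment, identify, synthesize) can be made concrete with a toy example. The following is a minimal sketch, not the paper's ALCOI algorithm: the scalar system `x_next = tanh(theta * x) + u + w`, the random exploration inputs, the grid-search least squares, and the dynamics-cancelling certainty-equivalent policy are all assumptions chosen for illustration.

```python
import numpy as np

def step(x, u, theta, rng, noise_std=0.1):
    # Hypothetical dynamics with a nonlinear dependence on the parameter theta.
    return np.tanh(theta * x) + u + noise_std * rng.standard_normal()

def collect_data(theta_true, n_steps, rng):
    # Stage 1: run an experiment (here: uniformly random inputs) and log transitions.
    xs, us, xns = [], [], []
    x = 0.0
    for _ in range(n_steps):
        u = rng.uniform(-2.0, 2.0)
        x_next = step(x, u, theta_true, rng)
        xs.append(x)
        us.append(u)
        xns.append(x_next)
        x = x_next
    return np.array(xs), np.array(us), np.array(xns)

def identify(xs, us, xns, grid):
    # Stage 2: nonlinear least squares over theta, solved by grid search.
    preds = np.tanh(np.outer(grid, xs)) + us          # shape (len(grid), T)
    losses = np.mean((xns - preds) ** 2, axis=1)
    return grid[np.argmin(losses)]

def certainty_equivalent_policy(theta_hat):
    # Stage 3: treat the estimate as the truth and cancel the estimated dynamics.
    return lambda x: -np.tanh(theta_hat * x)

def average_cost(policy, theta_true, horizon, rng):
    # Average quadratic state cost of a policy on the true system.
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        x = step(x, policy(x), theta_true, rng)
        cost += x ** 2
    return cost / horizon

rng = np.random.default_rng(0)
theta_true = 1.5
xs, us, xns = collect_data(theta_true, 500, rng)
theta_hat = identify(xs, us, xns, np.linspace(0.0, 3.0, 301))

ce_cost = average_cost(certainty_equivalent_policy(theta_hat), theta_true, 200,
                       np.random.default_rng(1))
zero_cost = average_cost(lambda x: 0.0, theta_true, 200, np.random.default_rng(1))
print(f"theta_hat={theta_hat:.2f}, CE cost={ce_cost:.3f}, zero-input cost={zero_cost:.3f}")
```

In this sketch the exploration inputs are chosen at random; the point of active, control-oriented exploration is to replace that first stage with inputs chosen to reduce the eventual control cost.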

Paper Structure (21 sections, 20 theorems, 155 equations, 3 figures, 1 algorithm)

Introduction
Contribution
Related Work
Additional Work Analyzing Identification & Control
Dual Control
Problem Formulation
Certainty Equivalent Control
Assumptions
Proposed Algorithm and Main Result
Proof Sketch
Numerical Validation
Conclusions
System Identification Results
Consistency of Least Squares Parameter Estimation
Improved Rates via the Delta Method
...and 6 more sections

Key Result

Theorem 1.1

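Let the ALCOI algorithm interact with an unknown nonlinear dynamical system for $N$ exploration rounds before proposing a control policy designed to optimize some objective. Up to logarithmic factors, the excess cost of the proposed policy on the objective satisfies $\mathsf{excess\,cost} \leq \dfrac{\mathsf{hardness\ of\ control} \times \mathsf{hardness\ of\ identification}}{N}$.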

Figures (3)

Figure 1: Identification-to-control pipeline.
Figure 2: Comparison of the proposed control-oriented identification procedure with approximate $A$-optimal design and random experiment design. The mean over $100$ runs is shown, with the standard error shaded.
Figure 3: Excess cost versus number of exploration episodes for the proposed control-oriented identification procedure, approximate $A$-optimal design, and random exploration for the cartpole swing-up task. The mean over $900$ runs is shown, and the standard error is shaded.
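
The captions above compare control-oriented design with approximate $A$-optimal design. The difference between the two criteria can be sketched on a two-parameter toy problem. This is an illustrative sketch under stated assumptions, not the paper's procedure: the candidate experiments, the matrix `H` (standing in for how sensitive the control cost is to each parameter), and the weighted criterion `trace(H @ inv(FI))` are hypothetical choices made for the example.

```python
import numpy as np

def fisher_information(features, weights):
    # Fisher information of a design that runs experiment k a fraction weights[k]
    # of the time; features[k] is that experiment's parameter sensitivity vector.
    return (features.T * weights) @ features

def a_optimal_criterion(fi):
    # A-optimality: average parameter variance, ignoring the control task.
    return np.trace(np.linalg.inv(fi))

def control_oriented_criterion(fi, H):
    # Assumed control-oriented criterion: parameter variance weighted by H.
    return np.trace(H @ np.linalg.inv(fi))

# Two hypothetical experiments, each informative about one parameter only.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
# Hypothetical weighting: the control cost is 9x more sensitive to parameter 0.
H = np.diag([9.0, 1.0])

# Search over the fraction p of the budget spent on experiment 0.
fractions = np.linspace(0.01, 0.99, 99)
a_scores = [a_optimal_criterion(fisher_information(features, np.array([p, 1 - p])))
            for p in fractions]
c_scores = [control_oriented_criterion(fisher_information(features, np.array([p, 1 - p])), H)
            for p in fractions]

p_a_optimal = fractions[int(np.argmin(a_scores))]
p_control = fractions[int(np.argmin(c_scores))]
print(f"A-optimal split: {p_a_optimal:.2f}, control-oriented split: {p_control:.2f}")
```

In this toy problem the A-optimal design splits the budget evenly, while the weighted criterion shifts effort toward the parameter that matters more for control.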

Theorems & Definitions (38)

Theorem 1.1: Main Result, Informal
Definition 2.1
Definition 2.2
Definition 2.3: Łojasiewicz condition, roulet2017sharpness
Lemma 3.1: Thm. 1 of wagenmaker2023optimal
Theorem 3.1
Theorem 3.2: Main Result
Lemma A.1
Definition A.1: Trajectory $(C_{\mathsf{hc}}, a)$-hypercontractivity
Lemma A.2
...and 28 more
