Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications

Thao Dang; Alexandre Donzé; Inzemamul Haque; Nikolaos Kekatos; Indranil Saha

TL;DR

This work presents a novel method for imitation learning of feedback controllers whose control requirements are expressed in Signal Temporal Logic (STL). It also introduces a way to evaluate the learned controller's performance by parameterizing the STL requirements and estimating their parameters.
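Parameter estimation of a parametric STL (PSTL) requirement can be illustrated with a small sketch. It uses a hypothetical formula, not taken from the paper, φ(τ, ε) = F_[0,τ] G(|x| ≤ ε) ("enter an ε-band within τ steps and stay there"), evaluated on a finite, discrete-time trace where G extends to the end of the trace. Robustness is monotone in ε, so bisection on its sign finds the smallest satisfying ε for each τ. Sweeping τ gives the Pareto front separating valid from false parameter values. The formula, the function names (`robustness`, `min_eps`, `pareto_front`) and the search bounds are illustrative assumptions.

```python
def robustness(xs, tau, eps):
    """Robustness of the hypothetical formula F_[0,tau] G(|x| <= eps) on a finite trace.

    Best start time t <= tau of the worst margin eps - |x_t'| over all t' >= t.
    """
    last = min(tau, len(xs) - 1)
    return max(min(eps - abs(x) for x in xs[t:]) for t in range(last + 1))


def min_eps(xs, tau, lo=0.0, hi=10.0, tol=1e-6):
    """Smallest eps in [lo, hi] with robustness >= 0, assuming phi(tau, hi) holds.

    Bisection is valid because robustness is non-decreasing in eps.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if robustness(xs, tau, mid) >= 0:
            hi = mid  # phi(tau, mid) satisfied: the boundary is at or below mid
        else:
            lo = mid  # phi(tau, mid) violated: the boundary is above mid
    return hi


def pareto_front(xs, taus):
    """One boundary point (tau, eps) per tau between valid and false valuations."""
    return [(tau, min_eps(xs, tau)) for tau in taus]


if __name__ == "__main__":
    trace = [0.5 ** t for t in range(10)]  # a trace decaying toward 0
    print(pareto_front(trace, range(5)))
```

For this decaying trace the estimated ε for a given τ is, up to the tolerance, the largest |x| remaining after step τ. A tighter front (smaller ε for each τ) therefore indicates better settling behaviour.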

Abstract

We present a novel method for imitation learning for control requirements expressed using Signal Temporal Logic (STL). More concretely, we focus on the problem of training a neural network to imitate a complex controller. The learning process is guided by efficient data aggregation based on counter-examples and a coverage measure. Moreover, we introduce a method to evaluate the performance of the learned controller via parameterization and parameter estimation of the STL requirements. We demonstrate our approach with a flying robot case study.
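The counter-example guided data aggregation described in the abstract can be sketched as a DAgger-style loop: seed a dataset for coverage, train, falsify, relabel the counter-example states with the expert, and retrain. This is a minimal sketch under loudly stated assumptions, not the paper's algorithm. The plant is a hypothetical unstable scalar system. A saturated linear law stands in for the nominal (MPC) controller. A 1-nearest-neighbour regressor replaces the neural network. The requirement is a hypothetical G(|x| ≤ 1). The grid-cell coverage measure is our own stand-in for the paper's.

```python
import random

# Hypothetical toy setup (not from the paper): unstable scalar plant x' = A*x + B*u,
# requirement G(|x| <= BOUND) over HORIZON steps, coverage grid of CELLS cells.
A, B, BOUND, HORIZON, CELLS = 1.2, 1.0, 1.0, 30, 10


def expert(x):
    """Saturated proportional law standing in for the nominal (e.g. MPC) controller."""
    return max(-1.0, min(1.0, -0.9 * x))


def make_learner(data):
    """1-nearest-neighbour regressor standing in for the neural network."""
    def policy(x):
        _, u = min(data, key=lambda d: abs(d[0] - x))
        return u
    return policy


def simulate(policy, x0):
    """Closed-loop trace x_0, ..., x_HORIZON."""
    xs = [x0]
    for _ in range(HORIZON):
        xs.append(A * xs[-1] + B * policy(xs[-1]))
    return xs


def robustness(xs):
    """Quantitative semantics of G(|x| <= BOUND) on a finite trace."""
    return min(BOUND - abs(x) for x in xs)


def coverage(data):
    """Hypothetical coverage measure: fraction of cells of [-BOUND, BOUND] holding a sample."""
    width = 2 * BOUND / CELLS
    hit = {min(CELLS - 1, int((x + BOUND) / width)) for x, _ in data if abs(x) <= BOUND}
    return len(hit) / CELLS


def coverage_seed(rng, target):
    """Coverage-based initial data: keep a sample only if it lands in an uncovered cell."""
    data = []
    while coverage(data) < target:
        x = rng.uniform(-BOUND, BOUND)
        candidate = data + [(x, expert(x))]
        if coverage(candidate) > coverage(data):
            data = candidate
    return data


def train(seed=0, target=0.2, batch=20, max_iters=10):
    """Counter-example guided aggregation; returns (dataset, iterations used)."""
    rng = random.Random(seed)
    data = coverage_seed(rng, target)
    for it in range(max_iters):
        policy = make_learner(data)
        # Falsification step: random initial states whose closed-loop trace violates the spec.
        traces = [simulate(policy, rng.uniform(-BOUND, BOUND)) for _ in range(batch)]
        counterexamples = [xs for xs in traces if robustness(xs) < 0]
        if not counterexamples:
            return data, it
        # Aggregation step: label every visited state with the expert and retrain.
        for xs in counterexamples:
            data.extend((x, expert(x)) for x in xs)
    return data, max_iters


if __name__ == "__main__":
    data, iters = train()
    print(len(data), iters, coverage(data))
```

Labelling the states visited by failing closed-loop runs, rather than only the expert's own trajectories, lets the dataset grow in the regions where the learner actually drifts. That is the motivation for counter-example guided aggregation.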

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, and 3 algorithms.

Introduction
Controller Imitation Learning Problem
Control Requirements
Signal Temporal Logic [bartocci2018specification]
Parametric Signal Temporal Logic [asarin_parametric_2011]
Control Policy Performance Measure
Imitation Learning Problem Formulation
Feedback Controller Learning Methodology
Neural Network Structure and Training
Coverage-based Data Generation
Dataset Aggregation-based Training
Flying Robot Case Study
Conclusion

Figures (4)

Figure 1: Volume estimation of the False parameter set. Each $p_i$ outside the Valid set is close to the Pareto front and defines a closed hyper-box $p_i^\bot$ strictly included in $\text{False}(\Phi)$. Computing the volume of $\cup_i p_i^\bot$ yields an under-approximation of $\mathrm{vol}(\text{False}(\Phi))$.
Figure 2: One iteration of the learning algorithm.
Figure 3: Result of training NN controllers for the flying robot. Good performance is obtained after only 5 iterations. The red region (Nominal Controller False Domain) represents the valuations $p$ for which $\Phi(p)$ is not satisfied by the nominal controller. The boundary of this region is the Pareto front of the nominal controller. The other curves show the Pareto fronts of several NN controller instances obtained at different iterations; their similarity to the nominal MPC controller is given in the legend.
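The under-approximation in Figure 1 reduces to computing the volume of a union of axis-aligned boxes. A minimal sketch follows, assuming each box $p_i^\bot$ spans from a fixed lower corner of the parameter domain to $p_i$. That orientation holds when decreasing any parameter keeps $\Phi$ false; the paper may orient boxes differently per parameter. The exact union volume is computed by coordinate compression, a technique chosen here for illustration. It costs roughly $O(n^{d+1})$ for $n$ points in $d$ dimensions, so it is practical only for a few parameters. The function name `union_volume` is hypothetical.

```python
from itertools import product


def union_volume(points, lower):
    """Exact volume of the union of boxes [lower, p] (componentwise) over p in points.

    Assumes every box shares the corner `lower`, as for the p_i^bot boxes of Figure 1
    under the polarity assumption stated above.
    """
    d = len(lower)
    # Grid lines per axis: the shared lower corner plus every point coordinate above it.
    axes = [sorted({lower[k]} | {p[k] for p in points if p[k] > lower[k]}) for k in range(d)]
    total = 0.0
    for idx in product(*(range(len(a) - 1) for a in axes)):
        upper = [axes[k][i + 1] for k, i in enumerate(idx)]
        # A grid cell lies in the union iff its upper corner is dominated by some p_i.
        if any(all(u <= p[k] for k, u in enumerate(upper)) for p in points):
            cell = 1.0
            for k, i in enumerate(idx):
                cell *= axes[k][i + 1] - axes[k][i]
            total += cell
    return total


if __name__ == "__main__":
    # Two boxes [0,1]x[0,2] and [0,2]x[0,1] overlapping on [0,1]x[0,1].
    print(union_volume([(1, 2), (2, 1)], (0, 0)))
```

Dividing the result by the volume of the whole parameter domain gives a lower bound on the fraction of parameter valuations that the controller fails to satisfy. Adding more points $p_i$ closer to the Pareto front tightens the bound.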

Theorems & Definitions (1)

Definition 1: Control Policy Similarity
