Counter-example guided Imitation Learning of Feedback Controllers from Temporal Logic Specifications

Thao Dang, Alexandre Donzé, Inzemamul Haque, Nikolaos Kekatos, Indranil Saha

TL;DR

This work presents a novel method for imitation learning for control requirements expressed using Signal Temporal Logic (STL) and introduces a method to evaluate the performance of the learned controller via parameterization and parameter estimation of the STL requirements.

Abstract

We present a novel method for imitation learning for control requirements expressed using Signal Temporal Logic (STL). More concretely, we focus on the problem of training a neural network to imitate a complex controller. The learning process is guided by efficient data aggregation based on counter-examples and a coverage measure. Moreover, we introduce a method to evaluate the performance of the learned controller via parameterization and parameter estimation of the STL requirements. We demonstrate our approach with a flying robot case study.

Paper Structure (13 sections, 4 equations, 4 figures, 3 algorithms)

Figures (4)

  • Figure 1: Volume estimation of False parameter set. Each $p_i$ outside the Valid set is close to the Pareto front and defines a closed hyper-box $p_i^\bot$ strictly included in False$(\Phi)$. Computing the volume of $\cup_i p_i^\bot$ yields an under-approximation of $vol(\text{False}(\Phi))$.
  • Figure 2: One iteration of the learning algorithm.
  • Figure 3: Result of training NN controllers for the flying robot. Good performance is obtained after only 5 iterations. The red region (Nominal Controller False Domain) represents the valuations $p$ for which $\Phi(p)$ is not satisfied by the nominal controller. The boundary of this region represents the Pareto front of the nominal controller. Other plots represent the Pareto fronts for several instances of the NN controllers computed for different iterations. Similarity with the nominal MPC controller is indicated in the legend.
  • Figure :
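
The volume computation described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes, which the caption does not state, that every hyper-box $p_i^\bot$ spans from a common lower corner of the parameter domain up to the point $p_i$. Under that assumption, the intersection of any group of boxes is again a box ending at the coordinate-wise minimum of the group, so the volume of the union follows from inclusion-exclusion. The names `union_volume`, `points`, and `lower` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import prod


def union_volume(points, lower):
    """Volume of the union of boxes [lower, p] for p in points.

    Assumption (not from the paper): all boxes share the corner `lower`.
    Uses inclusion-exclusion, so cost grows exponentially with len(points).
    """
    dims = range(len(lower))
    total = 0.0
    for k in range(1, len(points) + 1):
        # Odd-sized groups are added, even-sized groups are subtracted.
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(points, k):
            # Boxes sharing a lower corner intersect in the box that ends
            # at the coordinate-wise minimum of their upper corners.
            corner = [min(p[d] for p in group) for d in dims]
            total += sign * prod(max(c - l, 0.0) for c, l in zip(corner, lower))
    return total


if __name__ == "__main__":
    # Two overlapping 2-D boxes anchored at the origin.
    print(union_volume([(2.0, 1.0), (1.0, 2.0)], (0.0, 0.0)))
```

Inclusion-exclusion is exact but only practical for a handful of points. For larger sample sets, a sweep-based or Monte Carlo estimate would be a more reasonable choice.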

Theorems & Definitions (1)

  • Definition 1: Control Policy Similarity