Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing

Luke Snow, Vikram Krishnamurthy

TL;DR

The paper derives a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system, and proves that this robust estimation is equivalent to a semi-infinite optimization reformulation.

Abstract

We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators which minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies to reconstruct the utility functions of a cognitive radar network from observed tracking signals.
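
As a rough illustration of the minimax structure described in the abstract, a generic Wasserstein distributionally robust estimator can be written as follows. This is a sketch in assumed notation, not the paper's own formulation: the parameter $\theta$, parameter set $\Theta$, loss $\ell$, empirical distribution $\hat{P}_N$, samples $\hat{\xi}_i$, and ball $\mathcal{B}_\epsilon$ are all illustrative symbols introduced here; only the radius $\epsilon$ appears in the source.

$$
\hat{\theta} \in \arg\min_{\theta \in \Theta} \; \sup_{Q \in \mathcal{B}_\epsilon(\hat{P}_N)} \mathbb{E}_{\xi \sim Q}\big[\ell(\theta, \xi)\big],
\qquad
\mathcal{B}_\epsilon(\hat{P}_N) = \big\{ Q : W(Q, \hat{P}_N) \le \epsilon \big\},
\qquad
\hat{P}_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\hat{\xi}_i}
$$

Here $W$ denotes a Wasserstein distance and $\hat{P}_N$ is the empirical distribution of the $N$ noisy observations. The inner supremum picks the least favorable distribution within radius $\epsilon$ of the data, and the outer minimization chooses the estimate that performs best against it. Because the inner problem ranges over infinitely many distributions, formulations of this kind are typically recast as optimization problems with infinitely many constraints, which is consistent with the semi-infinite reformulation the abstract mentions.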

Paper Structure

This paper contains 12 sections, 3 theorems, 13 equations, 1 figure, 1 table, and 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{D}$ be a set of observations. The following are equivalent:

Figures (1)

  • Figure 1: Average convergence of Algorithm 1 for varying Wasserstein radii $\epsilon$, over 100 Monte-Carlo simulations.

Theorems & Definitions (8)

  • Definition 1: Multi-agent Bayesian Sensing System
  • Definition 2: Coordinated Sensing System
  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Theorem 2: Semi-Infinite Reformulation
  • proof