An active learning method for solving competitive multi-agent decision-making and control problems

Filippo Fabiani, Alberto Bemporad

TL;DR

A novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings to identify a stationary action profile.

Abstract

To identify a stationary action profile for a population of competitive agents, each executing private strategies, we introduce a novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings. Under very general working assumptions (not even assuming that a stationary profile exists), sufficient conditions are established to assess the asymptotic properties of the proposed active learning methodology so that, if the parameters characterizing the action-reaction mappings converge, a stationary action profile is achieved. Such conditions hence act also as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
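The probe-and-update loop described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: two agents with scalar actions, the hypothetical affine reaction mappings `f1` and `f2`, the feasible box `[LOW, HIGH]`, and recursive least squares as the update rule for the affine local estimates. The observer first collects a few random probes, then repeatedly queries the fixed point of its current estimates until the observed reactions match the query.

```python
import numpy as np

# Hypothetical private action-reaction mappings (unknown to the observer).
def f1(x2):
    return 0.5 * x2 + 1.0

def f2(x1):
    return -0.3 * x1 + 2.0

reactions = [f1, f2]
LOW, HIGH = -10.0, 10.0  # assumed feasible box for the queries


class AffineRLS:
    """Recursive least-squares estimate of an affine map y ~ a * z + b."""

    def __init__(self):
        self.theta = np.zeros(2)      # parameters [a, b]
        self.P = 1e6 * np.eye(2)      # large initial covariance (weak prior)

    def update(self, z, y):
        phi = np.array([z, 1.0])
        gain = self.P @ phi / (1.0 + phi @ self.P @ phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = self.P - np.outer(gain, phi @ self.P)


def estimated_fixed_point(est1, est2):
    """Solve x1 = a1*x2 + b1, x2 = a2*x1 + b2 for the current estimates."""
    a1, b1 = est1.theta
    a2, b2 = est2.theta
    A = np.array([[1.0, -a1], [-a2, 1.0]])
    x, *_ = np.linalg.lstsq(A, np.array([b1, b2]), rcond=None)
    return np.clip(x, LOW, HIGH)      # keep the query inside the box


def active_learning(n_random=3, n_iter=50, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    est = [AffineRLS(), AffineRLS()]
    x_hat = np.zeros(2)
    for k in range(n_iter):
        if k < n_random:
            x_hat = rng.uniform(LOW, HIGH, size=2)   # passive random probes
        else:
            x_hat = estimated_fixed_point(*est)      # active query
        # Probe the agents: each one reacts to the other's queried action.
        y = np.array([reactions[0](x_hat[1]), reactions[1](x_hat[0])])
        if k >= n_random and np.linalg.norm(y - x_hat) < tol:
            return x_hat, k                          # query is (nearly) stationary
        est[0].update(x_hat[1], y[0])
        est[1].update(x_hat[0], y[1])
    return x_hat, n_iter


if __name__ == "__main__":
    x_star, iters = active_learning()
    print("estimated stationary profile:", x_star, "after", iters, "probes")
```

In this toy setting the reaction mappings are exactly affine, so the local estimates become accurate after a handful of probes. With nonlinear mappings the estimates would only be valid locally, which is why the stopping test compares the observed reactions with the query rather than trusting the fitted parameters.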

Paper Structure

This paper contains 28 sections, 7 theorems, 29 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Lemma 3.3

Let Assumption ass:convexity-1 hold true. Then, given any $\theta\in\mathbb{R}^p$, the set $\mathcal{M}(\theta)\coloneqq \operatorname{argmin}_{y\in \mathcal{X}} r(y,\theta)$ is convex. If Assumption ass:convexity-2 also holds true, then the value function $r^\star:\mathbb{R}^p\to\mathbb{R}$, ...

Figures (13)

  • Figure 1: An external observer with learning procedure $\mathscr{L}$ makes queries $\hat{\boldsymbol{x}}$ and observes each reaction to $\hat{\boldsymbol{x}}_{-i}$ (dashed black lines) taken by agent $i \in \mathcal{N}$ (red circles) through its private action-reaction mapping, $f_i(\cdot)$, which may depend on the decision of any other agent in $\mathcal{N}$ (solid red lines), with the goal of predicting an outcome of the multi-agent interaction process.
  • Figure 2: With $N=2$, $n_1=n_2=1$, and affine $\hat{f}_i$'s constructed using feasible samples (red and blue dots), solving \ref{eq:linear-system} would return the green point as a unique minimizer outside $\Omega$ (black box). While agent $2$ can still provide a feasible reaction to this infeasible query point (decision to be made along the dashed blue line), agent $1$ cannot (decision along the dashed red line).
  • Figure 3: Sequence $\{\hat{\boldsymbol{x}}^k\}_{k\in\mathbb{N}}$, averaged over $25$ numerical instances (solid blue line). The shaded black region corresponds to the passive random data collection phase, which is reported for the interval $k\in[100,200]$ only for illustrative purposes.
  • Figure 4: Sum of the normalized average inflexible demand $d$ and a sample of five charging strategies $(1/N) \sum_{i\in \mathcal{N}} x_i^\star(\bar{a}, \bar{b})$ at the equilibrium, for given price signals $\bar{a}$, $\bar{b}>0$ (dash-dotted coloured lines). The shaded blue area denotes the union over all $25$ experiments performed.
  • Figure 5: Hyperparameter analysis on the Nash equilibrium problem formalized in salehisadaghiani2017admm.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Definition 2.1
  • Lemma 3.3
  • Definition 3.4: Lower semicontinuity, rockafellar2009variational
  • Definition 3.5: Level-boundedness, rockafellar2009variational
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 4.2
  • Proposition 4.4
  • Theorem 4.5
  • Corollary 4.6