An active learning method for solving competitive multi-agent decision-making and control problems

Filippo Fabiani; Alberto Bemporad

TL;DR

A novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings to identify a stationary action profile.

Abstract

To identify a stationary action profile for a population of competitive agents, each executing private strategies, we introduce a novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings. Under very general working assumptions (not even assuming that a stationary profile exists), sufficient conditions are established to assess the asymptotic properties of the proposed active learning methodology so that, if the parameters characterizing the action-reaction mappings converge, a stationary action profile is achieved. Such conditions hence act also as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
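
The probe-and-update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes two agents with scalar actions, hypothetical affine action-reaction mappings `f1` and `f2` with made-up coefficients, no feasibility constraints, and a batch least-squares refit in place of a recursive parameter update. The observer first collects a few random probes, then repeatedly queries the fixed point of its current affine estimates.

```python
import numpy as np

# Hypothetical private action-reaction mappings (hidden from the observer).
# The coefficients are illustrative assumptions, not taken from the paper.
def f1(x2):
    return 0.5 * x2 + 1.0   # agent 1 reacts to agent 2's action

def f2(x1):
    return -0.3 * x1 + 2.0  # agent 2 reacts to agent 1's action

def fit_affine(inputs, outputs):
    """Least-squares fit of outputs ~ a * inputs + b; returns (a, b)."""
    A = np.column_stack([inputs, np.ones(len(inputs))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(outputs), rcond=None)
    return coef[0], coef[1]

def active_learning(n_random=2, n_active=5, seed=0):
    rng = np.random.default_rng(seed)
    queries, reactions = [], []
    x_hat = rng.uniform(-1.0, 1.0, size=2)  # first query is random
    for k in range(n_random + n_active):
        # Probe: each agent reacts to the other agent's queried action.
        y = np.array([f1(x_hat[1]), f2(x_hat[0])])
        queries.append(x_hat.copy())
        reactions.append(y)
        if k + 1 < n_random:
            # Passive phase: keep sampling random queries.
            x_hat = rng.uniform(-1.0, 1.0, size=2)
            continue
        Q, Y = np.array(queries), np.array(reactions)
        # Update the local affine estimates of both mappings.
        a1, b1 = fit_affine(Q[:, 1], Y[:, 0])
        a2, b2 = fit_affine(Q[:, 0], Y[:, 1])
        # Next query: fixed point of the estimated mappings,
        # i.e. x1 = a1*x2 + b1 and x2 = a2*x1 + b2.
        M = np.array([[1.0, -a1], [-a2, 1.0]])
        x_hat = np.linalg.solve(M, np.array([b1, b2]))
    # Stationarity residual: distance between the query and the true reactions.
    residual = np.linalg.norm(np.array([f1(x_hat[1]), f2(x_hat[0])]) - x_hat)
    return x_hat, residual

if __name__ == "__main__":
    x_star, res = active_learning()
    print("estimated stationary profile:", x_star, "residual:", res)
```

Because the mappings in this toy case are exactly affine, two distinct probes already identify them and the next query is stationary. With nonlinear or constrained reactions the estimates are only local, and the paper's convergence conditions are what certify the outcome.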

Paper Structure (28 sections, 7 theorems, 29 equations, 13 figures, 3 tables, 1 algorithm)

Introduction
Motivating example: forecasting price-responses to control the aggregated electricity consumption in smart grids
Related work
Summary of contributions and paper organization
Learning problem
Mathematical formulation
Applications in multi-agent control and decision-making
Learning equilibria in generalized Nash games
Multi-agent feedback controller synthesis
Preliminary results
Active learning procedure and main results
Algorithm description
Initialization
Complexity analysis
Asymptotic properties
...and 13 more sections

Key Result

Lemma 3.3

Let Assumption ass:convexity-1 hold true. Then, given any $\theta\in\mathbb{R}^p$, the set $\mathcal{M}(\theta)\coloneqq {\textnormal{argmin}}_{y\in \mathcal{X}}~r(y,\theta)$ is convex. If also Assumption ass:convexity-2 holds true, then the value function $r^\star:\mathbb{R}^p\to\mathbb{R}$, $r^\st

Figures (13)

Figure 1: An external observer with learning procedure $\mathscr{L}$ makes queries $\hat{\boldsymbol{x}}$ and observes each reaction to $\hat{\boldsymbol{x}}_{-i}$ (dashed black lines) taken by agent $i \in \mathcal{N}$ (red circles) through its private action-reaction mapping, $f_i(\cdot)$, which may depend on the decision of any other agent in $\mathcal{N}$ (solid red lines), with the goal of predicting an outcome of the multi-agent interaction process.
Figure 2: With $N=2$, $n_1=n_2=1$, and affine $\hat{f}_i$'s constructed using feasible samples (red and blue dots), solving \eqref{eq:linear-system} would return the green point as a unique minimizer outside $\Omega$ (black box). While agent $2$ can still provide a feasible reaction to this infeasible query point (decision to be made along the dashed blue line), agent $1$ cannot (decision along the dashed red line).
Figure 3: Sequence $\{\hat{\boldsymbol{x}}^k\}_{k\in\mathbb{N}}$, averaged over $25$ numerical instances (solid blue line). The shaded black region corresponds to the passive random data collection phase, which is reported for the interval $k\in[100,200]$ only for illustrative purposes.
Figure 4: Sum of the normalized average inflexible demand $d$ and a sample of five charging strategies $(1/N) \sum_{i\in \mathcal{N}} x_i^\star(\bar{a}, \bar{b})$ at the equilibrium, for given price signals $\bar{a}$, $\bar{b}>0$ (dash-dotted coloured lines). The shaded blue area denotes the union over all the $25$ experiments performed.
Figure 5: Hyperparameter analysis on the Nash equilibrium problem formalized in salehisadaghiani2017admm.
...and 8 more figures

Theorems & Definitions (10)

Definition 2.1
Lemma 3.3
Definition 3.4: Lower semicontinuity, rockafellar2009variational
Definition 3.5: Level-boundedness, rockafellar2009variational
Lemma 3.6
Lemma 3.7
Lemma 4.2
Proposition 4.4
Theorem 4.5
Corollary 4.6
