
PACE: A Framework for Learning and Control in Linear Incomplete-Information Differential Games

Seyed Yousef Soltanian, Wenlong Zhang

TL;DR

This work tackles two-player linear-quadratic differential games with incomplete information, where each agent lacks knowledge of the other's cost parameters. It introduces the Peer-Aware Cost Estimation (PACE) framework, which treats the peer as a learning agent and jointly learns the opponent's cost parameters and Riccati solution in real time by modeling the peer's learning dynamics from past state trajectories. The method provides convergence guarantees for parameter estimates and system stability, and demonstrates superior robustness and faster convergence compared with complete-information peer-approximation baselines in numerical experiments. The approach promises practical impact for human-robot interaction and multi-agent control by enabling real-time, data-driven inference of opponents' objectives using only shared state observations.

Abstract

In this paper, we address the problem of a two-player linear quadratic differential game with incomplete information, a scenario commonly encountered in multi-agent control, human-robot interaction (HRI), and approximation methods for solving general-sum differential games. While solutions to such linear differential games are typically obtained through coupled Riccati equations, the complexity increases when agents have incomplete information, particularly when neither is aware of the other's cost function. To tackle this challenge, we propose a model-based Peer-Aware Cost Estimation (PACE) framework for learning the cost parameters of the other agent. In PACE, each agent treats its peer as a learning agent rather than a stationary optimal agent, models the peer's learning dynamics, and leverages these dynamics to infer the cost function parameters of the other agent. This approach enables agents to infer each other's objective functions in real time based solely on previous state observations and to adapt their control policies dynamically. Furthermore, we provide a theoretical guarantee for the convergence of parameter estimation and the stability of system states in PACE. Additionally, our numerical studies demonstrate how modeling the learning dynamics of the other agent benefits PACE, compared to approaches that approximate the other agent as having complete information, particularly in terms of stability and convergence speed.
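For reference, the standard infinite-horizon two-player LQ game behind the coupled Riccati equations mentioned above can be written as follows. The notation ($A$, $B_i$, $Q_i$, $R_i$, $P_i$, $S_i$) is assumed here and may not match the paper's, and penalties on the peer's input are omitted for brevity.

$$
\dot{x} = A x + B_1 u_1 + B_2 u_2, \qquad
J_i = \int_0^{\infty} \left( x^\top Q_i x + u_i^\top R_i u_i \right) \mathrm{d}t, \qquad i \in \{1, 2\},
$$
$$
u_i = -R_i^{-1} B_i^\top P_i x, \qquad S_i = B_i R_i^{-1} B_i^\top,
$$
$$
A^\top P_i + P_i A + Q_i - P_i S_i P_i - P_i S_j P_j - P_j S_j P_i = 0, \qquad j \neq i,
$$

where the pair $(P_1, P_2)$ is chosen so that $A - S_1 P_1 - S_2 P_2$ is Hurwitz. Each $P_i$ depends on both $Q_1$ and $Q_2$. With incomplete information, agent $i$ knows $Q_i$ but can only solve these equations with an estimate $\hat{Q}_j$, so the gain it applies, and hence the trajectory both agents observe, reflects that estimate rather than the true $Q_j$.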

Paper Structure

This paper contains 6 sections, 1 theorem, 58 equations, 7 figures, 1 table, 1 algorithm.

Key Result

theorem 1

If two agents begin with initial guesses for each other’s cost parameters that yield admissible policies, then under a sufficiently small learning rate $\alpha$ and a persistently exciting system state signal in the history stack $\mathcal{H}_k$, the PACE algorithm (Algorithm 1) converges to the true …
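Theorem 1's ingredients (an admissible initial guess, a learning rate $\alpha$, and a persistently exciting history stack) can be illustrated with a deliberately simplified scalar sketch. This is not Algorithm 1: to keep it short, the peer is assumed to already play its complete-information gain, which is exactly the simplification PACE removes, so only the estimation loop is shown, not the model of the peer's learning. The system values and the names `coupled_riccati`, `estimate_peer_q`, and `is_persistently_exciting` are hypothetical, and the gradient is taken by central finite differences rather than by differentiating the Riccati equations.

```python
import math

# Hypothetical scalar game: dx/dt = a*x + b1*u1 + b2*u2 with
# J_i = integral of (q_i*x^2 + r_i*u_i^2) dt. All values are illustrative.
A, B1, B2, R1, R2 = -1.0, 1.0, 1.0, 1.0, 1.0
S1, S2 = B1**2 / R1, B2**2 / R2


def coupled_riccati(q1, q2, tol=1e-12, max_iter=10_000):
    """Solve 2*a*p_i + q_i - s_i*p_i**2 - 2*s_j*p_i*p_j = 0 for i = 1, 2
    by Gauss-Seidel best-response iteration (positive root for each agent)."""
    p1 = p2 = 0.0
    for _ in range(max_iter):
        m1 = A - S2 * p2
        new_p1 = (m1 + math.sqrt(m1 * m1 + S1 * q1)) / S1
        m2 = A - S1 * new_p1
        new_p2 = (m2 + math.sqrt(m2 * m2 + S2 * q2)) / S2
        done = abs(new_p1 - p1) + abs(new_p2 - p2) < tol
        p1, p2 = new_p1, new_p2
        if done:
            return p1, p2
    raise RuntimeError("coupled Riccati iteration did not converge")


def is_persistently_exciting(xs, tol=1e-6):
    # Scalar regressor: excitation reduces to sum(x_k^2) staying away from 0.
    return sum(x * x for x in xs) > tol


def estimate_peer_q(q1, stack, q2_hat, p1_applied, alpha=2.0, iters=500, h=1e-5):
    """Gradient descent on agent 2's state weight, as seen by agent 1."""
    xs = [x for x, _ in stack]
    if not is_persistently_exciting(xs):
        raise ValueError("history stack is not persistently exciting")
    norm = sum(x * x for x in xs)
    # Part of xdot not explained by agent 1's own (known) gain: y_k = s2*p2*x_k.
    ys = [A * x - S1 * p1_applied * x - xd for x, xd in stack]

    def loss(theta):
        p2_pred = coupled_riccati(q1, theta)[1]  # peer assumed to know q1
        return sum((y - S2 * p2_pred * x) ** 2 for x, y in zip(xs, ys)) / norm

    for _ in range(iters):
        grad = (loss(q2_hat + h) - loss(q2_hat - h)) / (2 * h)  # central difference
        q2_hat = max(q2_hat - alpha * grad, 1e-3)  # keep the weight positive
    return q2_hat


# Synthetic noise-free data: agent 1 applies the gain from its initial guess,
# agent 2 plays its gain from the true weights.
q1_true, q2_true, q2_guess = 1.0, 2.0, 0.5
p1_applied = coupled_riccati(q1_true, q2_guess)[0]
p2_true = coupled_riccati(q1_true, q2_true)[1]
a_cl = A - S1 * p1_applied - S2 * p2_true
assert a_cl < 0, "initial guess must yield an admissible (stabilizing) loop"

stack, x, dt = [], 1.0, 0.05  # history stack of (x_k, xdot_k) pairs
for _ in range(40):
    xdot = a_cl * x
    stack.append((x, xdot))
    x += dt * xdot  # forward-Euler step of the closed loop

q2_hat = estimate_peer_q(q1_true, stack, q2_guess, p1_applied)
print(f"estimated q2 = {q2_hat:.4f} (true {q2_true})")
```

Because the normalized loss here reduces to $s_2^2\,(p_2 - p_2(\hat{q}_2))^2$, a single nonzero sample already pins down the scalar parameter. With a vector state, the stacked samples in $\mathcal{H}_k$ would have to span the state space for the analogous loss to determine all of the peer's parameters, which is the kind of condition a persistent-excitation assumption typically supplies.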

Figures (7)

  • Figure 1: An illustrative example of two robotic agents moving an object. Both agents observe the full state trajectory $x(t)$ but cannot observe each other's interaction force $u(t)$. Assuming accurate low-level control of the end effectors in the task space, the interaction dynamics are modeled as a linear system. The agents do not know each other's cost function parameters, denoted $\theta$. Agent $i$ minimizes the observed trajectory error and updates its parameter estimates in real time. In (A), agent $i$ assumes its partner has complete information, which biases its estimate; in (B), agent $i$ performs its own parameter estimation and also accounts for its partner's learning process.
  • Figure 2: Monte Carlo study results for 500 random initial estimates $\hat{\theta}_{k}^{(0)}$ and $\hat{\theta}_{-k}^{(0)}$ (left); stability-region analysis comparing PACE with the complete-information peer approximation, showing how increasing the learning rate affects the stability boundary for each history size (right).
  • Figure 3: Comparison of three algorithms in a multi-parameter estimation scenario.
  • Figure 4: State evolution under the PACE algorithm. Each curve shows the trajectory of one state variable over the course of the simulation.
  • Figure 5: Control signals under the PACE algorithm. The figure compares the true control inputs $u_1$ and $u_2$ with the values predicted for each by the other agent during the simulation.
  • ...and 2 more figures
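As a hypothetical instance of the linear interaction model described in Figure 1 (an illustration, not the paper's model), the jointly held object can be treated as a point mass $m$ moving along one axis with position $y$, target $y^{\ast}$, state $x = [\,y - y^{\ast},\ \dot{y}\,]^\top$, and each agent applying a force $u_i$:

$$
\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x
+ \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_1
+ \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_2,
\qquad
Q_i = \operatorname{diag}(\theta_{i,1}, \theta_{i,2}).
$$

In this toy model $m\ddot{y} = u_1 + u_2$, so an agent that observes $x(t)$ and knows its own force $u_i$ can recover the peer's force as $u_j = m\ddot{y} - u_i$ even though $u_j$ is never measured directly; the unknown $\theta_j$ must then be inferred from how $u_j$ depends on $x$.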

Theorems & Definitions (6)

  • theorem 1
  • proof: See Appendix A for the full proof
  • proof
  • proof
  • proof
  • proof