Intent Demonstration in General-Sum Dynamic Games via Iterative Linear-Quadratic Approximations

Jingqi Li, Anand Siththaranjan, Somayeh Sojoudi, Claire Tomlin, Andrea Bajcsy

TL;DR

This work addresses coordinating $N$ agents in general-sum dynamic games under incomplete information by enabling a certain agent to strategically demonstrate its intent to uncertain opponents. The authors develop an algorithm based on iterative linear-quadratic approximations that alternates between solving complete-information LQ games for a set of candidate intents and optimizing the certain agent’s joint physical-estimate trajectory, with convergence guarantees on belief alignment and potential improvements in task performance. They extend the framework to nonlinear dynamics via iLQG/iLQR and nonlinear belief updates, including Bayesian updates, and discuss potential integrations with deep reinforcement learning. Empirical validation across four multi-agent tasks demonstrates faster belief learning, reduced regret for the certain agent, and robust task performance when intent demonstration is strategically employed, highlighting practical benefits for autonomous driving, multi-robot collaboration, and shared-control systems.

Abstract

Autonomous agents should coordinate effectively without prior knowledge of others' intents. While prior work has focused on intent inference, we address the inverse problem: how agents can strategically demonstrate their intents within general-sum dynamic games. We model this problem and propose an algorithm that balances intent demonstration with task performance. To handle nonlinear dynamic games with continuous state-action spaces, our method leverages iterative linear-quadratic game approximations and provides efficient intent-teaching guarantees: the uncertain agent's belief can be driven rapidly to the ground truth, while the demonstrating agent avoids expending effort on unnecessary belief alignment when it does not improve task performance. Theoretical analysis and hardware experiments confirm that our approach enables the demonstrating agent to reconcile task execution with belief alignment and strategically manage the information asymmetry among agents, even as its intent evolves during deployment.

Paper Structure

This paper contains 10 sections, 2 theorems, 8 equations, 6 figures, and 1 algorithm.

Key Result

Proposition 1

Consider a two-player LQ game. Suppose that the linear policy $\pi_t^1(x_t;\theta)$ takes the form $\pi_t^1(x_t;\theta) = K_{t,x}^1 x_t + K_{t,\theta}^1 \theta ,\ \forall t\in \mathbf{T}$ and $K_{t,\theta}^{1\top} K_{t,\theta}^1>0$. Moreover, let player $j\in\{2,\dots,N\}$ learn via linear estimat
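The role of the condition $K_{t,\theta}^{1\top} K_{t,\theta}^1>0$ can be illustrated with a small numerical sketch. When the policy is linear in $\theta$, each observed action $u_t$ gives the residual $u_t - K_{t,x}^1 x_t - K_{t,\theta}^1\hat{\theta}_t$, which is informative about $\theta$ only if $K_{t,\theta}^1$ is nonzero. The gradient-style update below is an assumption chosen for illustration, not the estimator defined in the paper (whose statement is truncated above). The gains, state, step size `alpha`, and the function names are likewise hypothetical.

```python
# Illustrative sketch only: the gradient-style estimator is an assumption,
# not the paper's estimator. All gains and values are hypothetical.

def linear_policy(K_x, K_theta, x, theta):
    """Action u = K_x x + K_theta * theta, with a scalar intent theta."""
    return [sum(k * xi for k, xi in zip(row, x)) + k_th * theta
            for row, k_th in zip(K_x, K_theta)]

def estimate_step(K_x, K_theta, x, u, theta_hat, alpha):
    """One gradient step on 0.5 * ||u - K_x x - K_theta * theta_hat||^2."""
    predicted = linear_policy(K_x, K_theta, x, theta_hat)
    residual = [ui - pi for ui, pi in zip(u, predicted)]
    # Project the action residual back onto the intent through K_theta^T.
    correction = sum(k_th * r for k_th, r in zip(K_theta, residual))
    return theta_hat + alpha * correction

def run_demo(theta_true=25.0, theta_hat=0.0, alpha=0.5, steps=30):
    K_x = [[-0.5, 0.0], [0.0, -0.5]]  # hypothetical state-feedback gain
    K_theta = [1.0, 0.5]              # hypothetical intent gain, K^T K = 1.25 > 0
    x = [1.0, -2.0]                   # state held fixed to isolate the estimate
    errors = []
    for _ in range(steps):
        u = linear_policy(K_x, K_theta, x, theta_true)
        theta_hat = estimate_step(K_x, K_theta, x, u, theta_hat, alpha)
        errors.append(abs(theta_hat - theta_true))
    return theta_hat, errors

if __name__ == "__main__":
    final_estimate, errors = run_demo()
    print(final_estimate, errors[0], errors[-1])
```

In this sketch the estimation error contracts by the factor $1 - \alpha K_{\theta}^{\top} K_{\theta}$ at every step, so a larger $K_{\theta}^{\top} K_{\theta}$ (a more "expressive" policy) shrinks the error faster for a fixed, sufficiently small step size. If $K_{\theta}$ were zero, the correction term would vanish and the estimate would never move.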

Figures (6)

  • Figure 1: Intent Demonstration Problem in General-Sum Games. The certain player $A$ optimizes $u_t^A=\bar{\pi}_t^A(x_t, \hat{\theta}_t ;\theta^*)$, which trades off its own task cost against demonstrating its intent. The uncertain player $B$ responds to player $A$ with rational actions $u_t^B =\pi_t^B(x_t;\hat{\theta}_t)$ and updates its estimate $\hat{\theta}_t$ of player $A$'s intent $\theta^*$ by observing $A$'s actions. Player $A$ can therefore choose to influence player $B$'s estimate.
  • Figure 2: Environments. The four incomplete-information general-sum games considered in this work.
  • Figure 3: Results: H1. Algorithm 1 accelerates learning by having the certain agent exaggerate its behavior, helping the uncertain agent infer its intent.
  • Figure 4: Results: H2. The human pilot changes their target landing position $\theta^*$ from 25 to 50 at time $t=20$. The strategic intent demonstration policy $\bar{\pi}_t^1$, computed without anticipating this change, efficiently conveys the unforeseen change in intent, so the autopilot’s belief converges faster than in the passive game without recomputing $\bar{\pi}_t^1$.
  • Figure 5: Results: H3. The regrets of the certain player (player 1) under the active teaching strategy are consistently lower than those under the passive teaching strategy, across different ground-truth intents of the certain player. This empirically validates the claim in Proposition 2.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Proposition 1: Effective Intent Demonstration
  • proof
  • Proposition 2: Strategic Intent Demonstration
  • proof
  • Remark 3