Identifying Time-varying Costs in Finite-horizon Linear Quadratic Gaussian Games

Kai Ren, Maryam Kamgarpour

TL;DR

This work tackles identifying time-varying costs in finite-horizon linear-quadratic Gaussian games from observed Nash policies or trajectories. It derives a backward-propagating, null-space characterization that places each $\theta_t^i$ in the null space of $M_t^i$ and presents a constrained least-squares backpropagation algorithm to recover $\{Q_t^i, l_t^i, R_t^i\}$. It further provides finite-sample probabilistic bounds on the error in $\theta_t^i$ when the Nash policy is estimated from demonstrations, accounting for active-set stability. The approach is validated on numerical and driving simulations, demonstrating accurate reconstruction of policies and trajectories given sufficient demonstrations and highlighting data requirements. Overall, the method enables prediction and planning for multi-agent interactions under time-varying objectives in robotics and autonomous systems.

Abstract

We address cost identification in a finite-horizon linear quadratic Gaussian game. We characterize the set of cost parameters that generate a given Nash equilibrium policy. We propose a backpropagation algorithm to identify the time-varying cost parameters. We derive a probabilistic error bound when the cost parameters are identified from finite trajectories. We test our method in numerical and driving simulations. Our algorithm identifies the cost parameters that can reproduce the Nash equilibrium policy and trajectory observations.

Paper Structure

This paper contains 15 sections, 7 theorems, 42 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Under assumption assump:CLinverse, for a given Nash equilibrium policy $\{K^{i*}_{t}, \alpha^{i*}_{t}\}_{t \in [T]^-}^{i \in [N]}$, the cost parameter $\theta_t^i$ of player $i$ at time $t$ is characterized by a null-space condition: $\theta_t^i$ lies in the null space of $M_t^i$. The terms $\bar{\Delta}_{t}^i$ and $\bar{\Omega}_{t}^i$ are defined recursively backward. At $t = T$, set $\bar{\Delta}_{T}^i = 0_{n_x^2}$ and $\bar{\Omega}_{T}^i = 0_{n_x}$, from which $M_T^i$ is determined ...
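The null-space condition on $\theta_t^i$ can be illustrated with a generic least-squares computation. The sketch below is not the paper's algorithm: it assumes a matrix `M` is already available, and it uses a unit-norm constraint (an assumption made here only to exclude the trivial solution $\theta = 0$; the constraints used in the paper are not stated in this summary). Under that assumption, the minimizer of $\|M\theta\|_2$ is the right singular vector of `M` with the smallest singular value. The function name `null_space_vector` and the synthetic data are hypothetical.

```python
import numpy as np


def null_space_vector(M):
    """Return a unit-norm theta minimizing ||M @ theta||_2, plus the residual.

    Assumption: a unit-norm constraint stands in for whatever constraints
    the actual identification problem imposes on the cost parameters.
    """
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(M, full_matrices=True)
    theta = vt[-1]
    residual = np.linalg.norm(M @ theta)
    return theta, residual


# Synthetic example: build M so that theta_true spans its null space.
rng = np.random.default_rng(0)
theta_true = rng.standard_normal(5)
theta_true /= np.linalg.norm(theta_true)

A = rng.standard_normal((8, 5))
# Remove the component of every row along theta_true, so M @ theta_true = 0.
M = A - np.outer(A @ theta_true, theta_true)

theta_hat, residual = null_space_vector(M)

# A null-space vector is only defined up to sign; align it for comparison.
if theta_hat @ theta_true < 0:
    theta_hat = -theta_hat
```

When the null space has dimension greater than one, the last several rows of `vt` all give valid solutions, so the parameters are identifiable only up to that subspace.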

Figures (4)

  • Figure 1: Demonstration trajectories of three cars driving in a cross intersection. The trajectories are generated by the same Nash equilibrium policy but different system noise realizations.
  • Figure 2: Optimization residual and the deviations between the recovered (from the identified costs) and true policy, state, and input trajectories across 100 randomized cost matrices.
  • Figure 3: Five exemplary episodes from the numerical simulation section (sec:num), where we compare the trajectories recovered from the identified cost parameters with the ground truth. The recovered trajectories closely follow the ground truth in all episodes.
  • Figure 4: Expected trajectories generated by the true costs and the identified costs from the exact Nash policy, 100 demonstrations and 20 demonstrations, respectively. The trajectories recovered from the exact policy or 100 demonstrations are close to the ground-truth. With 20 demonstrations, the recovered trajectories become inaccurate.
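The effect of the number of demonstrations on the estimated policy can be sketched with an ordinary least-squares fit. This is an illustrative toy under stated assumptions, not the estimator used in the paper: it assumes an affine policy of the form $u = Kx + \alpha$ at a single time step (the sign convention is assumed), that inputs are observed with additive Gaussian noise, and that states are drawn from a standard normal distribution. All names (`estimate_affine_policy`, `policy_error`) and dimensions are hypothetical.

```python
import numpy as np


def estimate_affine_policy(X, U):
    """Least-squares fit of u = K x + alpha from samples at one time step.

    X: (n_demo, n_x) observed states, U: (n_demo, n_u) observed inputs.
    """
    # Append a column of ones so the offset alpha is fitted jointly with K.
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Z, U, rcond=None)
    K = W[:-1].T   # shape (n_u, n_x)
    alpha = W[-1]  # shape (n_u,)
    return K, alpha


rng = np.random.default_rng(1)
n_x, n_u = 4, 2
K_true = rng.standard_normal((n_u, n_x))
alpha_true = rng.standard_normal(n_u)


def policy_error(n_demo, noise_std=0.1):
    """Frobenius-norm error of the fitted gain for a given number of demos."""
    X = rng.standard_normal((n_demo, n_x))
    noise = noise_std * rng.standard_normal((n_demo, n_u))
    U = X @ K_true.T + alpha_true + noise  # assumed noisy input observations
    K_hat, _ = estimate_affine_policy(X, U)
    return np.linalg.norm(K_hat - K_true)


err_20 = policy_error(20)
err_2000 = policy_error(2000)
```

In this toy setting the gain error shrinks as demonstrations are added, which is the qualitative behavior the finite-sample bound is meant to quantify.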

Theorems & Definitions (13)

  • Definition 1
  • Proposition 1
  • Lemma 1 (from krikheli2018finitesampleperformancelinear)
  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 3 more