Grounded Relational Inference: Domain Knowledge Driven Explainable Autonomous Driving

Chen Tang, Nishan Srishankar, Sujitha Martin, Masayoshi Tomizuka

TL;DR

This work tackles the explainability of autonomous driving by grounding a relational latent space in domain-knowledge defined interactions. Grounded Relational Inference (GRI) combines a variational GNN with structured reward functions and adversarial IRL to produce semantically meaningful interaction graphs that explain a vehicle's actions. Empirical results on synthetic and real traffic datasets show that GRI yields graph-accurate, interpretable edge types aligned with human domain knowledge, albeit with some trade-offs in trajectory reconstruction. The approach offers a principled pathway to explainable, human-aligned autonomous driving behaviors and opens avenues for safer human-machine collaboration and scalable, domain-informed modeling of multi-agent traffic.

Abstract

Explainability is essential for autonomous vehicles and other robotics systems interacting with humans and other objects during operation. Humans need to understand and anticipate the actions taken by the machines for trustful and safe cooperation. In this work, we aim to develop an explainable model that generates explanations consistent with both human domain knowledge and the model's inherent causal relation. In particular, we focus on an essential building block of autonomous driving, multi-agent interaction modeling. We propose Grounded Relational Inference (GRI). It models an interactive system's underlying dynamics by inferring an interaction graph representing the agents' relations. We ensure a semantically meaningful interaction graph by grounding the relational latent space into semantic interactive behaviors defined with expert domain knowledge. We demonstrate that it can model interactive traffic scenarios under both simulation and real-world settings, and generate semantic graphs explaining the vehicle's behavior by their interactions.

Paper Structure

This paper contains 24 sections, 33 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: A motivating lane-changing scenario where we ask different models to control the red vehicle. All the models generate deceleration commands but have different intermediate outputs. With the aid of visual attention, we generate a heat map indicating the critical pixels of the input image. A graph attention network assigns edge weights $\omega_i$ to specify the importance of surrounding vehicles to the controlled vehicle. However, the attention mechanisms cannot recognize different effects: the two cars are mutually important but affect each other in distinct ways. The NRI model can distinguish between different interactive behaviors by assigning different values to the latent variables $z_i$ in the interaction graph. Still, the latent space does not have explicit semantic meaning. In contrast, our model ensures a semantic interaction graph, which illustrates the model's understanding of the scenario and explains the action it takes. It determines the interaction graph with a latent space grounded in yielding and cutting-in behaviors. It learns the control policies that generate behaviors consistent with their definitions in domain knowledge (e.g., traffic rules) and executes the corresponding policies according to the inferred edge types.
  • Figure 2: Architecture of the grounded relational inference model. Given a demonstration trajectory $\boldsymbol{\tau}^\mathrm{E}\in\mathcal{D}^\mathrm{E}$, the encoder operates over $\mathcal{G}_\mathrm{scene}$ and approximates the distribution $p(\mathbf{z}\vert\boldsymbol{\tau}^\mathrm{E})$ with $q_\phi(\mathbf{z}\vert\boldsymbol{\tau}^\mathrm{E})$. The policy decoder operates over a $\mathcal{G}_\mathrm{interact}$ sampled from the inferred $q_\phi(\mathbf{z}\vert\boldsymbol{\tau}^\mathrm{E})$ and models the policy $\boldsymbol{\pi}_\eta \left(\mathbf{a}^t\vert{\mathbf{x}^t, \mathbf{z}}\right)$. Given the initial state of $\boldsymbol{\tau}^\mathrm{E}$, we sample a trajectory $\boldsymbol{\tau}^\mathrm{G}$ by sequentially sampling $\mathbf{a}^t$ from $\boldsymbol{\pi}_\eta \left(\mathbf{a}^t\vert{\mathbf{x}^t, \mathbf{z}}\right)$ and propagating the state. Finally, we use the reward GNN to compute the cumulative rewards of $\boldsymbol{\tau}^\mathrm{G}$ and $\boldsymbol{\tau}^\mathrm{E}$ conditioned on the sampled $\mathcal{G}_\mathrm{interact}$.
  • Figure 3: Test scenarios with the underlying interaction graphs. In the synthetic scenarios, the graphs are the ground-truth ones governing the synthetic experts. In the naturalistic traffic scenarios, the graphs are human hypotheses reflecting humans' understanding of the traffic scenarios.
  • Figure 4: The empirical distribution of estimated edge variables $\hat{z}$ over the test dataset in the synthetic scenarios. We summarize the results in multiple adjacency matrices corresponding to different edge types. In the adjacency matrix corresponding to the $k^\mathrm{th}$ type of interaction, the element $A_{i,j}$ indicates the relative frequency of $\hat{z}_{j,i}=k$, where $\hat{z}_{j,i}$ is the latent variable for the edge from node $j$ to node $i$.
  • Figure 5: Collision point diagram. At every timestep, the agents' heading vectors can be calculated by approximating their motion as linear. The intersection of these vectors is taken to be the collision point, where the agents would collide if no yield action is taken.
  • ...and 10 more figures
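
The collision point described in the Figure 5 caption can be illustrated with a small geometric sketch. The following is only a minimal example under stated assumptions, not the authors' implementation: the function name `collision_point`, the 2D tuple representation, the tolerance `eps`, and the choice to treat each agent's linear motion as a forward ray (so that intersections behind an agent are ignored) are all hypothetical choices made for illustration.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float]


def collision_point(
    p1: Vec, v1: Vec, p2: Vec, v2: Vec, eps: float = 1e-9
) -> Optional[Vec]:
    """Intersect the rays p1 + s*v1 and p2 + t*v2 with s, t >= 0.

    p1, p2 are agent positions; v1, v2 are heading vectors obtained by
    approximating each agent's motion as linear. Returns None when the
    headings are (nearly) parallel or the lines only cross behind an agent.
    Hypothetical helper for illustration only.
    """
    # 2D cross product of the headings; zero means parallel lines.
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if abs(det) < eps:
        return None

    # Solve p1 + s*v1 = p2 + t*v2 for the ray parameters s and t.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (dx * v2[1] - dy * v2[0]) / det
    t = (dx * v1[1] - dy * v1[0]) / det

    # Assumption: only intersections ahead of both agents count.
    if s < 0 or t < 0:
        return None
    return (p1[0] + s * v1[0], p1[1] + s * v1[1])


if __name__ == "__main__":
    # One agent heads east from the origin, the other heads north from (5, -5).
    print(collision_point((0.0, 0.0), (1.0, 0.0), (5.0, -5.0), (0.0, 1.0)))
```

In this sketch, an agent heading east from the origin and an agent heading north from (5, -5) meet at (5, 0); parallel or diverging headings yield `None`, which a downstream reward feature would have to handle separately.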