Using High-Level Patterns to Estimate How Humans Predict a Robot will Behave

Sagar Parekh, Lauren Bramblett, Nicola Bezzo, Dylan P. Losey

TL;DR

The paper addresses how a robot can estimate what a human predicts the robot will do, given that humans rely on high-level patterns rather than precise actions. It proposes a second-order Theory of Mind (ToM) framework implemented as a discrete latent autoencoder with finite scalar quantization: joint trajectories $\xi$ are mapped to a discrete latent value $z \in \mathcal{Z}$, where $\mathcal{Z}$ contains $L^d$ possible values, and each $z$ is decoded into a vector field that forecasts the robot actions the human expects. Key contributions include formalizing the second-order ToM setting, extracting human-friendly high-level behaviors via a discrete latent space, and validating the approach through synthetic tests, a user study, and a real-world driving dataset, where it aligns with human predictions more closely than a VAE baseline. The approach offers interpretable, high-level predictions that can inform robot planning to improve safety and collaboration in shared spaces, particularly in driving scenarios.
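
To make the $L^d$ count concrete, here is a minimal sketch of finite scalar quantization. It assumes the same number of levels $L$ for each of the $d$ latent dimensions and a `tanh` bound before rounding; the function names are hypothetical, and the paper's actual encoder, level choices, and training details (such as a straight-through gradient through the rounding step) are not shown in this summary.

```python
import math
from itertools import product


def fsq_quantize(x, levels):
    """Map each real coordinate of x to an integer code in {0, ..., levels - 1}.

    Assumption: every dimension uses the same number of levels.
    """
    half = (levels - 1) / 2
    codes = []
    for v in x:
        bounded = math.tanh(v)                    # squash into [-1, 1]
        codes.append(round(bounded * half + half))  # snap to the nearest level
    return codes


def code_to_index(codes, levels):
    """Flatten a d-dimensional code into a single index in {0, ..., levels**d - 1}."""
    index = 0
    for c in codes:
        index = index * levels + c
    return index


def enumerate_codebook(levels, dims):
    """List the flat index of every possible code; there are levels**dims of them."""
    return [code_to_index(c, levels) for c in product(range(levels), repeat=dims)]


# Illustrative values only: L = 3 levels, d = 2 dimensions, so 3**2 = 9 latent values.
codes = fsq_quantize([2.0, -0.1], levels=3)
z = code_to_index(codes, levels=3)
```

Because the codebook is implied by the rounding grid rather than learned, every one of the $L^d$ values is a valid latent, which is what lets each value be read as a separate behavior type.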

Abstract

Humans interacting with robots often form predictions of what the robot will do next. For instance, based on the recent behavior of an autonomous car, a nearby human driver might predict that the car is going to remain in the same lane. It is important for the robot to understand the human's prediction for safe and seamless interaction: e.g., if the autonomous car knows the human thinks it is not merging -- but the autonomous car actually intends to merge -- then the car can adjust its behavior to prevent an accident. Prior works typically assume that humans make precise predictions of robot behavior. However, recent research on human-human prediction suggests the opposite: humans tend to approximate other agents by predicting their high-level behaviors. We apply this finding to develop a second-order theory of mind approach that enables robots to estimate how humans predict they will behave. To extract these high-level predictions directly from data, we embed the recent human and robot trajectories into a discrete latent space. Each element of this latent space captures a different type of behavior (e.g., merging in front of the human, remaining in the same lane) and decodes into a vector field across the state space that is consistent with the underlying behavior type. We hypothesize that our resulting high-level and coarse predictions of robot behavior will correspond to actual human predictions. We provide initial evidence in support of this hypothesis through proof-of-concept simulations, testing our method's predictions against those of real users, and experiments on a real-world interactive driving dataset.
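
The idea of decoding one latent value into a vector field can be sketched as follows. This is an illustration under stated assumptions, not the paper's decoder: `toy_decoder` is a hypothetical hand-written stand-in for a trained network, and the goal positions in `GOALS` are made up. The point is only that fixing $z$ and sweeping the state over a grid yields one predicted action per state.

```python
import math

# Hypothetical goal positions, one per discrete latent value z (illustrative only).
GOALS = {0: (-1.0, 1.0), 1: (0.0, 1.0), 2: (1.0, 1.0)}


def toy_decoder(state, z):
    """Stand-in for a learned decoder: a unit action pointing from state to goal z."""
    gx, gy = GOALS[z]
    dx, dy = gx - state[0], gy - state[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return (0.0, 0.0)  # already at the goal, so no motion is predicted
    return (dx / norm, dy / norm)


def vector_field(decoder, z, xs, ys):
    """Hold z fixed and query the decoder at every grid state."""
    return {(x, y): decoder((x, y), z) for x in xs for y in ys}


# One field per latent value; each field visualizes one high-level pattern.
fields = {z: vector_field(toy_decoder, z, [-1.0, 0.0, 1.0], [0.0, 0.5]) for z in GOALS}
```

Here the same state maps to different actions under different values of `z`, which is the property that makes each latent value readable as a distinct behavior.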

Paper Structure (8 sections, 4 equations, 5 figures)

Figures (5)

  • Figure 1: When robots model the behaviors of other robots, they often develop precise predictions. But when humans try to predict the behaviors of robots, they are generally not precise. Instead, recent work suggests that humans focus on the high-level pattern in robot behavior (e.g., merging), and then make coarse predictions consistent with that pattern. In this paper we develop a data-driven approach that captures this high-level reasoning, resulting in more accurate estimates of human predictions.
  • Figure 2: Our proposed model for learning to estimate how humans predict a robot will behave. We develop an autoencoder with three parts: (a) an encoder $\phi$ that inputs trajectories and outputs discrete latent values, (b) a discrete latent space $z$ where each different value corresponds to a different high-level behavior, and (c) a decoder $\psi$ that takes the current state and high-level pattern and predicts the robot's next $n$ actions. During training, the decoder additionally reconstructs action sequences $a_1^{T-k+i}$ for a state $s^{T-k}$ from the input trajectory in order to learn the discrete latent space.
  • Figure 3: Human predictions of robot behavior extracted by our method. The human is shown in gray, and the robot is shown in green. Our approach first embeds the current interaction into a discrete representation $z \in \mathcal{Z}$. We then fix that value of $z$, and extract the actions we think the human will predict across the workspace. This results in a vector field that visualizes the high-level pattern $z$. (Top) In the Highway environment, our method autonomously extracts three high-level behavior patterns of the robot: merging into the right lane, staying straight, and merging into the left lane. (Bottom) In Obstacle, our method identifies the goal-reaching movement patterns of the robot, where each different $z$ predicts actions that move towards a specific goal. These results suggest that our data-driven approach is able to learn high-level behaviors that align with human explanations (e.g., merging, going to the left goal).
  • Figure 4: Results from our user study. Participants observe a segment of human-robot motion and predict how the robot will behave. We compare these real predictions to the predictions made by our method and the baseline. (Top-left) The plot shows the mean alignment error of both methods. The error value can range from $0$ (indicating parallel predictions) to $2$ (indicating opposite predictions). Our method achieves a significantly lower error than the baseline in both environments, Highway and Obstacle. Next, we compare the vector field of the participants' predictions with the vector field decoded from the most common latent vectors. (Top-right) In Highway, we obtain two prominent movement patterns: the robot merging left, and the robot merging right. The predictions made by our model (orange) align with the participants' predictions. (Bottom) In Obstacle, we obtain three notable goal-reaching behaviors. The behaviors decoded from each latent clearly indicate the robot's goal, and this appears to align with the participants' model of the robot.
  • Figure 5: Results of our experiments on the INTERACT dataset. We observe the trajectories of two cars, and predict the movement of one vehicle from the perspective of the other car. We compare the predictions of our high-level method and the precise baseline. (Left) The average alignment error of the predictions across $10$ trials. The left plot shows the results for the Roundabout scenario and the right plot shows the results for the Intersection scenario. The error value ranges from $0$ (indicating parallel predictions) to $2$ (indicating opposite predictions). Our method achieves a significantly lower error in both scenarios ($p < 0.001$). Next, we visualize the movement patterns identified by the latent vectors of our model. (Top-right) In Roundabout we obtain three prominent movement patterns: entering the roundabout from the left and going up, going around the roundabout, and entering the roundabout from the bottom. (Bottom-right) In Intersection our model identifies three prominent movement patterns: going straight, going left, and going right.
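
The captions for Figures 4 and 5 describe an alignment error that is $0$ for parallel predictions and $2$ for opposite ones. The exact formula is not given in this summary; one metric with exactly that range is the cosine distance, $1 - \cos\theta$, between the predicted and reference action vectors. The sketch below assumes that definition, and the function names are hypothetical.

```python
import math


def alignment_error(pred, ref):
    """Cosine distance between two action vectors: 0 if parallel, 2 if opposite.

    Assumption: this is one metric matching the range quoted in the captions,
    not necessarily the definition used in the paper.
    """
    dot = sum(p * r for p, r in zip(pred, ref))
    norm = math.hypot(*pred) * math.hypot(*ref)
    if norm == 0.0:
        raise ValueError("alignment is undefined for a zero-length vector")
    return 1.0 - dot / norm


def mean_alignment_error(preds, refs):
    """Average the per-state error over paired predicted and reference actions."""
    errors = [alignment_error(p, r) for p, r in zip(preds, refs)]
    return sum(errors) / len(errors)


# Illustrative vectors only: one parallel pair and one perpendicular pair.
example = mean_alignment_error([(1.0, 0.0), (1.0, 0.0)], [(2.0, 0.0), (0.0, 1.0)])
```

Under this definition the error depends only on direction, not magnitude, which suits coarse predictions: a human arrow pointing "roughly left" is scored by its heading rather than its length.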