Following the Human Thread in Social Navigation

Luca Scofano, Alessio Sampieri, Tommaso Campari, Valentino Sacco, Indro Spinelli, Lamberto Ballan, Fabio Galasso

TL;DR

This work tackles social navigation under partial observability by introducing the Social Dynamics Adaptation (SDA) model, a two-stage RL framework that first learns a policy conditioned on privileged human trajectories and then learns to infer the same social dynamics online from the robot's own state-action history. The trajectory encoder $\mu$ captures the social cues from human motion, while the Adapter $\psi$ regresses the latent dynamics $\hat{z}_t$ using only accessible robot histories, enabling deployment without privileged data. Evaluated on Habitat 3.0, SDA achieves state-of-the-art performance in finding and following humans and demonstrates robustness to noise and reduced sensor update rates; ablations show the value of privileged information during training and the effectiveness of online inference. The approach advances real-time human-robot collaboration by bridging simulated privileged signals with deployable, sensor-based social understanding, and it sets a foundation for extending to more diverse human dynamics and multi-human scenarios. Future work will broaden the social cues modeled and explore real-world deployment, including sim-to-real transfer strategies.

Abstract

The success of collaboration between humans and robots in shared environments relies on the robot's real-time adaptation to human motion. Specifically, in Social Navigation, the agent should be close enough to assist but ready to back up to let the human move freely, avoiding collisions. Human trajectories emerge as crucial cues in Social Navigation, but they are only partially observable from the robot's egocentric view and computationally complex to process. We present the first Social Dynamics Adaptation model (SDA), which infers the social dynamics from the robot's state-action history. We propose a two-stage Reinforcement Learning framework: the first stage learns to encode the human trajectories into social dynamics and learns a motion policy conditioned on this encoded information, the current status, and the previous action. Here, the trajectories are fully visible, i.e., assumed as privileged information. In the second stage, the trained policy operates without direct access to trajectories. Instead, the model infers the social dynamics solely from the history of previous actions and statuses in real time. Tested on the novel Habitat 3.0 platform, SDA sets a new state-of-the-art (SotA) performance in finding and following humans. The code can be found at https://github.com/L-Scofano/SDA.
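
The two-stage idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the dimensions, the linear/tanh stand-ins for the encoder `mu` and policy `pi`, the synthetic data, and the least-squares fit of the adapter `psi` are all assumptions made for illustration. In particular, the stage-one networks are treated as already trained (random fixed weights), and ordinary least squares replaces whatever training procedure the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not taken from the paper.
N, TRAJ_DIM, STATE_DIM, ACTION_DIM, LATENT_DIM, HISTORY = 256, 8, 6, 2, 4, 5
HIST_DIM = HISTORY * (STATE_DIM + ACTION_DIM)  # flattened state-action history

# --- Stage 1 (stand-in): encoder mu and policy pi, treated as already trained ---
W_mu = 0.3 * rng.normal(size=(TRAJ_DIM, LATENT_DIM))
W_pi = 0.3 * rng.normal(size=(STATE_DIM + ACTION_DIM + LATENT_DIM, ACTION_DIM))

def mu(trajectory):
    """Encode a privileged human trajectory into a social-dynamics latent z."""
    return np.tanh(trajectory @ W_mu)

def pi(state, prev_action, z):
    """Policy conditioned on the current state, the previous action, and a latent."""
    return np.tanh(np.concatenate([state, prev_action, z], axis=-1) @ W_pi)

# Synthetic data (assumption): each history is a noisy linear view of the
# hidden human trajectory, so it carries information about the latent.
trajectories = rng.normal(size=(N, TRAJ_DIM))
mixing = rng.normal(size=(TRAJ_DIM, HIST_DIM))
histories = trajectories @ mixing + 0.1 * rng.normal(size=(N, HIST_DIM))

# --- Stage 2: fit the adapter psi to regress z from histories only ---
z_target = mu(trajectories)  # privileged targets, available during training only
W_psi, *_ = np.linalg.lstsq(histories, z_target, rcond=None)

def psi(history):
    """Estimate the latent z_hat from the robot's state-action history."""
    return history @ W_psi

z_hat = psi(histories)
adapter_mse = float(np.mean((z_hat - z_target) ** 2))
baseline_mse = float(np.mean(z_target ** 2))  # error of always predicting zero

# --- Deployment: the frozen policy consumes z_hat, never the trajectory ---
state = rng.normal(size=STATE_DIM)
prev_action = np.zeros(ACTION_DIM)
action = pi(state, prev_action, psi(histories[0]))
```

The point of the sketch is the data flow: trajectories enter only through `mu` when building regression targets, while at deployment the policy receives the adapter's estimate computed from history alone.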

Paper Structure

This paper contains 36 sections, 10 equations, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: We present our novel Social Dynamics Adaptation model (SDA). The framework involves two stages of training that allow the model to infer, given its past observations and actions, another agent's Social Dynamics. In the first training stage, the model embeds the followed agent's trajectory, which, together with sensor perceptions, composes the input to the model's Social Navigation Policy ($\pi$). The knowledge obtained from the human trajectory strongly helps the navigation policy in finding and following an agent. However, this information is often not available during deployment. In the second stage, SDA learns to adapt past statuses and actions, which are always available, to the first stage's Social Dynamics embedding $\hat{z}$. As depicted in the figure, the status contains depth maps and a bounding-box (BB) detection of the person, if observable from the egocentric robot view. $\hat{z}$ is then paired with current observations as input to the frozen $\pi$.
  • Figure 2: Pipeline of the proposed methodology. First, we jointly learn to encode human trajectories and a motion policy. In the next stage, given the previous states and actions, we infer the social dynamics and pass the estimated latent vector to the frozen policy.
  • Figure 3: The agent and the humanoid start the episode in separate rooms. The agent navigates through the environment in search of the humanoid, and once found, begins to follow it.
  • Figure 4: Latent Analysis
  • Figure 5: Failure Cases Analysis on 100 episodes
  • ...and 2 more figures