Learning Early Social Maneuvers for Enhanced Social Navigation

Yigit Yildirim; Mehmet Suzer; Emre Ugur

TL;DR

This work addresses socially compliant mobile-robot navigation by proposing a purely data-driven Learning from Demonstration (LfD) framework that relies on raw sensory data and explicitly accounts for future pedestrian trajectories. The method combines a Conditional Neural Process (CNP)-based LfD module with a traditional local planner (e.g., the Dynamic Window Approach) and introduces a CNN-based state encoder to incorporate full environmental context, including 360° LiDAR data. A pedestrian trajectory forecaster (LSTM-based RL) predicts future paths and feeds them into the planning module, enabling anticipatory maneuvers. Current results are limited to synthetic environments and component-level evaluations. The authors outline a path toward real-world validation, offline and online integration, and assessments of social trust and acceptance, with plans for multimodal extensions.

Abstract

Socially compliant navigation is an integral part of safety features in Human-Robot Interaction. Traditional approaches to mobile navigation prioritize physical aspects, such as efficiency, but social behaviors are gaining importance as robots become more common in daily life. Recent techniques to improve the social compliance of navigation often rely on predefined features or reward functions, which introduce assumptions about human social behavior. To address this limitation, we propose a novel Learning from Demonstration (LfD) framework for social navigation that exclusively utilizes raw sensory data. Additionally, the proposed system contains mechanisms to consider the future paths of the surrounding pedestrians, acknowledging the temporal aspect of the problem. The final product is expected to reduce the anxiety of people sharing their environment with a mobile robot, helping them trust that the robot is aware of their presence and will not harm them. Because the framework is still under development, we outline its components, present experimental results, and discuss future work towards realizing it.

Paper Structure (9 sections, 1 equation, 7 figures)

Introduction
Related Work
Method
Incorporating the Future Predictions
Improving the Environmental Awareness
Experiments and Results
The Incorporation of Future Trajectories
The use of CNN as a state encoder
Conclusion and Future Work

Figures (7)

Figure 1: A socially compliant navigation path. Even though the path is suboptimal regarding time and energy, it is socially more acceptable.
Figure 2: The local planner module used in prior work [yildirim2021learning]. In this model, the neural network receives the robot's destination and the position of the closest pedestrian, and it outputs the appropriate velocity commands.
Figure 3: An environment with three pedestrians, a robot, and an obstacle. The red path illustrates nonsocial behavior. Even though $P_1$ and $P_2$ are closer to the robot, the pedestrian whose trajectory affects the robot is $P_3$, since only $P_3$ presents a chance of encounter. Therefore, the robot should follow the green path to avoid a future encounter.
Figure 4: To include predicted future trajectories in the planning mechanism, the proposed approach feeds them to the CNP-based LfD architecture. The neural network takes in tuples of the timestep, current position, navigation goal, the predicted trajectory of the pedestrian, and the robot's trajectory within the next few time steps. It outputs the predicted trajectory in a local window, which allows the network to be used as a local planner.
Figure 5: The use of a CNN as a state encoder. The environment is presented as a 2D image in which red bars mark the randomly placed obstacles and the blue trajectory shows the expert demonstration. The CNN processes the image and produces a fixed-size state encoding, which is concatenated to the Encoder's input. The Query produces predictions using this information.
...and 2 more figures
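
The encounter reasoning in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' forecaster: it assumes every agent keeps a constant velocity and flags a pedestrian only if the closest approach within a time horizon falls inside a comfort radius. The function names, the `horizon` and `comfort_radius` parameters, and all coordinates are hypothetical values chosen for illustration.

```python
import math


def closest_approach(robot_pos, robot_vel, ped_pos, ped_vel, horizon):
    """Return (time, distance) of closest approach within [0, horizon].

    Assumption (not from the paper): both agents move at constant velocity.
    """
    # Relative position and velocity of the pedestrian with respect to the robot.
    rx, ry = ped_pos[0] - robot_pos[0], ped_pos[1] - robot_pos[1]
    vx, vy = ped_vel[0] - robot_vel[0], ped_vel[1] - robot_vel[1]
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t_star = 0.0  # no relative motion, so the distance never changes
    else:
        # Unconstrained minimiser of |r + v t|, clamped to the horizon.
        t_star = -(rx * vx + ry * vy) / speed_sq
        t_star = min(max(t_star, 0.0), horizon)
    return t_star, math.hypot(rx + vx * t_star, ry + vy * t_star)


def pedestrians_to_avoid(robot_pos, robot_vel, pedestrians,
                         horizon=10.0, comfort_radius=1.0):
    """List pedestrians whose predicted closest approach is inside the radius."""
    flagged = []
    for name, (pos, vel) in pedestrians.items():
        _, dist = closest_approach(robot_pos, robot_vel, pos, vel, horizon)
        if dist < comfort_radius:
            flagged.append(name)
    return flagged


# Hypothetical scene loosely mirroring Figure 3: P1 and P2 start closer to
# the robot, but only P3 is on a course that crosses the robot's path.
robot_pos, robot_vel = (0.0, 0.0), (1.0, 0.0)
pedestrians = {
    "P1": ((1.5, 1.5), (0.0, 1.0)),   # nearby, walking away
    "P2": ((2.0, -2.0), (0.0, 0.0)),  # nearby, standing still
    "P3": ((6.0, 6.0), (0.0, -1.0)),  # far away, heading into the robot's path
}
flagged = pedestrians_to_avoid(robot_pos, robot_vel, pedestrians)
print(flagged)  # ['P3']
```

A nearest-pedestrian rule would pick P1 in this scene, whereas the look-ahead check picks P3. The framework described here replaces the constant-velocity assumption with learned trajectory predictions.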
