SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

Jianpeng Yao; Xiaopan Zhang; Yu Xia; Zejin Wang; Amit K. Roy-Chowdhury; Jiachen Li

TL;DR

SoNIC addresses safety-critical social navigation by fusing Adaptive Conformal Inference (ACI) with Constrained Reinforcement Learning (CRL). By augmenting observations with online uncertainty and guiding policy learning through a Lagrangian objective that penalizes pedestrian-buffer intrusions, it achieves state-of-the-art safety and social-norm adherence on CrowdNav, including strong robustness to distribution shifts. The approach includes a flexible prediction module (CV or GST), an attention-based policy network, and a spatial-relaxation CRL mechanism that converts collision-rate constraints into actionable, dense costs, improving convergence and safety. Real-robot experiments in ROS2 demonstrate robust, socially polite behavior in dense crowds, indicating practical viability for real-world deployment.

Abstract

Reinforcement learning (RL) enables social robots to generate trajectories without relying on human-designed rules or interventions, making it generally more effective than rule-based systems in adapting to complex, dynamic real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, and existing RL-based solutions often fall short of ensuring safety in complex environments. In this paper, we propose SoNIC, which to the best of our knowledge is the first algorithm that integrates adaptive conformal inference (ACI) with constrained reinforcement learning (CRL) to enable safe policy learning for social navigation. Specifically, our method not only augments RL observations with ACI-generated nonconformity scores, which inform the agent of the quantified uncertainty, but also uses these uncertainty estimates to guide the behaviors of RL agents through CRL. This integration regulates the behaviors of RL agents and enables them to handle safety-critical situations. On the standard CrowdNav benchmark, our method achieves a success rate of 96.93%, which is 11.67% higher than the previous state-of-the-art RL method, and results in 4.5 times fewer collisions and 2.8 times fewer intrusions into ground-truth human future trajectories, as well as enhanced robustness in out-of-distribution scenarios. To further validate our approach, we deploy our algorithm on a real robot by developing a ROS2-based navigation system. Our experiments demonstrate that the system makes robust and socially polite decisions when interacting with both sparse and dense crowds. Video demos can be found on our project website: https://sonic-social-nav.github.io/.
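
To make the ACI component of the abstract more concrete, here is a minimal sketch of one adaptive conformal inference step for a single pedestrian and a single prediction horizon. It follows the standard ACI update rule (adjust the working miscoverage level by a step proportional to the gap between the target level and the observed miss indicator) and is not taken from the paper's code. The function name `aci_step`, the parameters `target_alpha` and `gamma`, and the choice of a scalar prediction error (for example, Euclidean distance between predicted and observed positions) as the nonconformity score are all assumptions made for illustration. Clipping the quantile level to [0, 1] is a simplification of this sketch.

```python
import numpy as np

def aci_step(alpha_t, scores, new_score, target_alpha=0.1, gamma=0.05):
    """One illustrative ACI step (hypothetical helper, not the paper's code).

    alpha_t      : current working miscoverage level
    scores       : past nonconformity scores (assumed: prediction errors in meters)
    new_score    : nonconformity score observed at the current step
    target_alpha : desired long-run miscoverage rate (assumed value)
    gamma        : ACI step size (assumed value)
    Returns (radius, alpha_next).
    """
    # Simplification: clip so the quantile level is always valid.
    level = float(np.clip(1.0 - alpha_t, 0.0, 1.0))
    # The uncertainty radius is the empirical quantile of past scores.
    radius = float(np.quantile(scores, level)) if len(scores) else float("inf")
    # err = 1 when the new observation falls outside the predicted region.
    err = 1.0 if new_score > radius else 0.0
    # Misses lower alpha (larger regions next time); covered steps raise it.
    alpha_next = alpha_t + gamma * (target_alpha - err)
    return radius, alpha_next
```

In this sketch, a run of poor predictions drives `alpha_next` down, which enlarges the radius on later steps; accurate predictions let it shrink again. This mirrors the dynamically adjusting uncertainty area described for the real-robot experiments.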

Paper Structure (31 sections, 21 equations, 7 figures, 6 tables)

Introduction
Related Work
Social Robot Navigation
Planning Under Uncertainty
Safe Reinforcement Learning
Preliminaries
Adaptive Conformal Inference
Constrained Reinforcement Learning
Method
Problem Formulation
Method Overview
Rule-Based and Learning-Based Trajectory Prediction
ACI for Quantifying Prediction Uncertainty
Policy Network Structure
CRL with Spatial Relaxation
...and 16 more sections

Figures (7)

Figure 1: SoNIC employs ACI to generate a spatial buffer around human agents and guide the behaviors of CRL agents to avoid entering the buffer by constraining the cumulative intrusions over each episode.
Figure 2: The overall pipeline of SoNIC. We mark components related to humans in yellow, components related to physical information and decision-making of the robot in blue, and fused features in green. We use ACI to quantify the prediction uncertainty of human trajectories and concatenate these metrics with predictions before inputting them into networks. The networks contain attention mechanisms for interactions between humans (H-H attention) and between humans and the ego robot (H-R attention). Prediction uncertainty combined with physical information is used for designing costs. For the CRL agent using PPO Lagrangian, the actor and reward critic share some layers while the cost critic uses a separate network. We adopt reward value loss $l^R$, action loss $l^{\pi}$, and cost value loss $l^C$ for updating the agent. More details about the architectures and training strategy can be found in Section IV.
Figure 3: Visualization of test results for different cases. Pedestrians are shown in blue, the robot in yellow, and the goal is represented by the orange star. The spatial buffers based on uncertainty quantification are depicted as light blue circles around humans, while the subareas considered in CRL are a slightly deeper shade. (a) SoNIC (w/ GST) performing in an in-distribution environment, successfully navigating to the goal. (b) CrowdNav++ performing in the same episode but failing to complete the task. In this subfigure, the light blue circles indicate prediction lines rather than spatial buffers. (c) SoNIC (w/ GST) performing in an OOD environment with rushing humans. (d) SoNIC (w/ GST) performing in an OOD environment with the SF pedestrian model.
Figure 4: Convergence analysis of SoNIC. (a) The learning curves of SoNIC (w/ GST) and RL (w/ ACI). SoNIC (w/ GST) shows faster convergence with higher rewards. (b) The cost curves of SoNIC (w/ GST) with different cost limits. The average costs across episodes can approximately approach the predefined cost limits, which are shown by the dashed lines.
Figure 5: We deploy our methods on a ROSMASTER X3 with Mecanum wheels using the ROS2 system. For the four subplots, the left sides display photos taken from the experiments, and the right sides show visualizations in RViz. In the RViz visualizations, the red circles represent detection results, and the white numbers inside the red circles indicate the output probabilities of the detection model. The purple numbers correspond to the indices generated by the tracker. The prediction lines are shown in blue, and the prediction uncertainties are depicted in semi-transparent light blue. In (a), the green robot represents the robot’s current location. In (b)-(d) where the decision node is enabled, the yellow sphere indicates the robot’s position and the yellow arrow represents the command output from the decider node. The orange circle indicates the goal position. (a) In the uncertainty visualization, the human initially walks around the robot and then stands still behind it. The uncertainty area adjusts dynamically based on the prediction accuracy. (b) The robot equipped with SoNIC demonstrates stable yielding behavior when interacting with humans. (c) In goal-reaching mode, the robot navigates through crowds and successfully reaches its goal. (d) In long-range navigation mode, the moving goal consistently guides the robot’s movement.
...and 2 more figures
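
The captions of Figures 1, 2, and 4 describe a CRL agent that is penalized for entering uncertainty-based buffers around humans and whose average episode cost is driven toward a cost limit. The sketch below illustrates that idea with a generic Lagrangian scheme: a per-step indicator cost for buffer intrusion and a projected gradient-ascent update of the multiplier. The function names, the indicator form of the cost, and the learning rate are assumptions for illustration; the paper's actual cost design and PPO Lagrangian details are not reproduced here.

```python
import numpy as np

def intrusion_cost(robot_xy, human_xy, buffer_radii):
    """Hypothetical per-step cost: 1.0 if the robot is inside any human's buffer.

    robot_xy     : (2,) robot position
    human_xy     : (N, 2) current or predicted human positions
    buffer_radii : (N,) buffer radius per human (assumed to come from ACI)
    """
    diffs = np.asarray(human_xy, dtype=float) - np.asarray(robot_xy, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)
    return float(np.any(dists < np.asarray(buffer_radii, dtype=float)))

def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Generic Lagrange multiplier update, projected onto lam >= 0.

    The multiplier grows while the average episode cost exceeds the limit
    and decays toward zero once the constraint is satisfied.
    """
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))
```

With a scheme like this, the policy would maximize reward minus `lam` times cost, so a larger multiplier pushes the agent away from the buffers until the average cost settles near the limit, consistent with the behavior shown by the dashed lines in Figure 4(b).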
