RLSLM: A Hybrid Reinforcement Learning Framework Aligning Rule-Based Social Locomotion Model with Human Social Norms

Yitian Kou, Yihe Gu, Chen Zhou, DanDan Zhu, Shuguang Kuai

TL;DR

RLSLM addresses socially aware navigation by integrating a psychology-derived social discomfort field into a reinforcement learning reward. The method formulates a multi-objective return $G=\sum_t \gamma^t r_t$ with per-step reward $r_t=R_d(s_t,s_{t-1}) + R_e(s_t) + \sigma R_s(s_t)$, where $R_e(s_t)=-\alpha$, $R_d(s_t,s_{t-1})=(D_{t-1}-D_t)/l$, and $R_s(s_t)$ aggregates a three-component social influence field (HRSC, HISC, CAC). A VR-based evaluation demonstrates that RLSLM achieves higher comfort ratings than rule-based baselines, and ablation shows improved interpretability relative to purely data-driven methods. The work offers a scalable, human-centered framework that fuses cognitive science with reinforcement learning for practical social navigation.
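The reward decomposition in the TL;DR can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that $D_t$ is the agent's remaining distance to the goal and that $l$ is a normalizing step length (neither is defined in this summary), that the return is the standard discounted sum, and that the social term $R_s(s_t)$ is supplied as a precomputed number, since the HRSC/HISC/CAC field itself is not specified here. The function names and default parameter values are hypothetical.

```python
from typing import Sequence


def step_reward(d_prev: float, d_curr: float, social_value: float,
                step_len: float = 1.0, alpha: float = 0.1,
                sigma: float = 1.0) -> float:
    """r_t = R_d + R_e + sigma * R_s, following the TL;DR formulas.

    d_prev, d_curr : D_{t-1} and D_t (assumed: distance to goal).
    social_value   : R_s(s_t), assumed precomputed by a social field model.
    step_len       : l, the normalizer of the progress term (assumed).
    alpha, sigma   : energy penalty and social weight (illustrative defaults).
    """
    r_d = (d_prev - d_curr) / step_len  # progress term, positive when D shrinks
    r_e = -alpha                        # constant per-step energy cost
    return r_d + r_e + sigma * social_value


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """G = sum_t gamma**t * r_t, accumulated backwards for stability."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

In this sketch, increasing `sigma` makes a negative `social_value` (an intrusion into someone's comfort zone) cost more relative to progress toward the goal. That is the trade-off the weight $\sigma$ controls in the formula above.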

Abstract

Navigating human-populated environments without causing discomfort is a critical capability for socially-aware agents. While rule-based approaches offer interpretability through predefined psychological principles, they often lack generalizability and flexibility. Conversely, data-driven methods can learn complex behaviors from large-scale datasets, but are typically inefficient, opaque, and difficult to align with human intuitions. To bridge this gap, we propose RLSLM, a hybrid Reinforcement Learning framework that integrates a rule-based Social Locomotion Model, grounded in empirical behavioral experiments, into the reward function of a reinforcement learning framework. The social locomotion model generates an orientation-sensitive social comfort field that quantifies human comfort across space, enabling socially aligned navigation policies with minimal training. RLSLM then jointly optimizes mechanical energy and social comfort, allowing agents to avoid intrusions into personal or group space. A human-agent interaction experiment using an immersive VR-based setup demonstrates that RLSLM outperforms state-of-the-art rule-based models in user experience. Ablation and sensitivity analyses further show the model's significantly improved interpretability over conventional data-driven methods. This work presents a scalable, human-centered methodology that effectively integrates cognitive science and machine learning for real-world social navigation.

Paper Structure

This paper contains 17 sections, 10 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Methodology overview. The hybrid RLSLM model combines a top-down, rule-based approach, which develops computational models of human social behavior from well-controlled lab experiments, with a bottom-up, data-driven approach, which formulates the reinforcement learning framework from large-scale datasets of real-world social scenarios. The hybrid model first encodes human behavioral patterns into a social reward function, which is then used to train the policy within a reinforcement learning framework. The trained model is subsequently validated through human-agent interaction studies and simulations.
  • Figure 2: Overview of RLSLM framework. RLSLM integrates social influence modeling with reinforcement learning to guide an agent’s movement in environments shared with humans. The framework follows a three-stage decision-making loop (gray arrow), and once the environment is updated based on the agent's action, the cycle begins again with a new observation.
  • Figure 3: Overview of our VR-based user evaluation pipeline. (1) For each scenario, we import a set of human layouts (positions and orientations) and corresponding navigation trajectories generated by different models. (2) The participant views these simulated interactions in an immersive first-person VR environment, observing the agent’s movement among virtual humans. (3) After each trial, the participant provides a comfort rating (1–5), which is recorded and aggregated across models for quantitative comparison.
  • Figure 4: Comfort Rating Analysis via VR-Based User Study. (a) illustrates the VR experiment setup, where participants rate their comfort level (1–5) in both single- and multi-human interaction scenarios. (b) and (c) show the trajectories of each model (RLSLM, $n$-Body, and COMPANION) from a top-down view. We selected two representative cases from both scenarios for presentation; the complete results are provided in the supplementary material. (d) presents the comfort rating distributions and average comfort ratings of the three models in the single- and multi-human interaction scenarios.
  • Figure 5: Model validation and ablation analysis. (a–c) Definitions and experimental setup. (d–e) Effects of varying the social behavior weight $\sigma$: (d) shows trajectory examples and MLD distributions under different $\sigma$ values; (e) reports the corresponding average MLD statistics. (f–g) Statistical results from ablation studies of the heading-relevant (f) and heading-irrelevant (g) components of the social influence model. Full experimental details are provided in the appendix.
  • ...and 5 more figures