Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement Learning

Jinyeob Kim; Sumin Kang; Sungwoo Yang; Beomjoon Kim; Jargalbaatar Yura; Donghan Kim

TL;DR

Problem: reward design for socially-aware robot navigation in crowds is manual and brittle, hindering scalable learning.

Approach: the Transformable Gaussian Reward Function (TGRF) uses a Gaussian form with $w_{TGRF}$, $\mu_{TGRF}$, and $\sigma_{TGRF}$ and a normalization $C_{norm}$ to flexibly shape penalties based on $x_{TGRF}$ (e.g., distance), reducing hyperparameter burden and accelerating DRL learning.

Contributions: a low-hyperparameter, adaptable reward-shaping framework validated across multiple reward components and navigation policies, with faster learning and improved safety in crowded simulations and real-world tests.

Findings: TGRF improves success rates up to around 95% and reduces intrusion in many settings, while real-world experiments reveal computation and sensor-noise challenges; the method shows strong practicality for socially-aware navigation but requires consideration of physical constraints.

Significance: provides a scalable, adaptable reward-shaping tool for human-centric robotics, enabling safer, quicker policy learning in dynamic environments.
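
One way to write the Gaussian form named in the Approach is sketched below. It is an assumption made for illustration, not an equation quoted from this page: the normal density $N(x;\mu,\sigma)$ is scaled by the weight and divided by its own peak value, so that the reward equals $w_{TGRF}$ when $x_{TGRF}=\mu_{TGRF}$.

$$
TGRF = \frac{w_{TGRF}\, N(x_{TGRF};\mu_{TGRF},\sigma_{TGRF})}{C_{norm}}, \qquad C_{norm} = N(\mu_{TGRF};\mu_{TGRF},\sigma_{TGRF}) = \frac{1}{\sigma_{TGRF}\sqrt{2\pi}}
$$

Under this assumed definition the expression simplifies to $w_{TGRF}\exp\left(-\frac{(x_{TGRF}-\mu_{TGRF})^2}{2\sigma_{TGRF}^2}\right)$, which leaves only the weight, center, and width to tune.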

Abstract

Robot navigation has shifted from prioritizing obstacle avoidance to adopting socially aware navigation strategies that accommodate human presence. As a result, socially aware navigation in dynamic, human-centric environments has gained prominence in robotics. Although reinforcement learning techniques have advanced socially aware navigation, defining appropriate reward functions, especially in congested environments, remains a significant challenge. These rewards, which are crucial in guiding robot actions, require intricate hand-crafted design because they are complex and cannot be set automatically. The multitude of manually designed rewards leads to hyperparameter redundancy, imbalance, and inadequate representation of unique object characteristics. To address these challenges, we introduce a transformable Gaussian reward function (TGRF). The TGRF significantly reduces the burden of hyperparameter tuning, adapts across various reward functions, and accelerates learning, particularly in crowded environments with deep reinforcement learning (DRL). We introduce and validate TGRF through sections covering its conceptual background, characteristics, experiments, and real-world application, paving the way for a more effective and adaptable approach in robotics. The complete source code is available at https://github.com/JinnnK/TGRF

Paper Structure

This paper contains 18 sections, 7 equations, 9 figures, 2 tables.

Introduction
Related Works
Integration of Prior Knowledge through Human-Delivered Reward Functions
Reward Function Analysis for Human Avoidance in Robot Navigation
Suggested Reward Function
Preliminaries
Markov decision process (MDP) and navigation methods
Transformable Gaussian Reward Function (TGRF)
Formula and number of hyperparameters
Transformability
Reward function
Simulation Experiments
Experimental environment
Results
Results in Different Navigation Methods and Environment
...and 3 more sections

Figures (9)

Figure 1: Robot actions with and without adequate reward functions. Red shapes represent penalties, and yellow arrows indicate the robot's actions. Penalties are imposed when the robot is in proximity to humans, within their surroundings, or moving in their direction.
Figure 2: Diagram of an MDP with the navigation model applied.
Figure 3: Normal distribution. The X-axis denotes the x-value, and the Y-axis represents $N(x;\mu,\sigma)$. In (a), $\mu=0$, $\sigma=1$. In (b), $\mu=0$, $\sigma=2$.
Figure 4: TGRF. The X-axis denotes the X-position in meters, the Y-axis represents the Y-position in meters, and the Z-axis indicates the TGRF value when $w_{TGRF}=1$, $x_{TGRF}=dist(X,Y)$, $\mu_{TGRF}=0$, $\sigma_{TGRF}=2$.
Figure 5: Transformability of TGRF. The X-axis denotes the X-position in meters, the Y-axis represents the Y-position in meters, and the Z-axis indicates the TGRF value. In (a), $w_{TGRF}=1$, $x_{TGRF}=dist(X,Y)$, $\mu_{TGRF}=0$, $\sigma_{TGRF}=5$. In (b), $w_{TGRF}=1$, $x_{TGRF}=dist(X,Y)$, $\mu_{TGRF}=0$, $\sigma_{TGRF}=5000$.
...and 4 more figures
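
The parameter settings in the captions of Figures 4 and 5 can be explored numerically with a minimal sketch. It assumes that TGRF is the normal density $N(x;\mu,\sigma)$ scaled by $w_{TGRF}$ and divided by a normalization $C_{norm}$ equal to the density's peak value; that definition, and the helper names `gaussian_pdf`, `tgrf`, and `dist`, are illustrative assumptions rather than the authors' implementation.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Normal density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def tgrf(x, w=1.0, mu=0.0, sigma=2.0):
    """Hypothetical TGRF: weighted Gaussian divided by its peak value.

    Assumption: C_norm = N(mu; mu, sigma), so the output equals w at x = mu.
    """
    c_norm = gaussian_pdf(mu, mu, sigma)
    return w * gaussian_pdf(x, mu, sigma) / c_norm


def dist(x_pos, y_pos):
    """Euclidean distance from the origin, standing in for dist(X, Y)."""
    return math.hypot(x_pos, y_pos)


# Sigma values taken from the captions of Figures 4 and 5.
for sigma in (2.0, 5.0, 5000.0):
    values = [tgrf(dist(d, 0.0), sigma=sigma) for d in (0.0, 1.0, 3.0)]
    print(f"sigma={sigma}: {[round(v, 4) for v in values]}")
```

Under these assumptions, the value at distance 0 stays at $w_{TGRF}$ for every $\sigma_{TGRF}$, while a very large $\sigma_{TGRF}$ (such as 5000) makes the surface nearly constant over a range of a few meters. This is one way to read the "transformability" that Figure 5 illustrates.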
