RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments

Diego Martinez-Baselga; Luis Riazuelo; Luis Montano

TL;DR

RUMOR addresses autonomous navigation in highly dynamic environments by combining a model-based abstraction of the environment, the Dynamic Object Velocity Space (DOVS), with a deep reinforcement learning controller (Soft Actor-Critic) that operates over a kinodynamics-aware continuous action space. This combination lets the robot interpret complex dynamic scenes through a robocentric velocity representation while ensuring that the commands it produces respect differential-drive kinematics. Key contributions include the DOVS formulation (combining Dynamic Object Velocities and Free Velocities over a horizon $T_h$), a two-stream encoder with an LSTM for handling occlusions, and a training setup that uses realistic simulation to reduce the sim-to-real gap; real-world tests with pedestrians demonstrate transferability. The results show NR-RUMOR achieving higher success rates and competitive or superior navigation times compared with a broad set of baselines in dense, dynamic, and unseen environments. Overall, the work advances kinodynamics-aware planning in dynamic environments by embedding an environmental abstraction into a DRL framework, enabling smoother, safer real-world navigation for differential-drive robots.

Abstract

Autonomous navigation in dynamic environments is a complex but essential task for autonomous robots, with recent deep reinforcement learning approaches showing promising results. However, the complexity of the real world makes it infeasible to train agents in every possible scenario configuration. Moreover, existing methods typically overlook factors such as robot kinodynamic constraints, or assume perfect knowledge of the environment. In this work, we present RUMOR, a novel planner for differential-drive robots that uses deep reinforcement learning to navigate in highly dynamic environments. Unlike other end-to-end DRL planners, it uses a descriptive robocentric velocity space model to extract the dynamic environment information, enhancing training effectiveness and scenario interpretation. Additionally, we propose an action space that inherently considers robot kinodynamics and train the agent in a simulator that reproduces the problematic aspects of the real world, reducing the gap between reality and simulation. We extensively compare RUMOR with other state-of-the-art approaches, demonstrating better performance, and provide a detailed analysis of the results. Finally, we validate RUMOR's performance in real-world settings by deploying it on a ground robot. Our experiments, conducted in crowded scenarios and unseen environments, confirm the algorithm's robustness and transferability.

Paper Structure (18 sections, 24 equations, 10 figures, 1 table)

Introduction
Related work
Motion planning in dynamic environments
Deep reinforcement learning planners
Problem formulation
Methodology
Dynamic Object Velocity Space (DOVS)
Reinforcement learning setup
State space
Action space
Observation space, observation model and transition model
Reward function
Network
Experiments
Experimental setup
...and 3 more sections

Figures (10)

Figure 1: Pipeline of the approach presented. It takes the information sensed from the environment to construct a model of the dynamism of the scenario. Then DRL is used to compute differential-drive velocity commands.
Figure 2: Graphical representation of the DOVS model and the differential-drive robot restrictions. The constraints of Equation \ref{eq:dymanic-const} are represented with two black lines and restrict the maximum linear velocity with respect to the angular velocity. The constraints of Equation \ref{eq:acc-const} are plotted as a green rhombus around $\boldsymbol{u}_t$, representing the acceleration limits of a differential-drive robot. In this example, the robot velocity limits are $v_{max}=0.7$ m/s, $\omega_{max}=\pi$ rad/s and $a_{max}=0.3$ m/s². The dark (DOV) and white (FV) areas include unsafe and safe velocities, respectively, derived using VO for a time horizon.
Figure 3: A robocentric view of $\mathcal{W}$ (a) and a graphical representation of the DOVS (b) of a scenario where a robot faces the obstacle $i$ that follows a linear trajectory. In (a), the robot center is represented in red, the sampled trajectories $\tau_j\in\mathcal{T}$ with blue lines, the obstacle augmented radius with a blue circle and the collision band $\mathcal{B}_i$ with green lines. The intersection points between $\tau_j$ and $\mathcal{B}_i$ are $P_{1,i,j}$ (right) and $P_{2,i,j}$ (left). In (b), the maximum velocities to pass after the obstacle are represented with black dots ($\boldsymbol{u}_{2,i,j}$) and the minimum velocities to pass before it with purple dots ($\boldsymbol{u}_{1,i,j}$). The DOV is represented in gray and the FV in white.
Figure 4: Graphical representation of the notation used in the action space configuration, accounting for the restrictions previously represented in Figure \ref{fig:DOVS-rhombus}. $\mathbf{u}_t$, $\mathbf{u}_t^{up}$, $\mathbf{u}_t^{down}$, $\mathbf{u}_t^{left}$ and $\mathbf{u}_t^{right}$ are represented with gray points, $l_1$ and $l_2$ with black lines, $l_1'$ and $l_2'$ with dotted lines, $\mathbf{b}_{1,t}$ and $\mathbf{b}_{2,t}$ with blue arrows, and $Q_1$ and $Q_2$ with red points.
Figure 5: Structure of the encoding network. The DOVS model and the robot state are processed in two different streams, joined later by a memory layer that accounts for previous observations.
...and 5 more figures
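
The kinodynamic restrictions described in the Figure 2 caption can be illustrated with a small feasibility check. This is only a sketch under stated assumptions, not the paper's formulation: the equations themselves are not reproduced on this page, so the form of the two bounding lines ($v \le v_{max} - \frac{v_{max}}{\omega_{max}}|\omega|$, with $v \ge 0$) and the rhombus (each wheel speed changing by at most $a_{max}\,\Delta t$ per step) are assumed. The control period `DT`, the wheel separation `WHEEL_SEP` and all function names are hypothetical; only the three limits come from the caption.

```python
import math

V_MAX = 0.7      # m/s, from the Figure 2 caption
W_MAX = math.pi  # rad/s, from the Figure 2 caption
A_MAX = 0.3      # m/s^2, from the Figure 2 caption
DT = 0.2         # s, hypothetical control period (assumption)
WHEEL_SEP = 0.3  # m, hypothetical wheel separation (assumption)


def within_velocity_limits(v, w):
    """Assumed form of the two bounding lines: 0 <= v <= v_max - (v_max / w_max) * |w|."""
    return 0.0 <= v <= V_MAX - (V_MAX / W_MAX) * abs(w) + 1e-12


def within_acceleration_rhombus(v, w, v_t, w_t):
    """Assumed rhombus around the current velocity (v_t, w_t).

    If each wheel speed may change by at most A_MAX * DT in one step, then
    |dv| + |dw| * WHEEL_SEP / 2 <= A_MAX * DT, which is a rhombus in (w, v).
    """
    dv_max = A_MAX * DT                    # half-diagonal along v
    dw_max = 2.0 * A_MAX * DT / WHEEL_SEP  # half-diagonal along w
    return abs(v - v_t) / dv_max + abs(w - w_t) / dw_max <= 1.0 + 1e-9


def is_feasible(v, w, v_t, w_t):
    """A command is feasible if it satisfies both assumed constraint sets."""
    return within_velocity_limits(v, w) and within_acceleration_rhombus(v, w, v_t, w_t)
```

Under these assumptions, a command such as `is_feasible(0.35, 0.0, 0.3, 0.0)` stays inside both regions, while jumping from 0.3 m/s to 0.7 m/s in a single step falls outside the rhombus even though it respects the velocity limits.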

Theorems & Definitions (8)

Definition 1
Definition 2
Definition 3
Definition 4
Remark 1
Remark 2
Remark 3
Remark 4
