Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments

Jingwei Zhang, Jost Tobias Springenberg, Joschka Boedecker, Wolfram Burgard

TL;DR

The paper tackles robot navigation without explicit localization, mapping, or planning by framing navigation as a sequence of related reinforcement learning (RL) tasks. It introduces successor feature reinforcement learning (SF-RL), which decouples reward estimation from environment dynamics via successor features and a neural feature map, enabling fast transfer across tasks with minimal memory. In simulated and real-world experiments using visual and depth inputs, SF-RL adapts rapidly to new mazes while preserving performance on previously learned tasks, and it outperforms standard baselines in transfer scenarios. The work supports the practicality of learning compact, transferable representations for navigation in changing environments and highlights the potential for deployment on resource-limited robotic platforms.
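
To make the phrase "decouples reward estimation from environment dynamics" concrete, the standard successor-feature decomposition can be written as below. This is a sketch of the general formulation, not a reproduction of the paper's numbered equations; the discount factor $\gamma$ and the policy $\pi$ are symbols assumed here, while $\phi$, $\psi$, and $\omega$ follow the notation of the Figure 2 caption.

$$
R(\mathbf{s}) \approx \phi_{\mathbf{s}}^{\top}\omega,
\qquad
\psi^{\pi}(\mathbf{s}, a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi_{\mathbf{s}_t} \,\middle|\, \mathbf{s}_0 = \mathbf{s},\ a_0 = a\right],
\qquad
Q^{\pi}(\mathbf{s}, a) \approx \psi^{\pi}(\mathbf{s}, a)^{\top}\omega .
$$

Under this assumption of a reward that is linear in the features, the reward weights $\omega$ capture what is rewarded, while $\psi^{\pi}$ captures which features the agent expects to encounter under the dynamics and policy, so a change of goal mainly calls for a new $\omega$.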

Abstract

In this paper we consider the problem of robot navigation in simple maze-like environments where the robot has to rely on its onboard sensors to perform the navigation task. In particular, we are interested in solutions to this problem that do not require localization, mapping or planning. Additionally, we require that our solution can quickly adapt to new situations (e.g., changing navigation goals and environments). To meet these criteria we frame this problem as a sequence of related reinforcement learning tasks. We propose a successor feature based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances. Our algorithm substantially decreases the required learning time after the first task instance has been solved, which makes it easily adaptable to changing environments. We validate our method in both simulated and real robot experiments with a Robotino and compare it to a set of baseline methods including classical planning-based navigation.

Paper Structure

This paper contains 18 sections, 9 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Exemplary maze-like environment considered in this paper (Map6) and the optimal path from a randomly chosen start position to the goal (orange traffic cone) taken by the Robotino robot (top), together with the sensory input captured by the robot's on-board Kinect sensor (bottom).
  • Figure 2: Visualization of the model architecture: $\theta_{\phi}$ parameterizes a convolutional network for extracting features $\phi^k_{\mathbf{s}_t}$ ($k$ is the current target task) from $\mathbf{s}_t$ (contains three convolutional layers, with the first layer consisting of 32 $8\times8$ filters with stride 4, the second of 64 $4\times4$ filters with stride 2 and the third of 64 $3\times3$ filters with stride 1, each followed by a rectifying nonlinearity; the last layer is followed by one fully-connected layer with 512 units); $\theta_d$ reconstructs $\mathbf{s}_t$ from $\phi^k_{\mathbf{s}_t}$ (contains five deconvolutional layers, with feature sizes {512, 256, 128, 64, 4} and spatial dimensionality increasing by factors of 2); $\omega$ regresses the immediate reward $R^k(\mathbf{s}_t)$ from the state representation $\phi^k_{\mathbf{s}_t}$; $\theta_{\psi}$ computes the successor features $\psi^{k}(\phi^{k}_{\mathbf{s}_t}, a_n; \theta_{\psi^k})$ for each $a_n \in \mathcal{A}$ (contains two fully-connected layers); $\mathcal{B}^i$ maps the features of the current task $k$ back to those of the old tasks.
  • Figure 3: Exemplary views the agent observes in the simulated environment.
  • Figure 4: Comparison between the true (yellow) and the predicted (blue) poses.
  • Figure 5: Top-down schematic view of the four different maze environments we consider for the simulated experiments.
  • ...and 3 more figures
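
The encoder described in the Figure 2 caption can be checked with a little shape arithmetic. The sketch below traces the spatial size of a square input through the three convolutional layers, using the filter counts, kernel sizes, and strides from the caption. The input resolution (84 pixels per side) and the absence of padding are assumptions made for illustration, since this excerpt states neither; the function names are hypothetical as well.

```python
# (filters, kernel size, stride) for each encoder layer, as listed in the
# Figure 2 caption.
ENCODER_LAYERS = [
    (32, 8, 4),
    (64, 4, 2),
    (64, 3, 1),
]


def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of one convolution along a single axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(input_size, padding=0):
    """Return (channels, height, width) after each convolutional layer.

    Assumes a square input and the same padding for every layer
    (zero by default); both are assumptions, not facts from the caption.
    """
    shapes = []
    size = input_size
    for filters, kernel, stride in ENCODER_LAYERS:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((filters, size, size))
    return shapes


def flattened_size(input_size, padding=0):
    """Number of values fed into the 512-unit fully-connected layer."""
    filters, height, width = encoder_shapes(input_size, padding)[-1]
    return filters * height * width


if __name__ == "__main__":
    HYPOTHETICAL_INPUT_SIZE = 84  # assumed resolution, not given in the excerpt
    for index, shape in enumerate(encoder_shapes(HYPOTHETICAL_INPUT_SIZE), 1):
        print(f"conv{index}: {shape}")
    print("flattened:", flattened_size(HYPOTHETICAL_INPUT_SIZE))
```

Changing `HYPOTHETICAL_INPUT_SIZE` or `padding` shows how strongly the size of the flattened vector entering the 512-unit layer depends on the input resolution.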