Aligning Robot Navigation Behaviors with Human Intentions and Preferences

Haresh Karnan

TL;DR

The contributions in this dissertation take significant steps toward addressing the value alignment problem in autonomous navigation, enabling mobile robots to navigate autonomously with objectives that align with human intentions and preferences.

Abstract

Recent advances in the field of machine learning have led to new ways for mobile robots to acquire advanced navigational capabilities. However, these learning-based methods raise the possibility that learned navigation behaviors may not align with the intentions and preferences of people, a problem known as value misalignment. To mitigate this risk, this dissertation aims to answer the question: "How can we use machine learning methods to align the navigational behaviors of autonomous mobile robots with human intentions and preferences?" First, this dissertation addresses this question by introducing a new approach to learning navigation behaviors by imitating human-provided demonstrations of the intended navigation task. This contribution allows mobile robots to acquire autonomous visual navigation capabilities through imitation, using a novel objective function that encourages the agent to align with the human's navigation objectives and penalizes misalignment. Second, this dissertation introduces two algorithms to enhance terrain-aware off-road navigation for mobile robots by learning visual terrain awareness in a self-supervised manner. This contribution enables mobile robots to respect a human operator's preferences for navigating different terrains in urban outdoor environments, while extrapolating these preferences to visually novel terrains by leveraging multi-modal representations. Finally, in the context of robot navigation in human-occupied environments, this dissertation introduces a dataset and an algorithm for robot navigation in a socially compliant manner in both indoor and outdoor environments. In summary, the contributions in this dissertation take significant steps toward addressing the value alignment problem in autonomous navigation, enabling mobile robots to navigate autonomously with objectives that align with human intentions and preferences.

Paper Structure

This dissertation contains 99 sections, 9 equations, 46 figures, and 7 tables.

Introduction
Contributions
Reading Guide to the Thesis
Background
Machine Learning for Robot Navigation
Machine Learning for Off-Road Navigation
Machine Learning for Social Navigation
Value Alignment Verification
Visual Imitation Learning for Robot Navigation
Introduction
Background and Related Work
Machine Learning for Autonomous Navigation
Imitation from Observation
Feature Detection and Matching
SLAM-based Approaches for Navigation
...and 84 more sections

Figures (46)

Figure 1: Overview of the representation learning and reinforcement learning (RL) training approaches in VOILA. To improve sample efficiency in the RL training step, we first learn a latent representation of the visual observations using a regularized autoencoder (RAE). We then frame-stack the latent representations $z_{t-2}, z_{t-1}, z_t$ of three consecutive observations as the state input to the policy network $\pi_\theta$, which is then trained using Soft Actor-Critic (SAC) [haarnoja2018soft] with the visual encoder network's weights frozen.
Figure 2: Aerial image of the AirSim simulation environment. Green lines show the tracks used to train the agent and red lines show the tracks unseen by the agent.
Figure 3: Imitation performance of policies learned using VOILA and GAIfO in AirSim. The y-axis shows the Hausdorff distance between the expert's and the imitator's trajectories, averaged across five trials (a lower distance indicates behavior more similar to the expert's). A Hausdorff distance greater than 10.0 (marked by the red line) indicates a failure to imitate the demonstration. With viewpoint mismatch, the GAIfO agent is unable to imitate the expert successfully on all tracks, whereas VOILA is unaffected by viewpoint mismatch and produces policies whose behavior is closer to that of the demonstrator. Tracks 1 and 2 were used for training; the remaining tracks were unseen by the agent during learning.
Figure 4: Policy rollout trajectories of the VOILA agent (green) successfully imitating a demonstrated behavior (black) of patrolling a rectangular hallway clockwise. The demonstration is a video recorded by a human walking with a handheld camera held considerably higher than the robot's camera, which introduces significant viewpoint mismatch. The VOILA agent successfully imitates the expert demonstration despite this egocentric viewpoint mismatch.
Figure 5: The VOILA agent (green), trained in the unperturbed training environment (left) and deployed in the perturbed environment (right). The learned policy is robust to the visual differences between the training and deployment environments, examples of which are shown as image pairs.
...and 41 more figures
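Figure 1 describes the policy state as the frame-stacked latents $z_{t-2}, z_{t-1}, z_t$ produced by a frozen visual encoder. The following is a minimal sketch of that frame-stacking step only, assuming each latent is a flat list of floats; the encoder is a stand-in callable (the RAE and SAC training are not implemented here), and both the class name `LatentFrameStack` and the choice to pad the first step of an episode with copies of $z_0$ are hypothetical, not taken from the dissertation.

```python
from collections import deque
from typing import Callable, List

Latent = List[float]


class LatentFrameStack:
    """Builds the policy state z_{t-2} || z_{t-1} || z_t from an encoder.

    Hypothetical helper for illustration; `encode` stands in for the
    frozen visual encoder and is only called for inference here.
    """

    def __init__(self, encode: Callable[[object], Latent], k: int = 3):
        self.encode = encode  # frozen: this class never updates it
        self.k = k
        self.frames: deque = deque(maxlen=k)  # oldest latent is dropped automatically

    def reset(self, first_obs) -> Latent:
        z = self.encode(first_obs)
        self.frames.clear()
        # Assumption: pad the start of an episode with copies of z_0.
        for _ in range(self.k):
            self.frames.append(z)
        return self.state()

    def step(self, obs) -> Latent:
        self.frames.append(self.encode(obs))
        return self.state()

    def state(self) -> Latent:
        # Concatenate latents oldest-to-newest into one flat state vector.
        return [x for z in self.frames for x in z]


if __name__ == "__main__":
    # Toy stand-in encoder: maps a scalar "observation" to a 2-D latent.
    stack = LatentFrameStack(lambda o: [float(o), 2.0 * o])
    print(stack.reset(1))  # [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
    print(stack.step(2))   # [1.0, 2.0, 1.0, 2.0, 2.0, 4.0]
    print(stack.step(3))   # [1.0, 2.0, 2.0, 4.0, 3.0, 6.0]
```

Keeping the encoder outside the stack mirrors the caption's separation between representation learning and RL: the policy only ever sees the concatenated latents, so the encoder's weights can stay fixed while the policy trains.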
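Figure 3 scores imitation with the Hausdorff distance between the expert's and the imitator's trajectories and treats a distance above 10.0 as a failure. The sketch below shows one way to compute that metric, assuming each trajectory is a discretized list of 2D positions in a shared frame (the caption does not specify the frame or units). The function names are hypothetical and this is not the dissertation's evaluation code; SciPy's `scipy.spatial.distance.directed_hausdorff` computes the directed distance if a library routine is preferred.

```python
import math
import statistics
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]

# Failure threshold taken from the Figure 3 caption; units follow the trajectories.
FAILURE_THRESHOLD = 10.0


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two discretized trajectories."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mean_hausdorff(trials: Iterable[Tuple[Sequence[Point], Sequence[Point]]]) -> float:
    """Average the distance over (expert, imitator) pairs, e.g. five trials."""
    return statistics.mean(hausdorff(e, i) for e, i in trials)


def imitation_failed(expert, imitator, threshold: float = FAILURE_THRESHOLD) -> bool:
    """A rollout counts as a failure when the distance exceeds the threshold."""
    return hausdorff(expert, imitator) > threshold


if __name__ == "__main__":
    expert = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
    imitator = [(0.0, 1.0), (5.0, 2.0), (10.0, 1.0)]
    print(hausdorff(expert, imitator))         # 2.0
    print(imitation_failed(expert, imitator))  # False
```

Because it takes the worst-case nearest-point gap in both directions, the Hausdorff distance penalizes an imitator that skips part of the demonstrated path as well as one that wanders off it.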
