Massively Scalable Inverse Reinforcement Learning in Google Maps

Matt Barnes; Matthew Abueg; Oliver F. Lange; Matt Deeds; Jason Trader; Denali Molitor; Markus Wulfmeier; Shawn O'Banion

TL;DR

The paper introduces scaling techniques for inverse reinforcement learning based on graph compression, spatial parallelization, and improved initialization conditions inspired by a connection to eigenvector algorithms.

Abstract

Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of states and demonstration trajectories. In this paper, we introduce scaling techniques based on graph compression, spatial parallelization, and improved initialization conditions inspired by a connection to eigenvector algorithms. We revisit classic IRL methods in the routing context, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. This insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in route quality at a global scale, and to the best of our knowledge, represents the largest published study of IRL algorithms in a real-world setting to date. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.
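
The trade-off between deterministic planners and stochastic policies mentioned in the abstract can be illustrated on a toy graph. The sketch below is not the paper's implementation: the three-node graph, the rewards, and all function names are hypothetical. It assumes the usual formulation in which a deterministic planner backs up values with a max over outgoing edges, while a maximum-entropy style stochastic policy backs them up with a log-sum-exp.

```python
import math

# Hypothetical toy road graph (an assumption for illustration, not from the paper).
# EDGES[node] lists (next_node, reward); rewards are negative travel costs.
EDGES = {
    "a": [("b", -1.0), ("goal", -3.0)],
    "b": [("goal", -1.0)],
    "goal": [],
}


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def values(edges, goal, backup, sweeps=10):
    """Bellman sweeps toward `goal`; `backup` is max (hard) or logsumexp (soft)."""
    v = {s: -math.inf for s in edges}
    v[goal] = 0.0
    for _ in range(sweeps):
        for s, outs in edges.items():
            if s != goal and outs:
                v[s] = backup([r + v[t] for t, r in outs])
    return v


def soft_policy(edges, v, s):
    """Action probabilities at state s implied by the soft values v."""
    return {t: math.exp(r + v[t] - v[s]) for t, r in edges[s]}


hard = values(EDGES, "goal", max)        # deterministic planner: best path only
soft = values(EDGES, "goal", logsumexp)  # stochastic policy: all paths weighted
policy_at_a = soft_policy(EDGES, soft, "a")
```

In this toy case the hard backup commits to the single best route from `a`, whereas the soft backup assigns nonzero probability to both outgoing edges. On a large road graph the soft backup generally has to sweep over many more states than a goal-directed shortest-path search, which is one plausible reading of the "cheap" versus "expensive yet robust" contrast.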

Paper Structure (43 sections, 3 theorems, 19 equations, 11 figures, 3 tables, 4 algorithms)

Introduction
Inverse Reinforcement Learning
Goal conditioning
Related Work
Methods
Parallelism strategies
MaxEnt++ initialization
Receding Horizon Inverse Planning (RHIP)
Graph compression
Empirical Study
Road graph
Demonstration dataset
Experimental region
Baselines
Reward model descriptions
...and 28 more sections

Key Result

Theorem B.1

$\ell(\theta)<\infty$ iff $A$ has a dominant eigenvalue of 1.
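
This page defines neither $A$ nor $\ell(\theta)$, so the following is only a numerical sketch under stated assumptions: $A$ is taken to be the matrix of exponentiated edge rewards with an absorbing, zero-reward destination, and $\ell(\theta)$ is taken to be a maximum-entropy loss whose partition function is obtained by repeatedly applying $A$. The graph, rewards, and function names are hypothetical.

```python
import numpy as np


def exp_reward_matrix(n, edge_rewards, goal):
    """Assumed form of A: A[i, j] = exp(reward of edge i -> j)."""
    A = np.zeros((n, n))
    for (i, j), r in edge_rewards.items():
        A[i, j] = np.exp(r)
    A[goal, :] = 0.0
    A[goal, goal] = 1.0  # assumption: absorbing destination with zero reward
    return A


def has_dominant_unit_eigenvalue(A, tol=1e-8):
    """True if the largest eigenvalue magnitude is 1 and all others are smaller."""
    mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return bool(abs(mags[0] - 1.0) < tol and mags[1] < 1.0 - tol)


def iterate_partition(A, goal, iters=200):
    """Power-iteration style update z <- A z, started from the destination."""
    z = np.zeros(A.shape[0])
    z[goal] = 1.0
    for _ in range(iters):
        z = A @ z
    return z


GOAL = 2
# Hypothetical rewards: the cycle 0 <-> 1 is costly in one case, rewarding in the other.
costly_cycle = {(0, 1): -1.0, (1, 0): -1.0, (0, 2): -2.0, (1, 2): -1.0}
rewarding_cycle = {(0, 1): 0.5, (1, 0): 0.5, (0, 2): -2.0, (1, 2): -1.0}

A_ok = exp_reward_matrix(3, costly_cycle, GOAL)
A_bad = exp_reward_matrix(3, rewarding_cycle, GOAL)
z_ok = iterate_partition(A_ok, GOAL)    # settles to a finite vector
z_bad = iterate_partition(A_bad, GOAL)  # keeps growing with more iterations
```

Under these assumptions, the costly cycle gives a matrix whose dominant eigenvalue is 1 and the iteration settles, while the rewarding cycle pushes eigenvalue magnitudes above 1 and the iterates grow without bound, which is consistent with the finiteness condition the theorem states.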

Figures (11)

Figure 1: Google Maps route accuracy improvements in several world regions, when using our inverse reinforcement learning policy. Full results are presented in \ref{tab:results} and \ref{fig:nodes_vs_accuracy}.
Figure 2: Architecture overview. The final rewards are used to serve online routing requests.
Figure 3: RHIP (Receding Horizon Inverse Planning)
Figure 4: Example of the 360M parameter sparse model finding and correcting a data quality error in Nottingham. The preferred route is incorrectly marked as private property due to the presence of a gate (which is never closed), and incorrectly incurs a high cost. The detour route is long and narrow. The sparse model learns to correct the data error with a large positive reward on the gated segment. Additional examples are provided in \ref{app:experiments}.
Figure 6: Sparse mixture-of-experts learn preferences specific to their geographic region, as demonstrated by the drop in off-diagonal performance.
...and 6 more figures

Theorems & Definitions (6)

Theorem B.1
proof
Theorem B.2
proof
Theorem B.3
proof
