Distributional Inverse Reinforcement Learning
Feiyang Wu, Ye Zhao, Anqi Wu
TL;DR
DistIRL addresses the limitation of point-valued rewards in offline IRL by learning both the reward distribution $q_\rho(r|s,a)$ and the full return distribution $Z^\pi$, guided by first-order stochastic dominance (FSD) and distortion risk measures (DRMs). The framework combines energy-based Bayesian reward learning with distributional RL techniques to produce distribution-aware policies without environment interaction. Empirical results on gridworld tasks, neurobehavioral data, and MuJoCo control tasks demonstrate accurate recovery of reward shapes and state-of-the-art imitation performance under risk-sensitive objectives. The resulting robust, risk-aware imitation is suited to behavior analysis and neuroscience, and applies more broadly to offline settings in which rewards are stochastic.
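To make the DRM idea concrete: under one common convention for rewards (larger is better), a distortion risk measure scores a return distribution as $\rho_\psi(Z) = \int_0^1 F_Z^{-1}(\tau)\, d\psi(\tau)$ for a nondecreasing distortion $\psi$ with $\psi(0)=0$ and $\psi(1)=1$. The Python sketch below evaluates this integral on $N$ equally weighted quantile estimates of $Z^\pi$, the representation used by quantile-based distributional RL. The helper names `distortion_risk` and `cvar_distortion`, the bin discretization, and the CVaR distortion $\psi(\tau)=\min(\tau/\alpha, 1)$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cvar_distortion(alpha):
    """Hypothetical helper: psi(tau) = min(tau / alpha, 1), i.e. CVaR at level alpha.
    All weight falls on the lowest alpha fraction of returns (risk-averse)."""
    return lambda tau: np.minimum(np.asarray(tau, dtype=float) / alpha, 1.0)

def distortion_risk(quantiles, psi):
    """Hypothetical helper: approximate rho_psi(Z) = int_0^1 Q_Z(tau) dpsi(tau)
    from N equally weighted quantile estimates.

    Quantile i is treated as the value of Q_Z on the bin [i/N, (i+1)/N), so it
    receives weight psi((i+1)/N) - psi(i/N)."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    n = q.size
    edges = np.linspace(0.0, 1.0, n + 1)        # bin edges 0, 1/N, ..., 1
    weights = psi(edges[1:]) - psi(edges[:-1])  # mass psi assigns to each bin
    return float(np.dot(weights, q))

# Example: four quantile estimates of a return distribution Z^pi (made-up numbers).
z_quantiles = [0.0, 1.0, 2.0, 3.0]
print(distortion_risk(z_quantiles, lambda t: t))           # identity distortion gives the mean: 1.5
print(distortion_risk(z_quantiles, cvar_distortion(0.5)))  # CVaR_0.5, mean of the lowest half: 0.5
```

With the identity distortion the measure reduces to the expected return; concave distortions such as the CVaR one shift weight toward low quantiles and so encode risk aversion.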
Abstract
We propose a distributional framework for offline Inverse Reinforcement Learning (IRL) that jointly models uncertainty over reward functions and the full distribution of returns. Unlike conventional IRL approaches, which recover a deterministic reward estimate or match only expected returns, our method captures richer structure in expert behavior, in particular the reward distribution. It does so by minimizing first-order stochastic dominance (FSD) violations, which integrates distortion risk measures (DRMs) into policy learning and enables the recovery of both reward distributions and distribution-aware policies. This formulation is well suited to behavior analysis and risk-aware imitation learning. Empirical results on synthetic benchmarks, real-world neurobehavioral data, and MuJoCo control tasks demonstrate that our method recovers expressive reward representations and achieves state-of-the-art imitation performance.
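As an illustration of what an FSD violation means: the expert's return distribution $Z^E$ first-order stochastically dominates a policy's $Z^\pi$ exactly when $F_{Z^E}^{-1}(\tau) \ge F_{Z^\pi}^{-1}(\tau)$ for all $\tau \in (0,1)$, and dominance implies that the expert scores at least as high as the policy under every DRM with a nondecreasing distortion. One simple way to quantify how far dominance fails, assumed here and not necessarily the loss used in the paper, is the area $\int_0^1 \max\big(0, F_{Z^\pi}^{-1}(\tau) - F_{Z^E}^{-1}(\tau)\big)\, d\tau$, which equals the area where the expert's CDF lies above the policy's. The Python sketch computes it exactly for two equal-size samples of returns; `fsd_violation` is a hypothetical name.

```python
import numpy as np

def fsd_violation(expert_returns, policy_returns):
    """Hypothetical FSD-violation score: int_0^1 max(0, Q_pi(tau) - Q_E(tau)) dtau.

    It is zero iff the empirical expert return distribution first-order
    stochastically dominates the policy's. Assumes both samples have the same
    size n, so matching sorted samples pairs the empirical quantile functions."""
    e = np.sort(np.asarray(expert_returns, dtype=float))
    p = np.sort(np.asarray(policy_returns, dtype=float))
    if e.shape != p.shape:
        raise ValueError("this sketch assumes equally sized return samples")
    # Sorted pair (e[k], p[k]) holds both quantile values on [k/n, (k+1)/n), width 1/n.
    return float(np.mean(np.maximum(0.0, p - e)))

expert = [1.0, 2.0, 3.0]
print(fsd_violation(expert, [0.0, 1.0, 2.0]))  # expert dominates: 0.0
print(fsd_violation(expert, [2.0, 0.0, 4.0]))  # policy's upper tail exceeds the expert's: 1/3
```

Because the score is zero only under dominance, driving it toward zero pushes the expert's returns to rank at least as high as the policy's under every such risk measure simultaneously, rather than only in expectation.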
