Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning
Yuebing Liang, Shenhao Wang, Jiangbo Yu, Zhan Zhao, Jinhua Zhao, Sandy Pentland
TL;DR
This work presents an interpretable deep inverse reinforcement learning (DIRL) framework for analyzing sequential activity-travel decisions. It models daily decision sequences as a Markov Decision Process and uses adversarial IRL to jointly learn a policy and a reward function. Post-hoc interpretability comes from two sources: a surrogate multinomial logit (MNL) model that approximates the policy, and reward-sequence analysis that characterizes preferences. Key contributions include: (i) interpretable policy extraction via knowledge distillation, (ii) reward-based clustering to identify distinct decision-maker types, and (iii) long-term return analysis linking daily activity patterns to utility. The approach is validated on Singapore travel survey data, where it reveals socio-demographic differences in decision patterns and yields insights for transportation planning and policy design that go beyond prediction accuracy.
Abstract
Travel demand modeling has shifted from aggregated trip-based models to behavior-oriented activity-based models because daily trips are essentially driven by human activities. For analyzing sequential activity-travel decisions, deep inverse reinforcement learning (DIRL) has proven effective in learning the decision mechanisms: it uses deep neural networks (DNNs) to approximate a reward function that represents preferences and a policy function that replicates observed behavior. However, most existing research has used DIRL only to improve prediction accuracy, with limited exploration of the underlying mechanisms that guide sequential decision-making. To address this gap, we introduce an interpretable DIRL framework for analyzing activity-travel decision processes, connecting data-driven machine learning with theory-driven behavioral models. The framework adapts an adversarial IRL approach to infer the reward and policy functions of activity-travel behavior. The policy function is interpreted through an interpretable surrogate model based on the choice probabilities that the policy function produces, while the reward function is interpreted by deriving both short-term rewards and long-term returns for various activity-travel patterns. Our analysis of real-world travel survey data reveals promising results in two key areas: (i) behavioral pattern insights from the policy function, highlighting critical factors in decision-making and variations among socio-demographic groups, and (ii) behavioral preference insights from the reward function, indicating the utility individuals gain from specific activity sequences.
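The surrogate-model idea can be illustrated with a minimal sketch, under loudly stated assumptions. The "teacher" below is a small random network standing in for a trained policy, not the AIRL policy from this work. The state features, the three alternatives, the function name `fit_mnl_surrogate`, and the learning rate and step count are all hypothetical. The sketch fits a multinomial logit (a linear-in-parameters softmax) to the teacher's choice probabilities by minimizing cross-entropy, which is one common way to perform knowledge distillation.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p_teacher, p_student):
    """Mean cross-entropy of student probabilities against soft teacher labels."""
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=1).mean())


def fit_mnl_surrogate(X, p_teacher, lr=0.5, steps=2000):
    """Fit MNL coefficients W (features x alternatives) by gradient descent.

    Hypothetical helper: lr and steps are illustrative, not tuned values.
    """
    n, d = X.shape
    k = p_teacher.shape[1]
    W = np.zeros((d, k))
    for _ in range(steps):
        p_student = softmax(X @ W)
        # Gradient of the mean cross-entropy with respect to W.
        W -= lr * (X.T @ (p_student - p_teacher)) / n
    return W


rng = np.random.default_rng(0)

# Assumed state features (e.g. time of day, age, income, trip count so far),
# plus a constant column for alternative-specific intercepts.
features = rng.normal(size=(500, 4))
X = np.hstack([np.ones((500, 1)), features])

# Stand-in teacher: a random one-hidden-layer network over 3 alternatives.
A = rng.normal(size=(4, 8))
B = rng.normal(size=(8, 3))
p_teacher = softmax(np.tanh(features @ A) @ B)

W = fit_mnl_surrogate(X, p_teacher)
p_student = softmax(X @ W)

loss_before = cross_entropy(p_teacher, softmax(X @ np.zeros_like(W)))
loss_after = cross_entropy(p_teacher, p_student)
# Share of states where surrogate and teacher agree on the most likely choice.
fidelity = float((p_student.argmax(axis=1) == p_teacher.argmax(axis=1)).mean())

print(f"cross-entropy: {loss_before:.3f} -> {loss_after:.3f}, fidelity: {fidelity:.2f}")
```

Each column of `W` can be read like the coefficients of a standard MNL: it gives the contribution of each feature to the utility of one alternative, identified only up to a common shift across alternatives. The fidelity score indicates how closely the surrogate tracks the teacher and therefore how far its coefficients can be trusted as a description of the policy.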
