Boltzmann State-Dependent Rationality
Osher Lerner
TL;DR
This work addresses a gap in modeling human suboptimality for human-robot collaboration. It replaces the single suboptimality parameter $β$ in Boltzmann rationality with a state-dependent function $β(s)$, giving a more expressive yet still tractable representation of human behavior. It develops forward and inverse models under this state-aware framework, culminating in a maximum-a-posteriori objective for jointly inferring reward weights and state-dependent suboptimality across multiple agents. The paper outlines experimental designs in GridWorld and Overcooked with Mechanical Turk data, intended to test parameter recovery and whether learned rewards and suboptimality generalize to goal inference tasks. The approach has practical implications for risk-aware planning, adaptive assistance, and human-centered collaboration in complex tasks, where conditional uncertainty and context-sensitive proficiency matter.
Abstract
This paper extends existing learned models of human behavior with a measured step toward structured irrationality. Specifically, replacing the constant suboptimality parameter $β$ in a Boltzmann rationality model with a function over states, $β(s)$, adds natural expressivity while remaining computationally tractable. The paper discusses the relevant mathematical theory, sets up several experimental designs, presents limited preliminary results, and proposes future investigations.
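
To make the substitution of $β(s)$ for $β$ concrete, the following is a minimal sketch of a Boltzmann policy that accepts either a constant or a per-state coefficient. It is an illustration under stated assumptions, not the paper's implementation: the function name `boltzmann_policy`, the tabular Q-values, and the toy numbers are all hypothetical, and it assumes the common convention $π(a \mid s) ∝ \exp(β(s)\,Q(s,a))$, in which a larger $β$ means behavior closer to optimal. The paper may parameterize suboptimality differently (for example, as an inverse temperature's reciprocal).

```python
import numpy as np

def boltzmann_policy(q_values, beta):
    """Return action probabilities pi(a|s) proportional to exp(beta(s) * Q(s, a)).

    q_values: array of shape (num_states, num_actions), assumed to be given.
    beta: a scalar (classic Boltzmann rationality) or an array of shape
          (num_states,) holding one coefficient per state (state-dependent).
    Assumed convention: larger beta means closer to optimal behavior.
    """
    q_values = np.asarray(q_values, dtype=float)
    # Broadcast a scalar beta to one value per state so both cases share code.
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (q_values.shape[0],))
    logits = beta[:, None] * q_values
    # Subtract the per-state max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy example: two states with identical Q-values over two actions.
q = np.array([[1.0, 0.0],
              [1.0, 0.0]])

# Constant beta: the same degree of rationality in every state.
policy_const = boltzmann_policy(q, 1.0)

# State-dependent beta: uniformly random in state 0, near-optimal in state 1.
policy_state = boltzmann_policy(q, np.array([0.0, 5.0]))
```

With identical Q-values in both states, a constant $β$ yields identical action distributions, while the state-dependent version can express that the same agent acts randomly in one state and almost optimally in another.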
