Learning Contextually-Adaptive Rewards via Calibrated Features
Alexandra Forsey-Smerek, Julie Shah, Andreea Bobu
TL;DR
This work tackles context-dependent reward learning by explicitly modeling how context modulates the saliency of reward features rather than the underlying preferences. It introduces calibrated features (context-conditioned feature mappings) and learns them through contextual feature queries grounded in a Bradley–Terry likelihood, separating context effects from context-invariant preferences. In simulation, calibrated features reach equivalent reward accuracy with an order of magnitude fewer preference queries than baselines and perform better in low-data regimes, while a human user study indicates that participants can teach their personal contextual preferences with the method. The approach supports modular, reusable representations that can be composed to form context-adaptive rewards, with potential for improved interpretability and practical deployment in personalizable robotic systems.
Abstract
A key challenge in reward learning from human input is that desired agent behavior often changes based on context. For example, a robot must adapt to avoid a stove once it becomes hot. We observe that while high-level preferences (e.g., prioritizing safety over efficiency) often remain constant, context alters the $\textit{saliency}$--or importance--of reward features. For instance, stove heat changes the relevance of the robot's proximity, not the underlying preference for safety. Moreover, these contextual effects recur across tasks, motivating the need for transferable representations to encode them. Existing multi-task and meta-learning methods simultaneously learn representations and task preferences, at best $\textit{implicitly}$ capturing contextual effects and requiring substantial data to separate them from task-specific preferences. Instead, we propose $\textit{explicitly}$ modeling and learning context-dependent feature saliency separately from context-invariant preferences. We introduce $\textit{calibrated features}$--modular representations that capture contextual effects on feature saliency--and present specialized paired comparison queries that isolate saliency from preference for efficient learning. Simulated experiments show our method improves sample efficiency, requiring 10x fewer preference queries than baselines to achieve equivalent reward accuracy, with up to 15% better performance in low-data regimes (5-10 queries). An in-person user study (N=12) demonstrates that participants can effectively teach their personal contextual preferences with our method, enabling adaptable and personalized reward learning.
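To make the separation between saliency and preference concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a calibrated feature can be written as a base feature scaled by a context-dependent saliency, that the reward is a weighted sum of calibrated features with context-invariant weights, and that paired comparisons follow a Bradley–Terry model. All function names, the logistic saliency, and the numbers are hypothetical and chosen only to mirror the stove example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stove_saliency(context):
    # Hypothetical saliency: stove proximity matters more as the stove heats up.
    return sigmoid((context["stove_temp_c"] - 60.0) / 10.0)


def constant_saliency(context):
    # Hypothetical: path length is equally relevant in every context.
    return 1.0


def reward(base_features, context, saliency_fns, weights):
    # Assumed form: context-invariant weights applied to calibrated features,
    # where each calibrated feature is saliency(context) * base feature.
    return sum(
        w * g(context) * f
        for w, f, g in zip(weights, base_features, saliency_fns)
    )


def preference_prob(reward_a, reward_b):
    # Bradley-Terry probability that option A is preferred over option B.
    return sigmoid(reward_a - reward_b)


# Features per trajectory: [closeness to stove, path length] (made-up values).
traj_a = [0.9, 1.0]  # short path that passes near the stove
traj_b = [0.1, 1.5]  # longer path that keeps its distance

# Context-invariant preference: safety weighted more heavily than efficiency.
weights = [-2.0, -1.0]
saliency_fns = [stove_saliency, constant_saliency]

cold = {"stove_temp_c": 20.0}
hot = {"stove_temp_c": 200.0}

p_cold = preference_prob(
    reward(traj_a, cold, saliency_fns, weights),
    reward(traj_b, cold, saliency_fns, weights),
)
p_hot = preference_prob(
    reward(traj_a, hot, saliency_fns, weights),
    reward(traj_b, hot, saliency_fns, weights),
)
print(p_cold, p_hot)
```

With these illustrative numbers, the short path is preferred when the stove is cold (probability above 0.5) and the longer path is preferred when it is hot (probability below 0.5), even though the weights never change. Only the saliency of the proximity feature differs between the two contexts.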
