What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation
Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig
TL;DR
This paper examines how reward function design shapes RL-driven biomechanical user simulations in HCI. It systematically varies three reward components (task completion, target proximity, and effort) within a choice-reaction task, using the User-in-the-Box (UitB) framework and a five-DoF musculoskeletal model. The authors formalize the composite reward as $r_t = w_{\text{bonus}} f_{\text{bonus}}(\cdot) - w_{\text{distance}} f_{\text{distance}}(\cdot) - w_{\text{effort}} f_{\text{effort}}(\cdot)$ and test several distance formulations (absolute, squared, exponential) and effort formulations (EJK, DC, CTC, JAC). The key findings are that a completion bonus combined with a proximity reward is essential for task success, and that effort terms are optional when the proximity term is well designed, though they can help avoid irregularities if scaled appropriately. The paper also provides guidelines intended to make biomechanical RL simulations more practical for HCI design and evaluation.
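To make the composite reward concrete, here is a minimal Python sketch of a weighted sum of bonus, distance, and effort terms with the same signs as the formula in the TL;DR. It is an illustration under stated assumptions, not the authors' or UitB's implementation. The function names, the default weights, the exponential form `1 - exp(-k * d)` with `k = 10`, and the use of summed squared muscle activations as the effort term are all hypothetical placeholders; the paper's EJK, DC, CTC, and JAC effort models are not reproduced here.

```python
import math


def distance_cost(dist, kind="absolute", k=10.0):
    """Distance penalty f_distance for a non-negative fingertip-to-target distance.

    The three variants mirror the names in the summary; the exact
    exponential form and the rate k are assumptions made for this sketch.
    """
    if dist < 0:
        raise ValueError("distance must be non-negative")
    if kind == "absolute":
        return dist
    if kind == "squared":
        return dist ** 2
    if kind == "exponential":
        # Assumed form: bounded in [0, 1), steepest close to the target.
        return 1.0 - math.exp(-k * dist)
    raise ValueError(f"unknown distance formulation: {kind}")


def effort_cost(activations):
    """Placeholder effort term: sum of squared muscle activations.

    This is a generic stand-in, not one of the paper's effort models.
    """
    return sum(a * a for a in activations)


def composite_reward(dist, activations, target_hit,
                     w_bonus=1.0, w_distance=1.0, w_effort=0.0,
                     kind="absolute"):
    """r_t = w_bonus * f_bonus - w_distance * f_distance - w_effort * f_effort."""
    f_bonus = 1.0 if target_hit else 0.0
    return (w_bonus * f_bonus
            - w_distance * distance_cost(dist, kind)
            - w_effort * effort_cost(activations))
```

Setting `w_effort=0.0`, as in the defaults above, corresponds to the case the summary describes as viable: a completion bonus plus a proximity term, with no effort penalty. Calling `composite_reward(0.5, [0.0, 0.0], False)` with those defaults gives a pure distance penalty for a step in which the target was not hit.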
Abstract
Biomechanical models allow for diverse simulations of user movements in interaction. Their performance depends critically on the careful design of reward functions, yet the interplay between reward components and emergent behaviours remains poorly understood. We investigate what makes a model "breathe" by systematically analysing the impact of rewarding effort minimisation, task completion, and target proximity on movement trajectories. Using a choice reaction task as a test-bed, we find that a combination of completion bonus and proximity incentives is essential for task success. Effort terms are optional, but can help avoid irregularities if scaled appropriately. Our work offers practical insights for HCI designers to create realistic simulations without needing deep reinforcement learning expertise, advancing the use of simulations as a powerful tool for interaction design and evaluation in HCI.
