What Makes a Model Breathe? Understanding Reinforcement Learning Reward Function Design in Biomechanical User Simulation

Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig

TL;DR

This paper examines how reward function design governs RL-driven biomechanical user simulations in HCI. It systematically varies three reward components (task completion, target proximity, and effort) within a choice-reaction task using the UitB framework and a five-DoF musculoskeletal model. The authors formalize the composite reward as $r_t = w_{\text{bonus}} f_{\text{bonus}}(\cdot) - w_{\text{distance}} f_{\text{distance}}(\cdot) - w_{\text{effort}} f_{\text{effort}}(\cdot)$ and test multiple distance formulations (absolute, squared, exponential) and effort formulations (EJK, DC, CTC, JAC). The key findings are that a completion bonus combined with proximity rewards is essential for task success, and that effort terms are optional but can help avoid irregular movements if scaled appropriately. From these findings, the work derives guidelines that make biomechanical RL simulations more practical for HCI design and evaluation.
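
To make the structure of the composite reward concrete, the following is a minimal Python sketch of a weighted sum of bonus, distance, and effort terms. It is an illustration under stated assumptions, not the authors' implementation: the function names, the default weights, the `scale` parameter, and the exact shape of the exponential distance term are hypothetical, and the effort term is a generic sum of squared controls standing in for the EJK, DC, CTC, and JAC models, whose definitions are not given in this summary.

```python
import math


def distance_term(dist, kind="absolute", scale=10.0):
    """Distance penalty f_distance; larger values mean farther from the target.

    The three variants mirror the names used in the summary (absolute,
    squared, exponential). Their exact forms here are assumptions.
    """
    if kind == "absolute":
        return dist
    if kind == "squared":
        return dist ** 2
    if kind == "exponential":
        # Assumed form: 0 at the target, saturating towards 1 far away.
        return 1.0 - math.exp(-scale * dist)
    raise ValueError(f"unknown distance kind: {kind}")


def effort_term(controls):
    """Placeholder effort cost f_effort: sum of squared control signals.

    This is NOT one of the paper's effort models (EJK, DC, CTC, JAC).
    """
    return sum(u * u for u in controls)


def composite_reward(hit, dist, controls,
                     w_bonus=8.0, w_distance=1.0, w_effort=0.1,
                     kind="absolute"):
    """r_t = w_bonus * f_bonus - w_distance * f_distance - w_effort * f_effort."""
    f_bonus = 1.0 if hit else 0.0  # paid only when the task is completed
    return (w_bonus * f_bonus
            - w_distance * distance_term(dist, kind)
            - w_effort * effort_term(controls))


if __name__ == "__main__":
    # Fingertip 0.5 units from the target, no button press, two active controls.
    print(composite_reward(False, 0.5, [0.2, 0.4]))
    # Button pressed at the target with relaxed controls: only the bonus remains.
    print(composite_reward(True, 0.0, [0.0, 0.0]))
```

Setting a weight to zero removes the corresponding component, which is one simple way to reproduce the kind of ablation described above (for example, `w_distance=0.0` for a bonus-only reward).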

Abstract

Biomechanical models allow for diverse simulations of user movements in interaction. Their performance depends critically on the careful design of reward functions, yet the interplay between reward components and emergent behaviours remains poorly understood. We investigate what makes a model "breathe" by systematically analysing the impact of rewarding effort minimisation, task completion, and target proximity on movement trajectories. Using a choice reaction task as a test-bed, we find that a combination of completion bonus and proximity incentives is essential for task success. Effort terms are optional, but can help avoid irregularities if scaled appropriately. Our work offers practical insights for HCI designers to create realistic simulations without needing deep reinforcement learning expertise, advancing the use of simulations as a powerful tool for interaction design and evaluation in HCI.

Paper Structure

This paper contains 12 sections, 8 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Comparison of movement patterns: only including the completion bonus leads to arbitrary movements (left); a combination of distance and effort rewards may incentivize hitting the button from the side or stopping immediately below the target (middle); a combination of completion bonus and distance rewards leads to reasonable arm movements and successful button clicks (right).
  • Figure 2: Comparison of movement patterns of different effort models, from left to right: CTC model with no movement, JAC model with extended arm, DC model with bent arm, and EJK model remaining on the lower buttons.
  • Figure 3: Success rates (top) and average task completion times (bottom) of models trained with different reward functions of the composite type (the weighted sum of bonus, distance, and effort terms). Full parameter details are given in the parameter table in the appendix. Orange circles correspond to reward functions without distance rewards and with different bonus values (1, 8, and 50, all leading to the same success rate for a given effort model). The bottom figure shows the average task completion times of all models with completion bonus and a success rate of at least 50%. If a model does not manage to press a button within the time limit, the maximum time of four seconds is taken.
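
The two metrics in Figure 3 can be sketched as follows. This is a minimal illustration, assuming each episode is recorded as a completion time in seconds, or `None` when no button was pressed within the limit; the function name `summarise_episodes` and this data layout are hypothetical and not taken from the paper. Only the four-second cap for failed episodes comes from the caption.

```python
TIME_LIMIT_S = 4.0  # maximum episode time stated in the Figure 3 caption


def summarise_episodes(times, time_limit=TIME_LIMIT_S):
    """Return (success_rate, mean_completion_time) for a list of episodes.

    `times` holds one entry per episode: the completion time in seconds,
    or None if no button was pressed in time. Failed episodes count with
    the full time limit when averaging, as described in the caption.
    """
    if not times:
        raise ValueError("need at least one episode")
    succeeded = [t is not None and t <= time_limit for t in times]
    capped = [t if ok else time_limit for t, ok in zip(times, succeeded)]
    success_rate = sum(succeeded) / len(times)
    mean_time = sum(capped) / len(capped)
    return success_rate, mean_time


if __name__ == "__main__":
    rate, mean_time = summarise_episodes([1.0, 2.0, None, 1.0])
    print(f"success rate: {rate:.2f}, mean completion time: {mean_time:.2f} s")
```

Because failures are counted at the cap rather than dropped, a model with a low success rate is pulled towards a mean time of four seconds, which is one reason the caption restricts the bottom panel to models with a success rate of at least 50%.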