The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning
Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz
TL;DR
This work investigates how interface modality (VR versus 2D views) influences human preference elicitation and the learning of human-aware navigation policies in preference-based reinforcement learning (PbRL). It introduces a public dataset of 2,325 navigation preference queries collected across VR and 2D interfaces using an EnQuery ensemble of $N_E = 4$ TD3 policies, and trains modality-specific reward models with $r = \lambda \hat{r} + (1-\lambda) r_{\text{core}}$ where $\lambda = 0.2$, comparing three policies: $\pi_{\text{VR}}$, $\pi_{\text{2D-TD}}$, and $\pi_{\text{2D-FPV}}$. The study finds that VR improves immersion and ease of preference expression, but preferences diverge across modalities, yielding distinct policy outcomes and about $70\%$ modality agreement with notable inter-participant variability. The results underscore the need to account for interface effects in PbRL and provide a public dataset to support future research, with VR-based policies offering the strongest overall trade-off between efficiency and safety in human-aware navigation.
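To make the reward blend in the TL;DR concrete, the following is a minimal Python sketch of the convex combination with $\lambda = 0.2$. It assumes that the hatted term is the output of a learned, modality-specific reward model and that the core term is a hand-designed task reward; the function and argument names (`blended_reward`, `r_hat`, `r_core`, `lam`) are illustrative and not taken from the paper's code.

```python
def blended_reward(r_hat: float, r_core: float, lam: float = 0.2) -> float:
    """Blend a learned preference reward with a core task reward.

    Computes r = lam * r_hat + (1 - lam) * r_core.
    r_hat  : assumed to be the learned reward model's output for one step
    r_core : assumed to be the hand-designed core reward for the same step
    lam    : mixing weight; 0.2 is the value quoted in the TL;DR
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * r_hat + (1.0 - lam) * r_core


if __name__ == "__main__":
    # Hypothetical per-step values, chosen only to illustrate the weighting.
    print(blended_reward(r_hat=0.5, r_core=-1.0))
```

With $\lambda = 0.2$ the core reward carries 80% of the weight, so under these assumptions the learned preference term adjusts the policy's behavior rather than dominating it.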
Abstract
Aligning robot navigation with human preferences is essential for ensuring comfortable and predictable robot movement in shared spaces. While preference-based learning methods, such as reinforcement learning from human feedback (RLHF), enable this alignment, the choice of the preference collection interface may influence the process. Traditional 2D interfaces provide structured views but lack spatial depth, whereas immersive VR offers richer perception, potentially affecting preference articulation. This study systematically examines how the interface modality impacts human preference collection and navigation policy alignment. We introduce a novel dataset of 2,325 human preference queries collected through both VR and 2D interfaces, revealing significant differences in user experience, preference consistency, and policy outcomes. Our findings highlight the trade-offs between immersion, perception, and preference reliability, emphasizing the importance of interface selection in preference-based robot learning. The dataset is available to support future research.
