The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning

Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz

TL;DR

This work investigates how interface modality (VR versus 2D views) influences human preference elicitation and the learning of human-aware navigation policies in preference-based reinforcement learning (PbRL). It introduces a public dataset of 2,325 navigation preference queries collected across VR and 2D interfaces using an EnQuery ensemble of $N_E = 4$ TD3 policies, and trains modality-specific reward models with $r = \lambda \hat{r} + (1-\lambda) r_\text{core}$ where $\lambda = 0.2$, comparing three policies: $\pi_\text{VR}$, $\pi_\text{2D-TD}$, and $\pi_\text{2D-FPV}$. The study finds that VR improves immersion and ease of preference expression, but preferences diverge across modalities, yielding distinct policy outcomes and about $70\%$ modality agreement with notable inter-participant variability. The results underscore the need to account for interface effects in PbRL, with VR-based policies offering the strongest overall trade-off between efficiency and safety in human-aware navigation.
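The reward combination in the TL;DR can be illustrated with a minimal Python sketch. It assumes that $\hat{r}$ is the output of the learned, modality-specific reward model and that $r_\text{core}$ is a separate core navigation reward; the summary does not define either term, so both readings, as well as the function and parameter names, are illustrative assumptions rather than the authors' implementation.

```python
def blended_reward(r_hat: float, r_core: float, lam: float = 0.2) -> float:
    """Convex combination r = lam * r_hat + (1 - lam) * r_core.

    r_hat  : assumed to be the learned (preference-based) reward.
    r_core : assumed to be the core task reward.
    lam    : weight on r_hat; 0.2 is the value quoted in the TL;DR.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * r_hat + (1.0 - lam) * r_core


# Hypothetical per-step values, chosen only for illustration.
example = blended_reward(r_hat=1.0, r_core=-0.5)
```

With $\lambda = 0.2$ the second term carries four times the weight of the first, so under these assumptions the learned reward only adjusts the core reward and does not replace it.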

Abstract

Aligning robot navigation with human preferences is essential for ensuring comfortable and predictable robot movement in shared spaces. While preference-based learning methods, such as reinforcement learning from human feedback (RLHF), enable this alignment, the choice of the preference collection interface may influence the process. Traditional 2D interfaces provide structured views but lack spatial depth, whereas immersive VR offers richer perception, potentially affecting preference articulation. This study systematically examines how the interface modality impacts human preference collection and navigation policy alignment. We introduce a novel dataset of 2,325 human preference queries collected through both VR and 2D interfaces, revealing significant differences in user experience, preference consistency, and policy outcomes. Our findings highlight the trade-offs between immersion, perception, and preference reliability, emphasizing the importance of interface selection in preference-based robot learning. The dataset is available to support future research.

Paper Structure

This paper contains 23 sections, 1 equation, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Left: Our study collects preferences via Virtual Reality (VR) and 2D interfaces (top-down & first-person views), enabling a systematic comparison of interface modalities for collecting human preferences on robot navigation. Top-right: The preference dataset consists of 2,325 navigation queries from 31 participants. Bottom-right: Using Reinforcement Learning from Human Feedback (RLHF) with our dataset, we refine a standard navigation policy (blue) into preference-aligned policies (orange).
  • Figure 2: Survey results (S2) comparing user experiences across three interface conditions: virtual reality (VR), 2D top-down (2D-TD), and 2D first-person view (2D-FPV). Participants rated their experience across multiple aspects after each block. Ratings were provided on a Likert scale (1-7); bars indicate mean scores, with standard errors indicated. Asterisks denote significance levels (* $p < .05$, ** $p < .01$, *** $p < .001$).
  • Figure 3: User rankings (S3) of three modalities, namely Virtual Reality (VR), 2D Top-Down (2D-TD), and 2D First-Person View (2D-FPV), based on perceived usefulness, ease of use, and intention to use. Each bar represents the percentage of participants who assigned first, second, and third ranks to each modality. VR is predominantly ranked highest in usefulness and intention to use.
  • Figure 4: Change in the preferred trajectory with a modality shift in cases of preference disagreement between interfaces for a given participant. Metrics are $z$-standardized for all queried trajectories per participant. Bars show mean change, and error bars indicate the standard error of the participant means averaged over their disagreements. Participants preferred shorter and more straightforward driving trajectories in 2D interfaces compared to VR, with the robot occasionally traversing closer to the human.
  • Figure 5: Navigation behavior comparison between aligned policies $\pi_\text{VR}$, $\pi_\text{2D-TD}$, and $\pi_\text{2D-FPV}$ and a non-aligned baseline counterpart $\pi_\text{BL}$ in four navigation scenarios. The aligned policies exhibit smoother and more obstacle-aware trajectories than the non-aligned policy $\pi_\text{BL}$, with $\pi_{\text{VR}}$ and $\pi_{\text{2D-TD}}$ demonstrating the best balance between efficiency and safety.
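
The per-participant $z$-standardization mentioned in the Figure 4 caption can be sketched as follows. This is a minimal illustration using only the Python standard library. The choice of the population standard deviation, the sign convention of the change (2D value minus VR value), and the example numbers are assumptions made here, not details given in the caption.

```python
from statistics import mean, pstdev


def z_standardize(values):
    """Standardize one participant's metric values to zero mean, unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population std; the paper's convention is not stated
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preference_shift(z_values, idx_vr, idx_2d):
    """Change in a z-scored metric when the preferred trajectory differs by modality.

    idx_vr / idx_2d index the trajectory preferred in VR and in 2D (hypothetical).
    """
    return z_values[idx_2d] - z_values[idx_vr]


# Hypothetical path lengths (in meters) of trajectories shown to one participant.
path_lengths = [4.0, 6.0, 8.0]
z = z_standardize(path_lengths)
shift = preference_shift(z, idx_vr=2, idx_2d=0)
```

In this made-up example the shift is negative, which would correspond to a shorter preferred path in 2D than in VR, the direction the Figure 4 caption describes.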