SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning

Zhipeng Li, Christoph Gebhardt, Yves Inglin, Nicolas Steck, Paul Streli, Christian Holz

TL;DR

SituationAdapt is a system that adapts Mixed Reality UIs to real-world surroundings in shared settings. It accounts for environmental and social cues and uses a Vision-and-Language Model to assess where interactive UI elements can be placed.

Abstract

Mixed Reality is increasingly used in mobile settings beyond controlled home and office spaces. This mobility introduces the need for user interface layouts that adapt to varying contexts. However, existing adaptive systems are designed only for static environments. In this paper, we introduce SituationAdapt, a system that adjusts Mixed Reality UIs to real-world surroundings by considering environmental and social cues in shared settings. Our system consists of perception, reasoning, and optimization modules for UI adaptation. Our perception module identifies objects and individuals around the user, while our reasoning module leverages a Vision-and-Language Model to assess the placement of interactive UI elements. This ensures that adapted layouts do not obstruct relevant environmental cues or interfere with social norms. Our optimization module then generates Mixed Reality interfaces that account for these considerations as well as temporal constraints. For evaluation, we first validate our reasoning module's capability of assessing UI contexts in comparison to human expert users. In an online user study, we then establish SituationAdapt's capability of producing context-aware layouts for Mixed Reality, where it outperformed previous adaptive layout methods. We conclude with a series of applications and scenarios to demonstrate SituationAdapt's versatility.
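
To make the hand-off from reasoning to optimization concrete, the following is a minimal sketch, not the authors' implementation. It assumes each detected area carries two ratings on a hypothetical 0-to-1 scale (overlay suitability and interaction suitability) and that each UI element is placed on one distinct area. The names `Area`, `Element`, `placement_score`, and `best_layout` are illustrative, the scoring rule is an assumption, and the temporal constraints mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Area:
    label: str
    overlay: float       # assumed scale: 0 (must stay visible) to 1 (fine to cover)
    interaction: float   # assumed scale: 0 (do not interact here) to 1 (fine)


@dataclass(frozen=True)
class Element:
    name: str
    interactive: bool


def placement_score(element: Element, area: Area) -> float:
    """Score one element-area pairing (hypothetical rule).

    Interactive elements need an area that is acceptable both to cover and
    to gesture toward, so the weaker of the two ratings limits the score.
    """
    if element.interactive:
        return min(area.overlay, area.interaction)
    return area.overlay


def best_layout(elements, areas):
    """Exhaustively assign each element to a distinct area, maximizing the summed score."""
    best, best_total = None, float("-inf")
    for chosen in permutations(areas, len(elements)):
        total = sum(placement_score(e, a) for e, a in zip(elements, chosen))
        if total > best_total:
            best = {e.name: a.label for e, a in zip(elements, chosen)}
            best_total = total
    return best, best_total


# Illustrative ratings only; in the described system they would come from the VLM.
areas = [
    Area("blank wall", overlay=0.9, interaction=0.8),
    Area("presenter", overlay=0.1, interaction=0.0),
    Area("empty seat", overlay=0.8, interaction=0.3),
    Area("window", overlay=0.6, interaction=0.6),
]
elements = [Element("chat", interactive=True), Element("clock", interactive=False)]
layout, total = best_layout(elements, areas)
```

With these made-up ratings, the interactive chat window lands on the blank wall and the passive clock on the empty seat, while the presenter is left uncovered. Exhaustive search is only practical for a handful of elements; a real optimizer would use a dedicated solver.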

Paper Structure (35 sections, 4 equations, 8 figures)

Figures (8)

  • Figure 1: Schematic overview of SituationAdapt's system. Our perception module recognizes 2D areas of interest in the environment and computes 3D bounding boxes of the respective objects. Our reasoning module takes the areas as input and leverages a VLM to rate their overlay and interaction suitability. Unity then assigns these ratings to the respective 3D bounding boxes and our optimization module adapts MR UIs accordingly.
  • Figure 2: Our implementation of the perception module. Based on the color and depth frames of an RGBD camera, a 3D mapping stage reconstructs the camera position and the surroundings of the user as a point cloud. An object detection node computes semantically annotated 2D bounding boxes. The last stage segments 3D bounding boxes based on the 2D areas, the point cloud, and the camera position.
  • Figure 3: Our survey covered these and other scenarios. Participants rated the overlay and interaction suitability for each area.
  • Figure 4: Boxplots of the overlay (a-c) and interaction (d-f) suitability ratings of participants (ptps) and VLM (vlms) for the subway scenario (Figure 3, middle). For both questions, it can be seen that the standard deviation of ptp responses is consistently larger than that of vlm responses. The boxplots further show that medians of both conditions frequently overlap (b, c, e, f). For areas where they do not, ptps often exhibit a high standard deviation in their ratings (a).
  • Figure 5: Our study setup replicated a university seminar room, where the participant was sitting in the last row and another attendee was seated in the row ahead.
  • ...and 3 more figures
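
The last perception stage in Figure 2 lifts 2D detections into 3D. One common way to do this, shown here only as a sketch under stated assumptions, is pinhole back-projection of the depth pixels inside a 2D box. The function names, the row-major `depth_image` layout, the use of 0 for missing depth, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, not details from the paper. The result stays in the camera frame; the described system additionally uses the point cloud and camera position, which this sketch omits.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to camera-frame XYZ."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def box_2d_to_3d(box, depth_image, fx, fy, cx, cy):
    """Axis-aligned camera-frame 3D box around the valid depth pixels of a 2D box.

    box is (u_min, v_min, u_max, v_max) with exclusive maxima.
    depth_image is indexed as depth_image[v][u]; a value of 0 marks missing depth.
    Returns (min_corner, max_corner), or None if the box has no valid depth.
    """
    u_min, v_min, u_max, v_max = box
    points = [
        backproject(u, v, depth_image[v][u], fx, fy, cx, cy)
        for v in range(v_min, v_max)
        for u in range(u_min, u_max)
        if depth_image[v][u] > 0
    ]
    if not points:
        return None
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Tiny synthetic example: a 4x4 depth frame at 2 m with one missing pixel.
depth_image = [[2.0] * 4 for _ in range(4)]
depth_image[1][1] = 0.0
corners = box_2d_to_3d((1, 1, 3, 3), depth_image, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Taking the raw minimum and maximum is sensitive to background pixels that fall inside the 2D box; a more robust variant would filter depth outliers before computing the corners.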