
RL-LABEL: A Deep Reinforcement Learning Approach Intended for AR Label Placement in Dynamic Scenarios

Chen Zhu-Tian, Daniele Chiappalupi, Tica Lin, Yalong Yang, Johanna Beyer, Hanspeter Pfister

TL;DR

RL-LABEL introduces a deep reinforcement learning approach to AR label placement that accounts for the current and predicted future states of objects and for the viewer's viewpoint. The method employs an Encoder-Actor-Critic architecture that uses a ray-space viewpoint encoding and self-attention to handle changing neighbor configurations, with an action space of accelerations on an $x$-$z$ plane above each object. In simulations based on the NBA and STU datasets, RL-LABEL learns long-horizon policies that reduce occlusions, leader-line intersections, and label jitter more effectively than a no-view-management baseline and a force-based method. A user study indicates improved accuracy and faster task completion when identifying, comparing, and summarizing data. The work demonstrates the viability of RL for dynamic AR visualizations and lays a foundation for future RL-based AR view management and human-in-the-loop enhancements.
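The acceleration-based action space can be illustrated with a minimal sketch, assuming semi-implicit Euler integration at a fixed frame rate and a y-up world in which the $x$-$z$ plane is horizontal. The names `LabelState`, `step_label`, and `label_world_position`, the time step `dt`, the acceleration limit `max_accel`, the square half-width `half_size`, and the label `height` are all hypothetical; the paper's actual dynamics and limits may differ. The policy outputs only an x- and a z-acceleration, and the label's offset is kept inside a square above its object.

```python
from dataclasses import dataclass


@dataclass
class LabelState:
    # Offset of the label from the point directly above its target object,
    # on the x-z plane, plus the offset's velocity (hypothetical units).
    x: float = 0.0
    z: float = 0.0
    vx: float = 0.0
    vz: float = 0.0


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def step_label(state: LabelState, ax: float, az: float,
               dt: float = 1 / 30, max_accel: float = 2.0,
               half_size: float = 1.0) -> LabelState:
    """Apply one (ax, az) action with semi-implicit Euler integration.

    All parameter names and default values are illustrative assumptions.
    """
    # Limit the commanded accelerations to the allowed range.
    ax = clamp(ax, -max_accel, max_accel)
    az = clamp(az, -max_accel, max_accel)
    # Update velocity first, then position (semi-implicit Euler).
    vx = state.vx + ax * dt
    vz = state.vz + az * dt
    x = state.x + vx * dt
    z = state.z + vz * dt
    # Keep the offset inside the square; stop motion along any axis that hits a wall.
    if abs(x) > half_size:
        x = clamp(x, -half_size, half_size)
        vx = 0.0
    if abs(z) > half_size:
        z = clamp(z, -half_size, half_size)
        vz = 0.0
    return LabelState(x, z, vx, vz)


def label_world_position(object_pos, state: LabelState, height: float = 0.5):
    """World-space label anchor: object position, lifted by `height`, shifted by the x-z offset."""
    ox, oy, oz = object_pos
    return (ox + state.x, oy + height, oz + state.z)


if __name__ == "__main__":
    s = LabelState()
    for _ in range(60):
        s = step_label(s, ax=10.0, az=0.0)  # over-limit command is clamped to max_accel
    print(s)
```

Driving the label with a constant, over-limit x-acceleration clamps the command to `max_accel` and, after about a second of simulated time, pins the offset at the square's edge (`x == 1.0`). Using accelerations rather than target positions as actions means a single decision can only change a label's position gradually, which is one plausible way such a design discourages jitter.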

Abstract

Labels are widely used in augmented reality (AR) to display digital information. Ensuring the readability of AR labels requires placing them without occlusion while keeping their visual links legible, especially when multiple labels exist in the scene. Although existing optimization-based methods, such as force-based methods, are effective in managing AR labels in static scenarios, they often struggle in dynamic scenarios with constantly moving objects. This is because they generate layouts that are optimal only for the current moment, neglecting future moments, which leads to sub-optimal or unstable layouts over time. In this work, we present RL-LABEL, a deep reinforcement learning-based method for managing the placement of AR labels in scenarios involving moving objects. RL-LABEL considers the current and predicted future states of objects and labels, such as positions and velocities, as well as the user's viewpoint, to make informed decisions about label placement. It balances the trade-offs between immediate and long-term objectives. Our experiments on two real-world datasets show that RL-LABEL effectively learns the decision-making process for long-term optimization, outperforming two baselines (i.e., no view management and a force-based method) by minimizing label occlusions, line intersections, and label movement distance. Additionally, a user study involving 18 participants indicates that RL-LABEL excels over the baselines in helping users identify, compare, and summarize data on AR labels within dynamic scenes.
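To make the trade-off between immediate and long-term objectives concrete, the sketch below scores one frame by counting pairwise label overlaps (occlusions), leader-line crossings, and label movement distance, the three quantities named in the abstract, and then sums negated per-frame penalties into a discounted return. The weights `w_occ`, `w_cross`, and `w_move`, the discount `gamma`, the axis-aligned rectangle model of labels in image space, and all function names are assumptions for illustration, not RL-LABEL's published reward.

```python
from itertools import combinations

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in image space
Point = tuple[float, float]


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True if two label rectangles overlap with positive area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Signed area of triangle pqr: > 0 left turn, < 0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Proper crossing test; shared endpoints and collinear overlaps are ignored."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def frame_penalty(label_rects: list[Rect],
                  leader_lines: list[tuple[Point, Point]],
                  moved_distance: float,
                  w_occ: float = 1.0, w_cross: float = 1.0,
                  w_move: float = 0.1) -> float:
    """Weighted count of occlusions and crossings plus total label movement (hypothetical weights)."""
    occlusions = sum(rects_overlap(a, b) for a, b in combinations(label_rects, 2))
    crossings = sum(segments_cross(*l1, *l2) for l1, l2 in combinations(leader_lines, 2))
    return w_occ * occlusions + w_cross * crossings + w_move * moved_distance


def discounted_return(rewards: list[float], gamma: float = 0.95) -> float:
    """G = r0 + gamma * r1 + gamma^2 * r2 + ..., accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

As a worked example, suppose a label can either stay still (penalty 0 now, but one occlusion in each of the next three frames) or move 2 units now (penalty 0.2 with `w_move=0.1`) and remain occlusion-free. A greedy optimizer that looks only at the current frame prefers staying still, but `discounted_return([0.0, -1.0, -1.0, -1.0])` is about -2.71 while `discounted_return([-0.2, 0.0, 0.0, 0.0])` is -0.2, so a return-maximizing policy moves early.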

Paper Structure (22 sections, 7 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Left: 2D labels are placed and managed in the image space. Right: 3D labels are placed and managed in the world space, and subsequently projected onto the image plane.
  • Figure 2: We consider scenarios where a stationary viewer observes multiple moving objects, each with a corresponding label.
  • Figure 3: Our RL-based method consists of three components: an Encoder that encodes the current state of the environment, an Actor that generates actions for placing the labels, and a Critic that evaluates the generated actions and provides feedback to improve the Actor based on the rewards obtained from the environment.
  • Figure 4: For each label or object, the Encoder considers its state (e.g., position) relative to the camera and neighboring objects. It then employs a neural network to embed the label or object's state into a high-dimensional vector.
  • Figure 5: An Actor network generates actions based on the state embedding to place the label. The label's movement is constrained to a two-dimensional square region of the x-z plane on top of its target object, and the available actions are x- and z-axis accelerations of the label.
  • ...and 8 more figures
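The Encoder behaviour described in Figures 3 and 4 (a fixed-size embedding built from an object's state relative to the camera and to a varying set of neighbors) can be sketched with single-head scaled dot-product attention pooling. This is an assumption-laden illustration, not the paper's network: it uses plain camera-relative and ego-relative offsets instead of the paper's ray-space viewpoint encoding, an ego-query attention step as a simplified stand-in for the self-attention the paper describes, fixed hand-written projection matrices `wq`, `wk`, `wv` in place of learned weights, and hypothetical helper names `attend`, `relative`, and `encode`.

```python
import math

Vec = list[float]
Mat = list[list[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    """Multiply matrix m (rows of weights) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vec) -> Vec:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(ego: Vec, neighbors: list[Vec], wq: Mat, wk: Mat, wv: Mat) -> Vec:
    """Pool any number of neighbor feature vectors into one vector of length len(wv)."""
    if not neighbors:
        return [0.0] * len(wv)
    q = matvec(wq, ego)
    keys = [matvec(wk, n) for n in neighbors]
    values = [matvec(wv, n) for n in neighbors]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    # Weighted sum of value vectors; the result does not depend on neighbor order.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(wv))]


def relative(origin, point) -> Vec:
    """Coordinates of `point` expressed relative to `origin`."""
    return [p - o for p, o in zip(point, origin)]


def encode(ego_pos, ego_vel, camera_pos, neighbor_positions,
           wq: Mat, wk: Mat, wv: Mat) -> Vec:
    """Fixed-size embedding: ego position seen from the camera, ego velocity, pooled neighbors."""
    ego_feat = relative(camera_pos, ego_pos)
    neighbor_feats = [relative(ego_pos, p) for p in neighbor_positions]
    pooled = attend(ego_feat, neighbor_feats, wq, wk, wv)
    return ego_feat + list(ego_vel) + pooled
```

Because the attention weights are computed per neighbor and then summed, reordering the neighbors leaves the pooled vector unchanged (up to floating-point rounding), and `encode` returns 9 numbers whether an object has one neighbor or three. In the described architecture, an embedding of this kind would feed both the Actor, which outputs the x- and z-accelerations, and the Critic, which evaluates those actions.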