Real-and-Present: Investigating the Use of Life-Size 2D Video Avatars in HMD-Based AR Teleconferencing

Xuanyu Wang, Weizhan Zhang, Christian Sandor, Hongbo Fu

TL;DR

This work addresses teleconferencing on AR HMDs by proposing life-size 2D video avatars that balance fidelity and co-presence under a limited FoV. It combines a pilot placement study, a proof-of-concept prototype, and a VR-simulated study covering more FoV conditions to derive quantitative placement models for current and future HMDs. Key findings show that video avatars can outperform both direct transplants of video conferencing onto AR and avatar-only approaches, achieving high co-presence while maintaining fidelity. FoV has a measurable influence on horizontal spacing, and the regressed placement models give guidance for avatar placement. The results offer practical guidance for deploying AR telepresence on existing hardware and inform design for next-generation, wider-FoV headsets.

Abstract

Augmented Reality (AR) teleconferencing allows separately located users to interact with each other in 3D through agents in their own physical environments. Existing methods leveraging volumetric capturing and reconstruction can provide a high-fidelity experience but are often too complex and expensive for everyday usage. Other solutions target mobile and effortless-to-setup teleconferencing on AR Head Mounted Displays (HMD). They directly transplant the conventional video conferencing onto an AR-HMD platform or use avatars to represent remote participants. However, they can only support either a high fidelity or a high level of co-presence. Moreover, the limited Field of View (FoV) of HMDs could further influence users' immersive experience. To achieve a balance between fidelity and co-presence, we explore using life-size 2D video-based avatars (video avatars for short) in AR teleconferencing. Specifically, with the potential effect of FoV on users' perception of proximity, we first conduct a pilot study to explore the local-user-centered optimal placement of video avatars in small-group AR conversations. With the placement results, we then implement a proof-of-concept prototype of video-avatar-based teleconferencing. We conduct user evaluations with the prototype to verify its effectiveness in balancing fidelity and co-presence. Following the indication in the pilot study, we further quantitatively explore the effect of FoV size on the video avatar's optimal placement through a user study involving more FoV conditions in a VR-simulated environment. We regress placement models to serve as references for computationally determining video avatar placements in such teleconferencing applications on various existing AR HMDs and future ones with bigger FoVs.

Paper Structure

This paper contains 28 sections, 1 equation, 10 figures, 5 tables.

Figures (10)

  • Figure 1: A local user (not shown in this figure) is having a small-group AR teleconference with four remote users through their video avatars. The left scene is seen from the local user's perspective, with the remote users' video avatars (denoted as "RU 1" to "RU 4") overlaid in the local physical space. It is a screenshot from our VR-simulated AR environment (in the FoV = 110$^\circ$ condition) with a post-added HoloLens on each RU's head to demonstrate that every user has the conversation through an AR HMD symmetrically in real-use scenarios. The right image illustrates the layout of the conversation group from the top-down view. The local user adjusts the RUs' placement on the circle using the Radian and Radius parameters.
  • Figure 2: An illustration of the (a) 30-, (b) 40-, and (c) 50-FoV conditions in the pilot study. In the images, the boundaries of the FoV (i.e., the white rectangles) are approximately annotated and are not visible to the participant. The virtual content outside of the FoV is clipped by the transparent occluders, whose exact positions for each FoV condition are calculated and defined in the controller script. Each image shows the view after users adjust the video avatars’ placement under the corresponding FoV condition. The change in the user's perceived optimal Radius leads to different sizes of video avatars in the images.
  • Figure 3: The statistical results for the data of the pilot study. (a) shows the distribution of all the placement data in all four scenarios (shown in different colors and styles). The solid point in dark blue at (0, 0) denotes the local user. (b) compares the Radian data in the 2-RU, 3-RU, and 4-RU scenarios only; the 1-RU scenario is omitted because there the participant stands directly face-to-face with the remote peer (Radian = 0). (c) compares the Radius data in all four scenarios, with outliers hidden since they contribute little to the overall trend and some of them are too far away, as seen in (a).
  • Figure 4: An illustration of the (a) Avatar, (b) Video grid, and (c) Video avatar conditions in the evaluation using our prototype system.
  • Figure 5: The statistical results for the data of the user study in the Evaluation.
  • ...and 5 more figures
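
The captions of Figures 1 to 3 describe avatar placement in terms of a Radian and a Radius parameter, with the local user at (0, 0) and Radian = 0 for a face-to-face peer. The following is a minimal sketch of that geometry, not the authors' implementation. It assumes user-centered polar coordinates, a local user facing the +z axis, positive Radian to the user's right, and FoV values interpreted as horizontal angles in degrees. All function names are hypothetical.

```python
import math


def avatar_position(radian: float, radius: float) -> tuple[float, float]:
    """Top-down (x, z) position of an avatar.

    Assumption: the local user sits at the origin facing +z, and `radian`
    is measured from that facing direction (positive to the right).
    """
    return (radius * math.sin(radian), radius * math.cos(radian))


def within_horizontal_fov(radian: float, fov_deg: float) -> bool:
    """True if the avatar's center lies inside the horizontal FoV
    while the user looks straight ahead (assumed horizontal FoV)."""
    return abs(radian) <= math.radians(fov_deg) / 2


def visible_width(distance: float, fov_deg: float) -> float:
    """Width of the viewable area at a given distance: 2 * d * tan(fov / 2)."""
    return 2 * distance * math.tan(math.radians(fov_deg) / 2)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper's data.
    for fov in (30, 40, 50, 110):
        inside = within_horizontal_fov(math.radians(20), fov)
        print(f"FoV {fov:>3} deg: width at 1.5 m = "
              f"{visible_width(1.5, fov):.2f} m, avatar at 20 deg visible: {inside}")
```

Under these assumptions, a narrower FoV shrinks the width available at a given distance, which is one way to see why the preferred horizontal spacing and distance of life-size avatars could depend on FoV.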