Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication

Erzhen Hu, Mingyi Li, Jungtaek Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, Ruofei Du

TL;DR

This work proposes Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items during remote sessions and enables users to interact with remote objects or discuss concepts in a collaborative manner.

Abstract

During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create digital 2D copies of physical objects from video feeds and share them in a shared space. However, conventional 2D representations of digital objects restrict users' ability to spatially reference items in a shared immersive environment. To address this, we propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.

Paper Structure

This paper contains 69 sections, 12 figures.

Figures (12)

  • Figure 1: Design Space of Thing2Reality. Rows cover Scene-Level (SL) & Object-Level (OL) capturing and Scene-Level (SL) & Object-Level (OL) generating; columns cover representations of objects. The left row labels of the table indicate the difference between capturing and generating. The right row labels of the table indicate the difference between scene-level and object-level.
  • Figure 2: Human-human communication methods: 1) text or speech, 2) sketches, and 3) images or videos can be used as input to obtain ideal 2D images via digital search, image/video capturing, or GenAI/ML models (text-to-image, sketch-to-image). These images can then be converted to an arbitrary segmented image, conditioned multiview renderings, and 3D Gaussians.
  • Figure 3: An example user journey: (a) A user begins by selecting preferred visuals to bring to reality. This is achieved by painting on the desired region within the web browser or camera feed of the physical space. These objects are subsequently processed through progressive stages: starting from a 2D segmented image, evolving into conditioned multi-view renderings, and ultimately, to a 3D Gaussian representation. (b) Meanwhile, orthogonal views are laid out along the rings of the Pie Menu. (c) The 3D Gaussians are summoned after 1-2 seconds. (d) The user can re-position and re-scale them via the Sphere Proxy.
  • Figure 4: 3D-to-2D Process: A user can capture snapshots of the 3D Gaussians from different perspectives and project them onto the whiteboard. (a) Third-person perspective; (b) first-person perspective.
  • Figure 5: With video see-through mode, users can bring physical objects and sketches to the shared space in XR.
  • ...and 7 more figures