MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments

Yuqi Tong, Yue Qiu, Ruiyang Li, Shi Qiu, Pheng-Ann Heng

TL;DR

The paper addresses the challenge of rapid, user-friendly 3D content creation for XR by combining hand-drawn sketches with voice prompts. It introduces MS2Mesh-XR, a multi-modal pipeline that uses ControlNet for 2D image inference and the Convolutional Reconstruction Model (CRM) to reconstruct textured 3D meshes from six orthographic views, producing a mesh in under 20 seconds for visualization in run-time XR scenes. Key contributions include a multi-modal input interface that accepts sketch and voice together, a ControlNet-based image inference module, and a CRM-driven mesh reconstruction module that yields high-fidelity textured meshes. The paper demonstrates two use cases: interactive asset creation in VR and interior design in MR.

Abstract

We present MS2Mesh-XR, a novel multi-modal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments using hand-drawn sketches assisted by voice inputs. Specifically, users can intuitively sketch objects using natural hand movements in mid-air within a virtual environment. By integrating voice inputs, we employ ControlNet to infer realistic images from the drawn sketches and the interpreted text prompts. Users can then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh using the Convolutional Reconstruction Model. Notably, our proposed pipeline can generate a high-quality 3D mesh in less than 20 seconds, allowing for immersive visualization and manipulation in run-time XR scenes. We demonstrate the practicability of our pipeline through two use cases in XR settings. By leveraging natural user inputs and cutting-edge generative AI capabilities, our approach can significantly facilitate XR-based creative production and enhance user experiences. Our code and demo will be available at: https://yueqiu0911.github.io/MS2Mesh-XR/

Paper Structure

This paper contains 14 sections and 3 figures.

Figures (3)

  • Figure 1: MS2Mesh-XR integrates hand-drawn sketches with voice inputs to rapidly generate realistic 3D meshes for natural user interactions in XR environments.
  • Figure 2: Overview of our MS2Mesh-XR pipeline. Multi-modal inputs from the user's sketch and voice feed into the image inference module, which generates a reference image. The mesh reconstruction module then uses the reference image to reconstruct a corresponding 3D mesh, leveraging multiview images generated by diffusion models. The generated 3D object is finally rendered in the XR environment, where it supports intuitive user interactions.
  • Figure 3: Comparison of images and meshes generated from sketches with different prompts. For each pair of generation results, the left one shows the generated 2D image, while the right one presents the corresponding generated 3D mesh.
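
The data flow described in the Figure 2 caption can be sketched as a small Python skeleton. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `MultiModalInput`, `Mesh`, `run_pipeline`, and the four stage callables are hypothetical, and the real ControlNet and CRM models are left as pluggable functions rather than called directly. The view count of six follows the TL;DR above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

NUM_VIEWS = 6  # six orthographic views, as stated in the TL;DR


@dataclass
class MultiModalInput:
    """Hypothetical container for the two user inputs."""
    sketch: Any       # rasterized mid-air sketch (format is an assumption)
    voice_text: str   # text prompt interpreted from the voice input


@dataclass
class Mesh:
    """Hypothetical minimal mesh representation."""
    vertices: list
    faces: list


def run_pipeline(
    inputs: MultiModalInput,
    infer_images: Callable[[Any, str], Sequence[Any]],  # stands in for ControlNet
    choose: Callable[[Sequence[Any]], int],             # user picks a candidate
    make_views: Callable[[Any], List[Any]],             # multiview generation
    reconstruct: Callable[[List[Any]], Mesh],           # stands in for CRM
) -> Mesh:
    """Chain the stages: sketch + voice -> images -> selection -> views -> mesh."""
    candidates = infer_images(inputs.sketch, inputs.voice_text)
    if not candidates:
        raise ValueError("image inference returned no candidates")

    # The user reviews the candidates and selects a reference image.
    reference = candidates[choose(candidates)]

    views = make_views(reference)
    if len(views) != NUM_VIEWS:
        raise ValueError(f"expected {NUM_VIEWS} views, got {len(views)}")

    # The resulting mesh would then be handed to the XR scene for rendering.
    return reconstruct(views)
```

Passing the stages in as callables keeps the sketch independent of any particular model library; each stage can be replaced by a stub for testing or by a real model wrapper.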