Video2MR: Automatically Generating Mixed Reality 3D Instructions by Augmenting Extracted Motion from 2D Videos

Keiichi Ihara; Kyzyl Monteiro; Mehrad Faridan; Rubaiat Habib Kazi; Ryo Suzuki

TL;DR

Video2MR automatically converts 2D instructional videos into MR 3D avatars and augments them with four techniques (pose comparison, motion visualization, embodied temporal navigation, and avatar repositioning) to create engaging, scalable MR sports and exercise instructions. It uses DeepMotion for motion extraction and HoloLens 2 with Azure Kinect to deliver real-time feedback and visualization, including a pose-match indicator and 3D gaze and trajectory cues. Through a formative prototype, a user study (n=12), and expert reviews (n=6) across multiple sports, the system demonstrates increased co-presence, engagement, and fun compared with 2D videos, while highlighting challenges in tracking accuracy, first-person visibility, and equipment integration. The work suggests concrete future enhancements in object interaction, verbal guidance, and richer avatar realism to broaden the applicability and effectiveness of automated MR instruction.

Abstract

This paper introduces Video2MR, a mixed reality system that automatically generates 3D sports and exercise instructions from 2D videos. Mixed reality instructions have great potential for physical training, but existing approaches require substantial time and cost to create these 3D experiences. Video2MR overcomes this limitation by transforming arbitrary instructional videos available online into MR 3D avatars with AI-enabled motion capture (DeepMotion). It then automatically enhances the avatar motion through the following augmentation techniques: 1) contrasting and highlighting differences between the user and avatar postures, 2) visualizing key trajectories and movements of specific body parts, 3) manipulating time and speed using body motion, and 4) spatially repositioning avatars for different perspectives. We developed the system on HoloLens 2 and Azure Kinect and showcase various use cases, including yoga, dancing, soccer, tennis, and other physical exercises. The study results confirm that Video2MR provides more engaging and playful learning experiences than existing 2D video instructions.
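As a rough illustration of the first augmentation technique (contrasting user and avatar postures), the sketch below compares two poses bone by bone and reports which bones differ most. It is not the authors' implementation: the joint names, the `BONES` table, the 20-degree threshold, and the scoring formula are all assumptions made for this example. The abstract states only that differences between the user and avatar postures are contrasted and highlighted.

```python
import math

# Hypothetical skeleton: bone name -> (parent joint, child joint).
# These joint names are illustrative and not taken from any specific SDK.
BONES = {
    "left_upper_arm": ("left_shoulder", "left_elbow"),
    "left_forearm": ("left_elbow", "left_wrist"),
    "right_upper_arm": ("right_shoulder", "right_elbow"),
    "right_forearm": ("right_elbow", "right_wrist"),
}


def bone_direction(pose, bone):
    """Return the unit vector from the bone's parent joint to its child joint."""
    parent, child = BONES[bone]
    delta = [c - p for p, c in zip(pose[parent], pose[child])]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        raise ValueError(f"zero-length bone: {bone}")
    return [d / length for d in delta]


def bone_angle_deg(user_pose, avatar_pose, bone):
    """Angle in degrees between the user's and the avatar's bone directions."""
    u = bone_direction(user_pose, bone)
    a = bone_direction(avatar_pose, bone)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(u, a))))
    return math.degrees(math.acos(dot))


def compare_poses(user_pose, avatar_pose, threshold_deg=20.0):
    """Return (score in [0, 1], sorted list of bones exceeding the threshold).

    The score is an assumed formula: the mean of (1 - angle / 180) over bones,
    so identical directions give 1.0 and opposite directions give 0.0.
    """
    angles = {b: bone_angle_deg(user_pose, avatar_pose, b) for b in BONES}
    mismatched = sorted(b for b, ang in angles.items() if ang > threshold_deg)
    score = sum(1.0 - ang / 180.0 for ang in angles.values()) / len(angles)
    return score, mismatched
```

Comparing bone directions rather than raw joint positions makes the comparison independent of where the user stands and of limb length, which is one plausible reason to prefer it when the user and the avatar have different body proportions. The bones returned in `mismatched` are the ones a renderer could highlight.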

Paper Structure (56 sections, 16 figures)

This paper contains 56 sections, 16 figures.

Introduction
Related Work
Usage of 3D Avatars
Features in Instructional Systems
Experience Prototyping
Initial Prototype
Formative Study Protocol
Benefits
Challenges and Needs
Posture Comparison: Needs to Easily Compare between User's and Instructor's Motion
Focusing and Highlighting: Difficulty in Tracking Specific Body Parts
Temporal Navigation: Challenges in Controlling Speed and Time of the Avatar Instruction
Spatial Reposition: Needs of Seamlessly Switching between First- and Third-Person Perspectives
Video2MR: Augmenting Auto Generated MR Instructions
Posture Comparison
...and 41 more sections

Figures (16)

Figure 1: Formative Study
Figure 2: Design Space of Video2MR
Figure 3: System Overview
Figure 4: Indicator
Figure 5: Pose Match Score
...and 11 more figures
