MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions

Rahul Islam, Vasco Xu, Karan Ahuja

TL;DR

MotionTrace targets bandwidth-limited smartphone AR by predicting future field of view using only the device's IMU. It trains a two-layer Bidirectional LSTM with exogenous inputs on synthetic AMASS-derived IMU data and real Pose-on-the-Go data to forecast 3D hand/phone position across 50–800 ms horizons, achieving average errors from 0.11 to 143.62 mm. The findings show that roughly 3 seconds of historical IMU data suffices and that prediction accuracy decreases with longer horizons, with the strongest results in the 50–400 ms range. This IMU-based approach offers a low-power, camera-free complement to existing FOV-prediction pipelines, enabling more efficient AR streaming and reduced latency in mobile devices.

Abstract

For handheld smartphone AR interactions, bandwidth is a critical constraint. Streaming techniques have been developed to provide a seamless, high-quality user experience despite this constraint. To optimize streaming performance in smartphone-based AR, accurate prediction of the user's field of view is essential. This prediction allows the system to prioritize loading digital content that the user is likely to engage with, enhancing the overall interactivity and immersion of the AR experience. In this paper, we present MotionTrace, a method for predicting the user's field of view using a smartphone's inertial sensor. The method continuously estimates the user's hand position in 3D space to localize the phone. We evaluated MotionTrace on future hand positions at 50, 100, 200, 400, and 800 ms time horizons using the large motion capture (AMASS) and smartphone-based full-body pose estimation (Pose-on-the-Go) datasets. We found that our method can estimate the user's future phone position with an average MSE between 0.11 and 143.62 mm across the different time horizons.
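To make the evaluation setup concrete, the following is a minimal sketch of how a history window of IMU samples might be paired with phone positions at the five horizons, and how a per-horizon error could be computed. It is not the authors' code: the function names, the 100 Hz sampling rate, the array layout, and the per-axis mean absolute error definition are all assumptions made for illustration. Only the horizon values and the roughly 3-second history length come from the text above.

```python
import numpy as np

HORIZONS_MS = (50, 100, 200, 400, 800)  # horizons named in the abstract


def make_windows(imu, pos, rate_hz=100, history_s=3.0, horizons_ms=HORIZONS_MS):
    """Pair each IMU history window with future phone positions.

    imu: (T, C) array of IMU readings; pos: (T, 3) array of positions.
    rate_hz is an assumed sampling rate, not a value from the paper.
    """
    hist = int(round(history_s * rate_hz))            # samples per history window
    offsets = [int(round(h * rate_hz / 1000)) for h in horizons_ms]
    last_end = len(imu) - max(offsets)                # largest usable window end
    X, Y = [], []
    for end in range(hist, last_end + 1):
        X.append(imu[end - hist:end])                 # samples end-hist .. end-1
        Y.append([pos[end - 1 + o] for o in offsets]) # positions o steps ahead
    return np.asarray(X), np.asarray(Y)


def mae_per_horizon(pred, target):
    """Mean absolute error per horizon, averaged over samples and axes.

    This is one plausible error definition, assumed here for illustration.
    """
    return np.abs(pred - target).mean(axis=(0, 2))


# Synthetic stand-in data: 10 s of 6-channel IMU and a linear trajectory.
imu = np.zeros((1000, 6))
pos = np.arange(1000, dtype=float)[:, None] * np.ones((1, 3))
X, Y = make_windows(imu, pos)
errors = mae_per_horizon(Y + 1.0, Y)  # a "prediction" that is off by 1 everywhere
```

Under these assumptions, `X` would be the input to a sequence model such as the Bidirectional LSTM mentioned in the TL;DR, and `Y` holds one 3D target per horizon, so errors can be reported separately for each of the five horizons.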

Paper Structure (12 sections, 2 figures, 1 table)

This paper contains 12 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Samples of predictions by our model at 50, 100, 200, 400, and 800 ms.
  • Figure 2: Comparison of average MAE across different datasets and time horizons.