Visualizing the Invisible: A Generative AR System for Intuitive Multi-Modal Sensor Data Presentation

Yunqi Guo, Kaiyuan Hou, Heming Fu, Hongkai Chen, Zhenyu Yan, Guoliang Xing, Xiaofan Jiang

TL;DR

Vivar introduces a cross-modal embedding framework that maps multi-modal sensor data into a unified visual embedding space using barycentric interpolation, so that changes in sensor readings produce correspondingly smooth changes in the AR visualization. It couples this embedding with a sensor-aware generation pipeline that uses foundation models and 3D Gaussian Splatting to produce coherent, volumetric sensor representations, while latent reuse and caching accelerate generation by $11\times$. A user study with over 500 participants, including domain experts, shows gains in visualization coherence, faithfulness, and practicality across real-world tasks. The approach advances sensor-data visualization by providing smooth transitions between readings, robust multi-sensor handling, and end-to-end AR generation; the authors aim to open-source the system to encourage adoption and further research.

Abstract

Understanding sensor data can be difficult for non-experts because sensor modalities are complex and carry different semantic meanings. This creates a need for intuitive and effective methods to present sensor information. However, creating intuitive sensor data visualizations presents three key challenges: the variability of sensor readings, gaps in domain comprehension, and the dynamic nature of sensor data. To address these issues, we propose Vivar, a novel system that integrates multi-modal sensor data and presents 3D volumetric content for AR visualization. In particular, we introduce a cross-modal embedding approach that maps sensor data into a pre-trained visual embedding space through barycentric interpolation. This approach accurately reflects value changes in multi-modal sensor information, ensuring that sensor variations are properly shown in the visualization outcomes. Vivar also incorporates sensor-aware AR scene generation using foundation models and 3D Gaussian Splatting (3DGS), without requiring domain expertise. In addition, Vivar leverages latent reuse and caching strategies to accelerate 2D and AR content generation, demonstrating an $11\times$ latency reduction without compromising quality. A user study involving 503 participants, including domain experts, demonstrates Vivar's effectiveness in accuracy, consistency, and real-world applicability, paving the way for more intuitive sensor data visualization.
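The barycentric interpolation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single sensor modality, a hypothetical set of anchor readings that each already have an embedding in the visual space, and toy 2-D vectors in place of real encoder outputs. The function names `barycentric_weights` and `embed_reading` are invented for illustration.

```python
from bisect import bisect_right


def barycentric_weights(value, anchor_values):
    """Return one weight per anchor; weights sum to 1 and are non-zero
    only for the two anchors that bracket `value`."""
    if len(anchor_values) < 2:
        raise ValueError("need at least two anchors")
    if any(b <= a for a, b in zip(anchor_values, anchor_values[1:])):
        raise ValueError("anchor values must be strictly increasing")
    # Clamp out-of-range readings to the nearest anchor (an assumption).
    v = min(max(value, anchor_values[0]), anchor_values[-1])
    # Index of the right-hand bracketing anchor.
    hi = min(bisect_right(anchor_values, v), len(anchor_values) - 1)
    lo = hi - 1
    t = (v - anchor_values[lo]) / (anchor_values[hi] - anchor_values[lo])
    weights = [0.0] * len(anchor_values)
    weights[lo] = 1.0 - t
    weights[hi] = t
    return weights


def embed_reading(value, anchor_values, anchor_embeddings):
    """Blend the anchor embeddings with the barycentric weights."""
    weights = barycentric_weights(value, anchor_values)
    dim = len(anchor_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, anchor_embeddings))
        for i in range(dim)
    ]


# Hypothetical temperature anchors (degrees Celsius) and toy embeddings.
ANCHOR_VALUES = [0.0, 20.0, 40.0]
ANCHOR_EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

midpoint = embed_reading(10.0, ANCHOR_VALUES, ANCHOR_EMBEDDINGS)
```

Because the weights vary linearly with the reading, a small change in the sensor value moves the embedding by a proportionally small amount, which is the property the abstract describes as accurately reflecting value changes. A multi-modal version would need weights over a simplex of anchors rather than a line segment; that extension is not shown here.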

Paper Structure

This paper contains 43 sections, 3 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Left: without Vivar, sensor data are abstract and hard for non-experts to interpret. Right: Vivar generates AR scenes with intuitive and immersive sensor data presentation.
  • Figure 2: Survey About Sensor Data Interpretation
  • Figure 3: Relationship between cosine similarity and numerical differences for three multimodal encoders. All three models demonstrate cognitive biases or limitations when embedding quantitative data.
  • Figure 4: Overview of Vivar Workflow.
  • Figure 5: Anchors for embedding sensor readings.
  • ...and 9 more figures