Towards Enhanced Context Awareness with Vision-based Multimodal Interfaces

Yongquan Hu, Wen Hu, Aaron Quigley

TL;DR

This work targets enhanced context awareness in Human-Computer Interaction by combining Vision-based Interfaces with multimodal AI to jointly interpret user intent and the surrounding environment. It proposes a three-dimension framework (Scale, Space, and Time) implemented through three application cases: microscopic surface sensing, depth-aware spatial projection, and temporally aware haptic feedback in virtual environments. Progress includes published work on the scale application (MicroCam) and multiple related submissions (space and time) to leading venues, alongside a defined methodology and evaluation plan for cross-modal integration. The expected contributions span novel Vision-based Multimodal Interface (VMI) artifacts, a survey taxonomy, and empirical studies guiding design, with practical impact on adaptive interactive systems across physical and virtual realms.

Abstract

Vision-based Interfaces (VIs) are pivotal in advancing Human-Computer Interaction (HCI), particularly in enhancing context awareness. Rapid advancements in multimodal Artificial Intelligence (AI) open significant new opportunities for these interfaces, promising a future of tight coupling between humans and intelligent systems. AI-driven VIs, when integrated with other modalities, offer a robust solution for capturing and interpreting user intentions and complex environmental information, thereby facilitating seamless and efficient interactions. This PhD study explores three application cases of multimodal interfaces to augment context awareness, each focusing on one dimension of the visual modality (scale, depth, and time): fine-grained analysis of physical surfaces via microscopic images, precise projection of the real world using depth data, and rendering of haptic feedback from the video background in virtual environments.

Paper Structure (7 sections, 1 figure, 1 table)

This paper contains 7 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The schematic representation of the MicroCam system pipeline.