Towards Enhanced Context Awareness with Vision-based Multimodal Interfaces
Yongquan Hu, Wen Hu, Aaron Quigley
TL;DR
This work targets enhanced context awareness in Human-Computer Interaction (HCI) by combining Vision-based Interfaces (VIs) with multimodal AI to jointly interpret user intent and the surrounding environment. It proposes a framework spanning three dimensions (Scale, Space, and Time), each realized through one application case: microscopic surface sensing (scale), depth-aware spatial projection (space), and temporally aware haptic feedback in virtual environments (time). Progress to date includes published work on the scale application (MicroCam), multiple submissions on the space and time applications to leading venues, and a defined methodology and evaluation plan for cross-modal integration. The expected contributions are novel Vision-based Multimodal Interface (VMI) artifacts, a survey taxonomy, and empirical studies to guide design, with practical impact on adaptive interactive systems in both physical and virtual settings.
Abstract
Vision-based Interfaces (VIs) are pivotal to advancing Human-Computer Interaction (HCI), particularly in enhancing context awareness. Rapid advances in multimodal Artificial Intelligence (AI) open significant new opportunities for these interfaces and promise a future of tight coupling between humans and intelligent systems. When integrated with other modalities, AI-driven VIs can capture and interpret both user intentions and complex environmental information, enabling seamless and efficient interaction. This PhD study explores three application cases of multimodal interfaces that augment context awareness, each focusing on one dimension of the visual modality: scale, through fine-grained analysis of physical surfaces using microscopic images; depth, through precise projection in the real world using depth data; and time, through rendering haptic feedback from video backgrounds in virtual environments.
