uxSense: Supporting User Experience Analysis with Visualization and Computer Vision
Andrea Batch, Yipeng Ji, Mingming Fan, Jian Zhao, Niklas Elmqvist
TL;DR
uxSense addresses the scalability challenge of UX evaluation with a visual analytics framework that converts video and audio from usability sessions into multimodal, time-stamped data streams. A plugin-based architecture extracts streams such as speech, gaze, gestures, and facial expressions, which are synchronized into parallel timelines for exploration, annotation, and reporting. A study with expert UX designers demonstrates the system's potential to streamline sensemaking, and it also surfaces design improvements and ethical considerations for video-based analysis. The work points toward richer, more scalable UX analysis and outlines future directions in broader, real-time, and immersive applications.
Abstract
Analyzing user behavior from usability evaluations can be a challenging and time-consuming task, especially as the number of participants and the scale and complexity of the evaluation grow. We propose uxSense, a visual analytics system that uses machine learning methods to extract user behavior from audio and video recordings as parallel time-stamped data streams. Our implementation draws on pattern recognition, computer vision, natural language processing, and machine learning to extract user sentiment, actions, posture, spoken words, and other features from such recordings. These streams are visualized as parallel timelines in a web-based front-end, enabling the researcher to search, filter, and annotate data across time and space. We present the results of a user study in which professional UX researchers evaluated user data using uxSense. In fact, we used uxSense itself to evaluate their sessions.
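
To make the idea of plugin-extracted, parallel time-stamped streams concrete, the following is a minimal sketch and not the uxSense implementation. The names `Event`, `PLUGINS`, `register`, `extract_streams`, and `query`, the file name `session01.mp4`, and the canned extractor outputs are all hypothetical; a real system would call speech, vision, or emotion models where the stand-in functions return fixed data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    start: float  # seconds from the start of the recording
    end: float
    label: str


# Hypothetical plugin registry: stream name -> extractor function.
Extractor = Callable[[str], List[Event]]
PLUGINS: Dict[str, Extractor] = {}


def register(name: str) -> Callable[[Extractor], Extractor]:
    """Decorator that adds an extractor to the registry under `name`."""
    def decorator(fn: Extractor) -> Extractor:
        PLUGINS[name] = fn
        return fn
    return decorator


@register("speech")
def fake_speech(recording: str) -> List[Event]:
    # Stand-in for a speech-to-text model; returns canned data.
    return [Event(0.0, 2.5, "where is the menu"), Event(6.0, 8.0, "found it")]


@register("emotion")
def fake_emotion(recording: str) -> List[Event]:
    # Stand-in for a facial-expression classifier; returns canned data.
    return [Event(0.0, 5.0, "confused"), Event(5.0, 9.0, "neutral")]


def extract_streams(recording: str) -> Dict[str, List[Event]]:
    """Run every registered plugin over one recording."""
    return {name: fn(recording) for name, fn in PLUGINS.items()}


def query(streams: Dict[str, List[Event]], t0: float, t1: float) -> Dict[str, List[Event]]:
    """Return, per stream, the events that overlap the window [t0, t1)."""
    return {
        name: [e for e in events if e.start < t1 and e.end > t0]
        for name, events in streams.items()
    }


streams = extract_streams("session01.mp4")  # hypothetical file name
window = query(streams, 1.0, 3.0)
for name, events in window.items():
    print(name, [e.label for e in events])
```

Because every stream shares one time axis, a single window query returns aligned slices of all streams, which is the property that lets a timeline view filter and annotate across modalities at once.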
