uxSense: Supporting User Experience Analysis with Visualization and Computer Vision

Andrea Batch, Yipeng Ji, Mingming Fan, Jian Zhao, Niklas Elmqvist

TL;DR

uxSense tackles the scalability challenge of UX evaluation by providing a visual analytics framework that converts video and audio from usability sessions into multi-modal, time-stamped data streams. It uses a plugin-based architecture to extract streams such as speech, gaze, gestures, and facial expressions, which are synchronized into parallel timelines for exploration, annotation, and reporting. An expert UX designer study demonstrates the system's potential to streamline sensemaking while revealing design improvements and ethical considerations for video-based analysis. The work suggests significant practical impact by enabling richer, scalable UX insights and outlines a clear path toward broader, real-time, and immersive applications.

Abstract

Analyzing user behavior from usability evaluation can be a challenging and time-consuming task, especially as the number of participants and the scale and complexity of the evaluation grow. We propose uxSense, a visual analytics system using machine learning methods to extract user behavior from audio and video recordings as parallel time-stamped data streams. Our implementation draws on pattern recognition, computer vision, natural language processing, and machine learning to extract user sentiment, actions, posture, spoken words, and other features from such recordings. These streams are visualized as parallel timelines in a web-based front-end, enabling the researcher to search, filter, and annotate data across time and space. We present the results of a user study involving professional UX researchers evaluating user data using uxSense. In fact, we used uxSense itself to evaluate their sessions.

Paper Structure

This paper contains 32 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Schematic overview. Analysis interface in the uxSense web-based client (view reflects tutorial video). a) Video playback: View the user session video, with or without captions. b) Session transcript: View the timestamped transcript of speech from the video, and navigate the video by clicking on a line of text. c) User annotation table: View the text and timestamp of all annotations made by the user. d) Zoom focus: Select, zoom, and pan across the whole extent of the video. A red arrow marker indicates the current video time, while the brushed region shows the zoom extent in the context of the video duration. e) Categorical filters: When a filter is selected, non-selected elements of the view are shown with low opacity. f) Details-on-demand: Mouse over an observation to get its details in the model output at the given time. g) Point annotation and i) Interval annotation: Add an annotation corresponding to a given timeline for either the video's current time (g) or the interval range brushed with (d) (i). h) Model output timeline viewer: The timeline and user annotations are described in Section \ref{sec:datamodel}.
  • Figure 2: Timeline View. This view acts as both a video scrubber to select current time and an interactive representation of the filter output. Mousing over an element shows text details of the timeline at the current frame. Filtering using the checkboxes on the header highlights all observations meeting the filter criteria by making all other observations semi-transparent. Clicking on the timeline navigates the video to the selected timestamp.
  • Figure 3: Focus-brushing. This feature selects an interval of the video, zooms all timelines, and allows the user to drag either the timelines or the selected rectangle to pan through the data.
  • Figure 4: Annotlette example. An annotlette generated by uxSense during our own evaluation of user sessions in which UX professionals used uxSense. a) Metadata regarding the annotation. b) Transcript from the period selected for annotation. c) The annotation text created by the user. d) A zoomed view of the timeline associated with the annotation for the selected period.
  • Figure 5: Evaluation pipeline. Workflow in the uxSense prototype system for extracting user behavior from video footage using deep learning to support in-depth and advanced analysis of participant performance in user studies. We use uxSense to evaluate user sessions in which professional UX researchers use uxSense to evaluate a sample Tableau user session. a) Sample user session with a commercial visual analytics tool (Tableau Public); b) sample user session video and audio streams; c) server compute of sample user session data: video pose-estimate-based temporal segmentation, video emotion and action classification, speech detection, and audio signal processing; d) evaluation user sessions with professional UX researchers using the uxSense interface with the sample session video and model output; e) UX researcher user session video and audio streams; f) server compute of video and audio models using professional UX researcher session data; g) authors’ evaluation of professional UX researcher user sessions using uxSense.
  • ...and 3 more figures
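
The timeline interactions described in the captions for Figures 1 to 3 (categorical filtering, details-on-demand, and focus-brushing) can be illustrated with a minimal Python sketch. This is not the uxSense implementation: the `Observation` record, the function names, the 0.2 opacity value, and the sample emotion labels are all assumptions made for illustration, on the premise that each model output stream is a list of labeled, time-stamped intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One labeled interval in a model output timeline (hypothetical schema)."""
    start: float  # seconds from the start of the session video
    end: float
    label: str


def opacity_by_filter(observations, selected_labels, dimmed=0.2):
    """Categorical filter: selected labels stay opaque, the rest are dimmed.

    With no labels selected, every observation stays fully opaque.
    """
    if not selected_labels:
        return [1.0 for _ in observations]
    return [1.0 if o.label in selected_labels else dimmed for o in observations]


def brush(observations, t0, t1):
    """Focus-brushing: keep observations that overlap the interval [t0, t1]."""
    lo, hi = min(t0, t1), max(t0, t1)
    return [o for o in observations if o.end >= lo and o.start <= hi]


def details_at(observations, t):
    """Details-on-demand: labels of the observations active at time t."""
    return [o.label for o in observations if o.start <= t < o.end]


# Made-up sample stream, standing in for an emotion classifier's output.
emotion_timeline = [
    Observation(0.0, 4.0, "neutral"),
    Observation(4.0, 6.5, "surprise"),
    Observation(6.5, 12.0, "neutral"),
    Observation(12.0, 15.0, "happy"),
]

print(opacity_by_filter(emotion_timeline, {"surprise"}))
print([o.label for o in brush(emotion_timeline, 5.0, 7.0)])
print(details_at(emotion_timeline, 12.0))
```

Keeping every stream in the same interval form means one brushed range can be applied to all parallel timelines at once, which is the behavior the Figure 3 caption describes.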