Does GenAI Make Usability Testing Obsolete?
Ali Ebrahimi Pourasad, Walid Maalej
TL;DR
The paper investigates whether GenAI can supplant usability testing for mobile apps by introducing UX-LLM, a vision-enabled tool that predicts usability issues from iOS app context, SwiftUI code, and view images. In a multi-method study comparing UX-LLM against traditional usability testing and expert reviews on two open-source apps, UX-LLM detects valid issues with a precision of about $0.61$–$0.66$ but a limited recall of about $0.35$–$0.38$, indicating that it complements rather than replaces human evaluation. A focus group with a capstone development team shows a generally positive reception alongside practical integration concerns, suggesting the need for workflow-aware deployment, e.g., IDE or CI integration. The findings advocate a hybrid approach that uses GenAI to augment, not replace, UX expertise, particularly for small teams or niche scenarios, while underscoring the value of data interoperability and cross-method triangulation for robust usability practice.
Abstract
Ensuring usability is crucial for the success of mobile apps. Usability issues can compromise user experience and negatively impact the perceived app quality. This paper presents UX-LLM, a novel tool powered by a Large Vision-Language Model that predicts usability issues in iOS apps. To evaluate the performance of UX-LLM, we predicted usability issues in two open-source apps of medium complexity and asked two usability experts to assess the predictions. We also performed traditional usability testing and expert reviews for both apps and compared the results to those of UX-LLM. UX-LLM achieved a precision between 0.61 and 0.66 and a recall between 0.35 and 0.38, indicating that it can identify valid usability issues but fails to capture the majority of them. Finally, we conducted a focus group with the development team of a capstone project building a transit app for visually impaired people. The team expressed positive perceptions of UX-LLM, as it identified usability issues in their app that they had not known about. However, they also raised concerns about integrating it into their development workflow and suggested potential improvements. Our results show that UX-LLM cannot fully replace traditional usability evaluation methods but serves as a valuable supplement, particularly for small teams with limited resources. Because it can inspect the source code, it can also help identify issues in less common user paths.
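The precision and recall figures reported above follow the usual definitions, which the following minimal Python sketch illustrates. It assumes that precision is the share of UX-LLM's predicted issues that the experts judged valid, and that recall is measured against a reference set of all known issues (assumed here to be the union of issues found by usability testing, expert review, and UX-LLM); the study's exact matching procedure may differ. The function name `evaluate` and the issue IDs in the usage example are hypothetical and do not reproduce the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    precision: float
    recall: float


def evaluate(predicted: set[str], valid: set[str], reference: set[str]) -> EvalResult:
    """Compute precision and recall for predicted usability issues (illustrative sketch).

    predicted: issue IDs proposed by the tool
    valid:     subset of predicted issues that experts judged valid
    reference: all known issues (assumption: union of issues found by every method)
    """
    if not valid <= predicted:
        raise ValueError("valid issues must be a subset of predicted issues")
    # Precision: fraction of the tool's predictions that are valid issues.
    precision = len(valid) / len(predicted) if predicted else 0.0
    # Recall: fraction of all known issues that the tool found.
    found = valid & reference
    recall = len(found) / len(reference) if reference else 0.0
    return EvalResult(precision, recall)


if __name__ == "__main__":
    # Hypothetical toy data: 4 predictions, 3 judged valid, 6 known issues overall.
    result = evaluate(
        predicted={"I1", "I2", "I3", "I4"},
        valid={"I1", "I2", "I3"},
        reference={"I1", "I2", "I3", "I5", "I6", "I7"},
    )
    print(result)  # EvalResult(precision=0.75, recall=0.5)
```

In this toy case, a high precision paired with a lower recall mirrors the pattern described in the abstract: most predictions are valid, yet many known issues go undetected.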
