Leveraging Multimodal LLM for Inspirational User Interface Search
Seokhyeon Park, Yumin Song, Soohyun Lee, Jaeyoung Kim, Jinwook Seo
TL;DR
The paper addresses inspirational UI search that captures rich semantics without relying on metadata or pixel similarity. It introduces a pipeline that uses a multimodal large language model (MLLM) to extract UI semantics directly from mobile UI screenshots and builds a semantic-based retrieval system, S&UI, on top of them. Through computational evaluations on UI datasets and human studies with designers, the authors show that S&UI outperforms pixel- and metadata-based baselines in relevance, reliability, usefulness, diversity, and serendipity. The work advances UI design tooling by enabling context-aware, explainable inspiration and provides a public S&UI dataset to support future research in semantic UI understanding and retrieval.
Abstract
Inspirational search, the process of exploring designs to inform and inspire new creative work, is pivotal in mobile user interface (UI) design. However, exploring the vast space of UI references remains a challenge. Existing AI-based UI search methods often miss crucial semantics like target users or the mood of apps. Additionally, these models typically require metadata like view hierarchies, limiting their practical use. We used a multimodal large language model (MLLM) to extract and interpret semantics from mobile UI images. We identified key UI semantics through a formative study and developed a semantic-based UI search system. Through computational and human evaluations, we demonstrate that our approach significantly outperforms existing UI retrieval methods, offering UI designers a more enriched and contextually relevant search experience. We enhance the understanding of mobile UI design semantics and highlight MLLMs' potential in inspirational search, providing a rich dataset of UI semantics for future studies.
