MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

TL;DR

This work introduces MUD, a large-scale dataset of modern mobile UIs mined from recent Android apps through automated exploration in which an LLM, given the view hierarchy of the current screen, suggests the next interactions. It combines noise filtering and human validation to deliver 18k high-quality UIs from 3.3k apps across 33 categories, addressing the noise and outdated designs of legacy datasets such as Rico. The authors demonstrate MUD’s value on two tasks, UI element detection and UI retrieval, reporting higher performance and richer design representations than with prior datasets. The study also discusses limitations and future directions, including additional modalities, cross-platform collection, and broader UI understanding tasks, with the ultimate goal of enabling a more intelligent UI agent for modern UIs.

Abstract

The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset, MUD, of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks, element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.

Paper Structure (25 sections, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Examples of four types of updates in the new UI design, including visual appearance, visual hierarchy, user interaction, and typography.
  • Figure 2: Examples of noises in the Rico dataset. The orange bounding box presents the elements in the view hierarchy.
  • Figure 3: The overview of our dataset collection process.
  • Figure 4: Illustration of our LLMs. We prompt the model to suggest potential interactions, based on the current UI view hierarchy, in order to achieve maximum coverage.
  • Figure 5: The performance of our LLMs-enhanced exploration method compared to three automated exploration tools, including Monkey, Droidbot, and Humanoid, and two ablation studies, including the prompt without role instantiation and action primitives.
  • ...and 5 more figures
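
The captions of Figures 4 and 5 describe prompting an LLM with the current UI view hierarchy, together with a role instantiation and a set of action primitives, to suggest the next interaction. The following is a minimal sketch of how such a prompt might be assembled, not the authors' implementation. It assumes the hierarchy is an XML dump in the style produced by Android's `uiautomator dump` (nested `node` elements with `class`, `text`, `resource-id`, `clickable`, `scrollable`, and `bounds` attributes). The role sentence, the list of action primitives, the prompt wording, and the function names are all hypothetical, and no LLM is actually called.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical role instantiation; the paper's actual wording is not shown here.
ROLE = ("You are a tester exploring an Android app. "
        "Your goal is to reach as many distinct screens as possible.")

# Hypothetical action primitives; the paper's actual list is not shown here.
ACTION_PRIMITIVES = ["tap", "scroll", "input", "back"]

# Small hand-written hierarchy in the style of a `uiautomator dump` file.
SAMPLE_HIERARCHY = """<hierarchy>
  <node class="android.widget.FrameLayout" clickable="false" bounds="[0,0][1080,1920]">
    <node class="android.widget.TextView" text="Welcome" clickable="false"
          bounds="[100,50][980,150]"/>
    <node class="android.widget.Button" text="Sign in"
          resource-id="com.example:id/sign_in" clickable="true"
          bounds="[100,200][980,320]"/>
    <node class="androidx.recyclerview.widget.RecyclerView" scrollable="true"
          clickable="false" bounds="[0,400][1080,1900]"/>
  </node>
</hierarchy>"""


def interactive_elements(hierarchy_xml: str) -> List[Dict[str, str]]:
    """Return the clickable or scrollable nodes, in document order."""
    root = ET.fromstring(hierarchy_xml)
    elements: List[Dict[str, str]] = []
    for node in root.iter("node"):
        if node.get("clickable") == "true" or node.get("scrollable") == "true":
            elements.append({
                "id": str(len(elements)),
                "class": node.get("class", ""),
                "text": node.get("text", ""),
                "resource_id": node.get("resource-id", ""),
                "bounds": node.get("bounds", ""),
            })
    return elements


def build_prompt(hierarchy_xml: str, tried: List[str]) -> str:
    """Assemble a text prompt: role, allowed actions, elements, history."""
    lines = [
        ROLE,
        "Allowed actions: " + ", ".join(ACTION_PRIMITIVES) + ".",
        "Interactive elements on the current screen:",
    ]
    for el in interactive_elements(hierarchy_xml):
        # Fall back to the resource id, then the class name, when text is empty.
        label = el["text"] or el["resource_id"] or el["class"]
        lines.append(f'[{el["id"]}] {el["class"]} "{label}" {el["bounds"]}')
    if tried:
        lines.append("Already tried: " + "; ".join(tried))
    lines.append("Suggest the next action as '<action> <element id>'.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(SAMPLE_HIERARCHY, tried=["tap 0"]))
```

Listing only interactive nodes keeps the prompt short, and passing the history of attempted actions is one simple way to steer the model toward unexplored screens. Both are design choices of this sketch rather than details confirmed by the source.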