Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang

TL;DR

This work targets the accessibility gap caused by missing hint-text in Android text inputs, which impedes screen-reader users. It introduces HintDroid, an LLM-based system that analyzes GUI context, uses in-context learning with retrieved examples, and applies a feedback-driven refinement loop to generate meaningful hint-text and corresponding input content. A large-scale motivational study shows a high prevalence of missing hint-text. An evaluation on 2,659 text inputs across 753 apps reports strong scores on BLEU, METEOR, ROUGE, CIDEr, and exact match, and a user study shows improved input accuracy and exploration efficiency. The results indicate that HintDroid can meaningfully improve accessibility and usability for visually impaired users and may generalize to other platforms and development workflows. Future work focuses on personalization, real-time integration, and broader end-user applications.

Abstract

Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand what needs to be operated. Screen readers rely on the hint-text attribute of a text input component to tell visually impaired users what to fill in. Unfortunately, in our analysis of 4,501 Android apps with text inputs, over 76% of them are missing hint-text. These issues are mostly caused by developers' lack of awareness of visually impaired users. To overcome these challenges, we developed an LLM-based hint-text generation model called HintDroid, which analyzes the GUI information of input components and uses in-context learning to generate the hint-text. To ensure the quality of the generated hint-text, we also designed a feedback-based inspection mechanism that adjusts it. Automated experiments demonstrate high BLEU scores, and a user study further confirms its usefulness. HintDroid can help not only visually impaired individuals but also sighted users understand the requirements of input components. HintDroid demo video: https://youtu.be/FWgfcctRbfI.
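
To make the notion of "missing hint-text" concrete, the following is a minimal sketch of how one might flag text inputs without a hint in an Android layout XML file. It is an illustration only, not the procedure used in the paper's study, which works from view hierarchy files of GUI pages. The function name `find_inputs_missing_hint`, the tag list `TEXT_INPUT_TAGS`, and the sample layout are assumptions introduced here; `android:hint` and `android:id` are the standard Android attributes.

```python
import xml.etree.ElementTree as ET

# Standard Android XML namespace; ElementTree exposes namespaced attributes
# as "{namespace}name".
ANDROID_NS = "http://schemas.android.com/apk/res/android"
HINT_ATTR = f"{{{ANDROID_NS}}}hint"
ID_ATTR = f"{{{ANDROID_NS}}}id"

# Hypothetical, non-exhaustive set of tags treated as text inputs.
TEXT_INPUT_TAGS = {"EditText", "AutoCompleteTextView", "TextInputEditText"}


def find_inputs_missing_hint(layout_xml):
    """Return the android:id (or "<no id>") of each text input whose
    android:hint is absent or empty."""
    root = ET.fromstring(layout_xml)
    missing = []
    for elem in root.iter():
        # Custom views use fully qualified tags; keep only the class name.
        tag = elem.tag.rsplit(".", 1)[-1]
        if tag in TEXT_INPUT_TAGS:
            hint = elem.get(HINT_ATTR, "").strip()
            if not hint:
                missing.append(elem.get(ID_ATTR, "<no id>"))
    return missing


# Made-up layout for illustration: one input with a hint, two without.
SAMPLE_LAYOUT = """<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
    <EditText android:id="@+id/email" android:hint="Email address" />
    <EditText android:id="@+id/amount" />
    <EditText android:id="@+id/note" android:hint="" />
</LinearLayout>"""

if __name__ == "__main__":
    print(find_inputs_missing_hint(SAMPLE_LAYOUT))
```

A static check like this only detects that a hint is absent. It cannot judge whether an existing hint is meaningful, which is the part HintDroid addresses with the LLM and the feedback mechanism.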

Paper Structure (47 sections, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Examples of differences between hint-text, label, and content description. (a) Label is used to briefly describe image components. (b) Content description provides an overview of related input components. (c) Hint-text further explains the input requirements.
  • Figure 2: Workflow of HintDroid. It extracts GUI entity information from the view hierarchy file of the GUI page and constructs a GUI prompt that helps the LLM understand the context. To help the LLM better understand the task, HintDroid uses a retrieval-based example selection method to construct the in-context learning prompts. It also uses input content as a bridge to evaluate the generated hint-text and extracts feedback information by checking whether the input content can trigger the next GUI page.
  • Figure 3: Example of the text input component without/with hint-text. (a) These inputs have issues of missing hint-text. (b) Hint-text lacks practical meaning. (c) Hint-text can help visually impaired users successfully fill in the correct input.
  • Figure 4: Statistical results of the hint-text missing rate. In 18 app categories, more than 80% of hint-text is missing.
  • Figure 5: Overview of HintDroid. HintDroid consists of three main modules: (1) Module 1 extracts the contextual GUI information of the text input and generates the GUI prompt. (2) Module 2 constructs the in-context learning prompt to improve the performance of the LLM. (3) Module 3 further optimizes the generated hint-text through a feedback mechanism.
  • ...and 7 more figures