Understanding the Challenges and Promises of Developing Generative AI Apps: An Empirical Study

Buthayna AlMulla, Maram Assi, Safwat Hassan

TL;DR

The paper examines how end users perceive Gen-AI mobile apps by analyzing a large-scale corpus of Google Play reviews. It introduces SARA, a four-phase workflow (Selection, Acquisition, Refinement, Analysis) that uses prompt-based LLMs to extract and assign topics at scale, achieving 91% accuracy. The study identifies the ten most discussed user-facing topics and analyzes how they evolve over time, revealing shifting expectations as Gen-AI apps mature. The work offers actionable guidance for developers, platform owners, testers, and policymakers, and provides a replication package to support future cross-platform analyses.

Abstract

The release of ChatGPT in 2022 triggered a rapid surge in generative artificial intelligence mobile apps (i.e., Gen-AI apps). Despite widespread adoption, little is known about how end users perceive and evaluate these Gen-AI functionalities in practice. In this work, we conduct a user-centered analysis of 676,066 automatically labeled reviews from 173 Gen-AI apps on the Google Play Store. We propose a structured four-phase framework, SARA (Selection, Acquisition, Refinement, and Analysis), which integrates and extends state-of-the-art techniques for large-scale review collection, filtering, and analysis using prompt-based LLMs. First, we empirically validate the reliability of LLMs for topic extraction and assignment, achieving 91% accuracy through five-shot prompting and LLM-based filtering of non-informative reviews. We then apply the framework to informative reviews to identify the ten most discussed topics (e.g., AI Performance, Content Quality, and Content Policy & Censorship) and analyze the key challenges and emerging opportunities. Finally, we examine how these topics evolve over time, offering insight into shifting user expectations and engagement patterns with Gen-AI apps. Based on our findings and observations, we present actionable implications for developers and researchers.
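
As a rough illustration of the five-shot prompting mentioned in the abstract, the following Python sketch assembles a topic-assignment prompt from five labeled example reviews. It is a minimal sketch under stated assumptions, not the authors' prompt: the function name, the instruction wording, and the example reviews are all hypothetical, and only the three topic names come from the abstract.

```python
from typing import List, Tuple

# Topic names taken from the abstract; the full list of ten is not given there.
TOPICS: List[str] = ["AI Performance", "Content Quality", "Content Policy & Censorship"]

# Hypothetical labeled examples, invented for illustration only.
EXAMPLES: List[Tuple[str, str]] = [
    ("The answers take forever to generate.", "AI Performance"),
    ("Generated images look blurry and distorted.", "Content Quality"),
    ("It refuses harmless prompts far too often.", "Content Policy & Censorship"),
    ("The bot keeps misunderstanding simple questions.", "AI Performance"),
    ("The essays it writes are repetitive and shallow.", "Content Quality"),
]


def build_five_shot_prompt(
    topics: List[str], examples: List[Tuple[str, str]], review: str
) -> str:
    """Build a few-shot prompt with exactly five labeled examples (assumed format)."""
    if len(examples) != 5:
        raise ValueError("five-shot prompting expects exactly five examples")
    lines = [
        "Assign the app review to exactly one topic from this list: "
        + ", ".join(topics) + ".",
        "",
    ]
    # Each example is shown as a review followed by its topic label.
    for text, topic in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Topic: {topic}")
        lines.append("")
    # The unlabeled review comes last, leaving the topic for the model to fill in.
    lines.append(f"Review: {review}")
    lines.append("Topic:")
    return "\n".join(lines)
```

The returned string would be sent to whichever LLM is in use; the model call itself is left out because the abstract does not name a model or API.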

Paper Structure

This paper contains 21 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Examples of reviews of Gen-AI apps