MANA: Towards Efficient Mobile Ad Detection via Multimodal Agentic UI Navigation

Yizhe Zhao, Yongjian Fu, Zihao Feng, Hao Pan, Yongheng Deng, Yaoxue Zhang, Ju Ren

Abstract

Mobile advertising dominates app monetization but introduces risks ranging from intrusive user experience to malware delivery. Existing detection methods rely either on static analysis, which misses runtime behaviors, or on heuristic UI exploration, which struggles with sparse and obfuscated ads. In this paper, we present MANA, the first agentic multimodal reasoning framework for mobile ad detection. MANA integrates static, visual, temporal, and experiential signals into a reasoning-guided navigation strategy that determines not only how to traverse interfaces but also where to focus, enabling efficient and robust exploration. We implement and evaluate MANA on commercial smartphones over 200 apps, achieving state-of-the-art accuracy and efficiency. Compared to baselines, it improves detection accuracy by 30.5%-56.3% and reduces exploration steps by 29.7%-63.3%. Case studies further demonstrate its ability to uncover obfuscated and malicious ads, underscoring its practicality for mobile ad auditing and its potential for broader runtime UI analysis (e.g., permission abuse). Code and dataset are available at https://github.com/MANA-2026/MANA.

Paper Structure (46 sections, 2 theorems, 12 equations, 25 figures)

Key Result

Lemma 1

Escaping a structural loop that is absorbing under $\pi_{base}$ strictly increases the set of reachable latent states and ad triggers under a finite interaction budget.
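
The intuition behind the lemma can be illustrated with a toy example. The sketch below is not the paper's method or proof: the screen graph, the `explore` helper, and the "try the next action on a revisit" escape rule are all hypothetical assumptions chosen for illustration. It compares a fixed base policy that is trapped in a two-screen loop against a variant that escapes the loop, under the same step budget.

```python
from typing import Dict, List, Set

# Hypothetical toy UI graph (not from the paper): each screen maps to the
# screens reached by its actions, in the order the base policy prefers them.
UI_GRAPH: Dict[str, List[str]] = {
    "home": ["settings", "shop"],
    "settings": ["home"],
    "shop": ["ad_banner", "home"],
    "ad_banner": ["shop"],
}
AD_SCREENS: Set[str] = {"ad_banner"}  # screens assumed to trigger an ad


def explore(graph: Dict[str, List[str]], start: str, budget: int,
            escape_loops: bool) -> Set[str]:
    """Return the set of screens visited within `budget` steps.

    Without escaping, the policy always takes each screen's first action.
    With escaping (an assumed rule), every revisit of a screen advances to
    that screen's next action, wrapping around when actions run out.
    """
    visited = {start}
    next_action = {screen: 0 for screen in graph}
    state = start
    for _ in range(budget):
        actions = graph[state]
        if escape_loops:
            idx = next_action[state] % len(actions)
            next_action[state] += 1
        else:
            idx = 0
        state = actions[idx]
        visited.add(state)
    return visited


base_reach = explore(UI_GRAPH, "home", budget=6, escape_loops=False)
escape_reach = explore(UI_GRAPH, "home", budget=6, escape_loops=True)

print(sorted(base_reach))
print(sorted(escape_reach))
print(sorted(AD_SCREENS & escape_reach))
```

In this toy graph the base policy oscillates between `home` and `settings` and never reaches an ad screen, while the escaping variant reaches every screen the base policy does plus `shop` and `ad_banner` within the same budget. This mirrors the lemma's claim of a strictly larger reachable set, but only for this hand-built graph.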

Figures (25)

  • Figure 1: MANA as a multimodal reasoning agent for efficient mobile ad detection, fusing heterogeneous signals to jointly reason about "where and how to go".
  • Figure 2: Examples of different ad types: (a) integrated within the app interface as a banner; (b) shown as an intrusive interstitial that blocks normal interaction; (c) implemented as a "More Apps" list without standard SDK patterns, requiring semantic reasoning to recognize as advertising.
  • Figure 3: Screenshots of two example apps. The middle panels show the view-hierarchy nodes corresponding to the blue-boxed components in the screenshots, while red boxes highlight ad-related elements.
  • Figure 4: The system overview of MANA.
  • Figure 5: Distribution of different evidence.
  • ...and 20 more figures

Theorems & Definitions (2)

  • Lemma 1: Loop Escape Implies Coverage Expansion
  • Theorem 1: Resolution of Structural Loops