On the Robustness of GUI Grounding Models Against Image Attacks
Haoren Zhao, Tianyi Chen, Zhen Wang
TL;DR
This work probes the robustness of GUI grounding models against natural noise and adversarial perturbations, a critical concern for reliable GUI agents. It systematically evaluates state-of-the-art models (e.g., UGround, SeeClick, OS-Atlas-Base-7B) across mobile, desktop, and web interfaces under three threat regimes: natural noise, untargeted adversarial attacks, and targeted adversarial attacks. A prediction counts as correct when the predicted point $\hat{y}$ falls within the bounding box $B(y)$ of the target element, and results are reported with metrics such as success rate ($SR$) and attack success rate ($ASR$). The findings show that the models tolerate some natural noise but are highly vulnerable to low-resolution inputs and carefully crafted perturbations: untargeted attacks degrade embeddings, and targeted attacks steer predictions toward a designated region. By establishing a benchmark with detailed empirical results, the paper offers guidance for improving GUI grounding robustness in practical deployments.
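The point-in-box criterion and the two metrics can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this summary: boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples, $SR$ is the fraction of predictions $\hat{y}$ that land inside $B(y)$, and $ASR$ is the fraction of attacked predictions that land inside the attacker's designated target region. The function names are hypothetical and are not taken from the authors' repository.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def hit_rate(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of points that fall inside their paired box."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must have the same length")
    if not points:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(points, boxes))
    return hits / len(points)


def success_rate(preds: Sequence[Point], gt_boxes: Sequence[Box]) -> float:
    """SR (assumed definition): predictions inside the ground-truth box B(y)."""
    return hit_rate(preds, gt_boxes)


def attack_success_rate(adv_preds: Sequence[Point], target_boxes: Sequence[Box]) -> float:
    """ASR (assumed definition): attacked predictions inside the attacker's target region."""
    return hit_rate(adv_preds, target_boxes)
```

Under these assumptions, an effective untargeted attack shows up as a drop in `success_rate` on perturbed inputs, while an effective targeted attack shows up as a high `attack_success_rate`.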
Abstract
Graphical User Interface (GUI) grounding models are crucial for enabling intelligent agents to understand and interact with complex visual interfaces. In real-world scenarios, however, these models must cope with natural noise and adversarial perturbations, and their robustness to both remains underexplored. In this study, we systematically evaluate the robustness of state-of-the-art GUI grounding models, such as UGround, under three conditions: natural noise, untargeted adversarial attacks, and targeted adversarial attacks. Our experiments, conducted across a wide range of GUI environments including mobile, desktop, and web interfaces, show that GUI grounding models are highly sensitive to adversarial perturbations and low-resolution conditions. These findings provide insight into the vulnerabilities of GUI grounding models and establish a benchmark for future research on improving their robustness in practical applications. Our code is available at https://github.com/ZZZhr-1/Robust_GUI_Grounding.
