EmbeWebAgent: Embedding Web Agents into Any Customized UI
Chenyang Ma, Clyde Fare, Matthew Wilson, Dave Braines
TL;DR
EmbeWebAgent addresses the brittleness of interface-level web agents by embedding reasoning-enabled agents directly into real UIs using ARIA-based observations and a per-page function registry. The system pairs a lightweight frontend shim with a stack-agnostic backend that orchestrates three agents: a web-interaction agent, an analysis agent that uses MCP tools, and a chat agent. The backend runs a ReAct-style loop with Chain-of-Thought prompts to execute complex, multi-step workflows. Key contributions include explicit navigation, mixed-granularity actions, session grounding, and an end-to-end demonstration in a chemistry UI that required minimal retrofitting and showed robust behavior in a live interface. The approach aims to improve robustness and action expressiveness for enterprise UIs, and lays groundwork for standardized ARIA-based observation pipelines and portable UI-agent protocols across heterogeneous frontends.
Abstract
Most web agents operate at the human interface level, observing screenshots or raw DOM trees without application-level access, which limits robustness and action expressiveness. In enterprise settings, however, explicit control of both the frontend and backend is available. We present EmbeWebAgent, a framework for embedding agents directly into existing UIs using lightweight frontend hooks (curated ARIA and URL-based observations, and a per-page function registry exposed via a WebSocket) and a reusable backend workflow that performs reasoning and takes actions. EmbeWebAgent is stack-agnostic (e.g., React or Angular), supports mixed-granularity actions ranging from GUI primitives to higher-level composites, and orchestrates navigation, manipulation, and domain-specific analytics via MCP tools. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting. Live Demo: https://youtu.be/Cy06Ljee1JQ
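To make the per-page function registry concrete, the following is a minimal TypeScript sketch of how a frontend shim might register page-scoped functions and dispatch agent requests to them. It is an illustration under stated assumptions, not the authors' implementation: the names `PageRegistry`, `ActionRequest`, `ActionResult`, the JSON message shape, and the `/molecules` page with its `setMolecule` function are all hypothetical, since the abstract does not specify a wire format.

```typescript
// Hypothetical message shapes; the paper does not define a wire format.
type ActionFn = (args: Record<string, unknown>) => unknown;

interface ActionRequest {
  id: string;                      // correlates a result with its request
  page: string;                    // URL path the request targets
  name: string;                    // name of a registered function
  args: Record<string, unknown>;   // arguments chosen by the agent
}

interface ActionResult {
  id: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Maps each page (by URL path) to the functions the agent may call on it.
class PageRegistry {
  private pages = new Map<string, Map<string, ActionFn>>();

  register(page: string, name: string, fn: ActionFn): void {
    let fns = this.pages.get(page);
    if (!fns) {
      fns = new Map<string, ActionFn>();
      this.pages.set(page, fns);
    }
    fns.set(name, fn);
  }

  // Function names available on a page, e.g. to include in an observation.
  list(page: string): string[] {
    return Array.from(this.pages.get(page)?.keys() ?? []);
  }

  // Parses one JSON request and runs the matching function, if any.
  dispatch(raw: string): ActionResult {
    const req = JSON.parse(raw) as ActionRequest;
    const fn = this.pages.get(req.page)?.get(req.name);
    if (!fn) {
      return {
        id: req.id,
        ok: false,
        error: `unknown action ${req.name} on ${req.page}`,
      };
    }
    try {
      return { id: req.id, ok: true, value: fn(req.args) };
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      return { id: req.id, ok: false, error: message };
    }
  }
}

// Usage: a composite, higher-level action on a hypothetical chemistry page.
const registry = new PageRegistry();
const uiState = { molecule: "" };

registry.register("/molecules", "setMolecule", (args) => {
  uiState.molecule = String(args.smiles);
  return uiState.molecule;
});

const result = registry.dispatch(
  JSON.stringify({
    id: "req-1",
    page: "/molecules",
    name: "setMolecule",
    args: { smiles: "CCO" },
  })
);
console.log(result.ok, registry.list("/molecules"));
```

In a browser, `dispatch` could be wired to a WebSocket by calling it from the socket's `onmessage` handler and sending back the JSON-serialized result. Scoping functions by page keeps the set of callable actions small and tied to what is currently on screen, which is one plausible reading of "per-page" in the abstract.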
