EmbeWebAgent: Embedding Web Agents into Any Customized UI

Chenyang Ma, Clyde Fare, Matthew Wilson, Dave Braines

TL;DR

EmbeWebAgent addresses the brittleness of interface-level web agents by embedding agents directly into existing UIs, using curated ARIA-based observations and a per-page function registry instead of screenshots or raw DOM trees. A lightweight frontend shim connects to a stack-agnostic backend that orchestrates three agents: a web-interaction agent, an analysis agent that calls MCP tools, and a chat agent. The backend uses a ReAct-style loop with Chain-of-Thought prompts to carry out multi-step workflows. Key contributions include explicit navigation, mixed-granularity actions, session grounding, and an end-to-end demonstration in a chemistry UI that required minimal retrofitting. The approach aims to improve robustness and action expressiveness for enterprise UIs, and it lays groundwork for standardized ARIA-based observation pipelines and portable UI-agent protocols across heterogeneous frontends.

Abstract

Most web agents operate at the human interface level, observing screenshots or raw DOM trees without application-level access, which limits robustness and action expressiveness. In enterprise settings, however, explicit control of both the frontend and backend is available. We present EmbeWebAgent, a framework for embedding agents directly into existing UIs using lightweight frontend hooks (curated ARIA and URL-based observations, and a per-page function registry exposed via a WebSocket) and a reusable backend workflow that performs reasoning and takes actions. EmbeWebAgent is stack-agnostic (e.g., React or Angular), supports mixed-granularity actions ranging from GUI primitives to higher-level composites, and orchestrates navigation, manipulation, and domain-specific analytics via MCP tools. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting. Live Demo: https://youtu.be/Cy06Ljee1JQ
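
To make the per-page function registry and mixed-granularity actions more concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' implementation: the class names (UIAction, PageRegistry), the page path /molecules, and the action names are all hypothetical. In the system described above, the handlers would live in the frontend and be reached over a WebSocket rather than called in-process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UIAction:
    """One callable UI function. All names here are hypothetical."""
    name: str
    description: str
    granularity: str  # "primitive" (GUI-level) or "composite" (higher-level)
    handler: Callable[..., str]


@dataclass
class PageRegistry:
    """Maps a page path to the actions that page exposes."""
    actions: Dict[str, Dict[str, UIAction]] = field(default_factory=dict)

    def register(self, page: str, action: UIAction) -> None:
        self.actions.setdefault(page, {})[action.name] = action

    def available(self, page: str) -> List[str]:
        # Only actions registered for this page are offered to the agent.
        return sorted(self.actions.get(page, {}))

    def invoke(self, page: str, name: str, **kwargs) -> str:
        try:
            action = self.actions[page][name]
        except KeyError:
            raise LookupError(f"{name!r} is not registered on page {page!r}") from None
        return action.handler(**kwargs)


def click(label: str) -> str:
    # Primitive: stands in for pressing the element with this ARIA label.
    return f"clicked {label}"


def load_and_plot(molecule_id: str) -> str:
    # Composite: bundles several primitives into one higher-level action.
    steps = [click("Load molecule"), click("Plot spectrum")]
    return f"{molecule_id}: " + ", ".join(steps)


registry = PageRegistry()
registry.register("/molecules", UIAction("click", "Press a labelled control", "primitive", click))
registry.register("/molecules", UIAction("load_and_plot", "Load a molecule and plot it", "composite", load_and_plot))

result = registry.invoke("/molecules", "load_and_plot", molecule_id="mol-1")
```

Keeping the registry keyed by page means an agent on one page is never offered an action that only exists on another, which is one plausible reading of how a per-page registry narrows the action space.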

Paper Structure (12 sections, 4 figures)

Figures (4)

  • Figure 1: (a) Interface-level agent (screenshots/DOM trees, simulated clicks) vs. (b) Embedded agent (EmbeWebAgent) with ARIA labels as observations and explicit UI actions.
  • Figure 2: EmbeWebAgent pipeline. The frontend shim exposes curated ARIA observations and a per-page function registry via a WebSocket. The backend maintains session state, filters actions according to the current page, and coordinates a multi-agent workflow to infer navigation and manipulation actions, as well as invoke domain tools (via MCP or web APIs).
  • Figure 3: One page of the chemistry analysis UI used in our demo.
  • Figure 4: Testing interface. Integration tests evaluate action correctness and latency under simulated frontend-backend interaction.
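
The backend behaviour summarised in the Figure 2 caption (session state, page-based action filtering, and a step-by-step decision loop) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's code: the message shape, the PAGE_ACTIONS table, the page paths, and the function names are hypothetical. A scripted policy stands in for the LLM reasoning step, and a fake frontend stands in for the WebSocket round trip.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical catalogue: page path -> action names exposed on that page.
PAGE_ACTIONS: Dict[str, List[str]] = {
    "/home": ["navigate"],
    "/analysis": ["navigate", "set_filter", "run_analysis"],
}


@dataclass
class Session:
    """Backend-side session state, refreshed from frontend observations."""
    url: str = "/home"
    aria_labels: List[str] = field(default_factory=list)
    history: List[dict] = field(default_factory=list)

    def update(self, raw_message: str) -> None:
        # Assumed observation shape: {"url": ..., "aria": [...]}.
        msg = json.loads(raw_message)
        self.url = msg["url"]
        self.aria_labels = msg.get("aria", [])

    def allowed_actions(self) -> List[str]:
        # Filter the action space by the current page.
        return PAGE_ACTIONS.get(self.url, [])


Policy = Callable[[Session, str], Optional[dict]]


def scripted_policy(session: Session, goal: str) -> Optional[dict]:
    """Stand-in for the LLM reasoning step; returns None when done."""
    if session.url != "/analysis":
        return {"action": "navigate", "args": {"to": "/analysis"}}
    if not any(step["action"] == "run_analysis" for step in session.history):
        return {"action": "run_analysis", "args": {}}
    return None


def fake_frontend(session: Session, step: dict) -> str:
    """Simulates the frontend executing a step and reporting a new observation."""
    url = step["args"]["to"] if step["action"] == "navigate" else session.url
    return json.dumps({"url": url, "aria": [f"page {url}"]})


def run(goal: str, session: Session, policy: Policy, max_steps: int = 5) -> List[dict]:
    """Decide, validate against the current page, act, then observe again."""
    for _ in range(max_steps):
        step = policy(session, goal)
        if step is None:
            break
        if step["action"] not in session.allowed_actions():
            raise ValueError(f"{step['action']!r} is not available on {session.url!r}")
        session.history.append(step)
        session.update(fake_frontend(session, step))
    return session.history


history = run("run the analysis", Session(), scripted_policy)
```

In this sketch the policy first navigates to the page that exposes the analysis action and only then invokes it, mirroring the explicit-navigation idea. A step proposed on the wrong page is rejected before it reaches the frontend.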