Building the Web for Agents: A Declarative Framework for Agent-Web Interaction
Sven Schultze, Meike Verena Kietzmann, Nils-Lucas Schönfeld, Ruth Stock-Homburg
TL;DR
AI agents on the web currently rely on human-oriented UIs, which makes their interactions brittle, unsafe, and privacy-sensitive. VOIX introduces a web-native, declarative framework that uses <tool> and <context> elements to expose machine-readable capabilities. The framework is distributed across a Website, a Browser Agent, and an Inference Provider to preserve user privacy. A Chrome-based reference implementation and a three-day hackathon with 16 developers demonstrate VOIX’s learnability and expressive power, including high-level multimodal interactions and dynamic scoping. Latency benchmarks indicate that VOIX offers faster, more reliable interactions than inference-based, vision-driven approaches, supporting a practical, decentralized Agentic Web with strong safety and controllability for developers and users.
Abstract
The increasing deployment of autonomous AI agents on the web is hampered by a fundamental misalignment: agents must infer affordances from human-oriented user interfaces, leading to brittle, inefficient, and insecure interactions. To address this, we introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents through simple, declarative HTML elements. VOIX introduces <tool> and <context> tags, which let developers explicitly define the available actions and the relevant state, thereby creating a clear, machine-readable contract for agent behavior. This approach shifts control to the website developer while preserving user privacy by decoupling conversational interactions from the website. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers. The results demonstrate that participants, regardless of prior experience, were able to rapidly build diverse and functional agent-enabled web applications. Ultimately, this work provides a foundational mechanism for realizing the Agentic Web, enabling a future of seamless and secure human-AI collaboration on the web.
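To make the idea of a declarative, machine-readable contract concrete, the following is a minimal sketch of how an agent-side component might discover declared capabilities in a page. Only the <tool> and <context> tag names come from the abstract. The attribute names (name, description), the sample markup, and the helper names CapabilityParser and discover_capabilities are assumptions made for illustration and should not be read as the VOIX API.

```python
from html.parser import HTMLParser

# Hypothetical page markup. Only the <tool> and <context> tag names come from
# the abstract; the "name" and "description" attributes are assumptions.
PAGE = """
<context name="cart">2 items, total 40 EUR</context>
<tool name="add_to_cart" description="Add a product to the cart"></tool>
<tool name="checkout" description="Start the checkout flow"></tool>
"""


class CapabilityParser(HTMLParser):
    """Collects declared tools (actions) and contexts (state) from markup."""

    def __init__(self):
        super().__init__()
        self.tools = []        # list of {"name": ..., "description": ...}
        self.contexts = {}     # context name -> text content
        self._open_context = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "tool":
            self.tools.append({
                "name": attributes.get("name"),
                "description": attributes.get("description"),
            })
        elif tag == "context":
            self._open_context = attributes.get("name")
            self.contexts[self._open_context] = ""

    def handle_data(self, data):
        # Text is recorded only while a <context> element is open.
        if self._open_context is not None:
            self.contexts[self._open_context] += data

    def handle_endtag(self, tag):
        if tag == "context" and self._open_context is not None:
            text = self.contexts[self._open_context]
            self.contexts[self._open_context] = text.strip()
            self._open_context = None


def discover_capabilities(html):
    """Return (tools, contexts) declared in the given HTML string."""
    parser = CapabilityParser()
    parser.feed(html)
    parser.close()
    return parser.tools, parser.contexts


if __name__ == "__main__":
    tools, contexts = discover_capabilities(PAGE)
    print(tools)
    print(contexts)
```

In this sketch the agent reads the declared actions and state directly from the markup and does not have to infer them from the rendered interface, which is the shift the abstract describes.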
