Affordance Representation and Recognition for Autonomous Agents
Habtom Kahsay Gidey, Niklas Huber, Alexander Lenz, Alois Knoll
TL;DR
This work tackles how autonomous agents can build actionable world models from structured data by addressing DOM verbosity and brittle service integrations. It introduces two architectural patterns: the DOM Transduction Pattern, which distills complex webpages into a compact Page Affordance Model, and the Hypermedia Affordances Recognition Pattern, which supports runtime discovery via WoT Thing Descriptions. Jointly, they enable scalable, adaptive perception and interoperation. Together, they drive the construction of a Cognitive Map that unifies structured page data and dynamic service capabilities, enabling more efficient, resilient, and predictive automation on the web. The paper lays a principled pattern-language foundation with concrete design constraints and outlines a path toward future multimodal perception patterns.
Abstract
The autonomy of software agents is fundamentally dependent on their ability to construct an actionable internal world model from the structured data that defines their digital environment, such as the Document Object Model (DOM) of web pages and the semantic descriptions of web services. However, constructing this world model from raw structured data presents two critical challenges: the verbosity of raw HTML makes it computationally intractable for direct use by foundation models, while the static nature of hardcoded API integrations prevents agents from adapting to evolving services. This paper introduces a pattern language for world modeling from structured data, presenting two complementary architectural patterns. The DOM Transduction Pattern addresses the challenge of web page complexity by distilling a verbose, raw DOM into a compact, task-relevant representation or world model optimized for an agent's reasoning core. Concurrently, the Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model by parsing standardized semantic descriptions to discover and integrate the capabilities of unknown web services at runtime. Together, these patterns provide a robust framework for engineering agents that can efficiently construct and maintain an accurate world model, enabling scalable, adaptive, and interoperable automation across the web and its extended resources.
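
To make the two patterns concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes that "distilling" a DOM can be approximated by keeping only interactive elements and a few identifying attributes, and that affordance recognition can be approximated by listing the properties, actions, and events declared in a WoT Thing Description together with the href of each form. The names INTERACTIVE_TAGS, AffordanceExtractor, transduce_dom, and recognize_td_affordances, as well as the sample page and the lamp description, are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags treated as interactive; the paper does not prescribe one.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class AffordanceExtractor(HTMLParser):
    """Collects interactive elements from raw HTML into a compact list."""

    def __init__(self):
        super().__init__()
        self.affordances = []

    def handle_starttag(self, tag, attrs):
        # Layout-only elements (div, span, ...) are dropped entirely.
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        entry = {"tag": tag}
        # Keep only a few identifying attributes (an assumption of this sketch).
        for key in ("id", "name", "type", "href"):
            if attr_map.get(key) is not None:
                entry[key] = attr_map[key]
        self.affordances.append(entry)


def transduce_dom(html: str) -> list[dict]:
    """Reduce raw HTML to a compact list of page affordances."""
    parser = AffordanceExtractor()
    parser.feed(html)
    parser.close()
    return parser.affordances


def recognize_td_affordances(td: dict) -> list[dict]:
    """List the interaction affordances declared in a WoT Thing Description."""
    found = []
    for kind in ("properties", "actions", "events"):
        for name, spec in td.get(kind, {}).items():
            hrefs = [form["href"] for form in spec.get("forms", []) if "href" in form]
            found.append({"kind": kind, "name": name, "hrefs": hrefs})
    return found


# Hypothetical sample page: nested wrappers around three interactive elements.
sample_html = """
<html><body>
  <div class="wrapper"><div class="nav">
    <a href="/cart" id="cart-link">Cart</a>
  </div>
  <form><input type="text" name="q">
    <button id="search" type="submit">Search</button></form>
  </div>
</body></html>
"""

# Hypothetical Thing Description fragment for a lamp service.
sample_td = {
    "title": "Lamp",
    "properties": {"status": {"forms": [{"href": "https://lamp.example.com/status"}]}},
    "actions": {"toggle": {"forms": [{"href": "https://lamp.example.com/toggle"}]}},
}

page_model = transduce_dom(sample_html)
service_model = recognize_td_affordances(sample_td)
print(page_model)
print(service_model)
```

In this sketch the two outputs are plain lists of dictionaries, so an agent could merge them into a single world model without any service-specific code; a real transducer would also need visibility, labels, and task relevance, which are omitted here.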
