Software as Content: Dynamic Applications as the Human-Agent Interaction Layer

Mulong Xie, Yang Xie

Abstract

Chat-based natural language interfaces have emerged as the dominant paradigm for human-agent interaction, yet they fundamentally constrain engagement with structured information and complex tasks. We identify three inherent limitations: the mismatch between structured data and linear text, the high entropy of unconstrained natural language input, and the lack of persistent, evolving interaction state. We introduce Software as Content (SaC), a paradigm in which dynamically generated agentic applications serve as the primary medium of human-agent interaction. Rather than communicating through sequential text exchange, this medium renders task-specific interfaces that present structured information and expose actionable affordances through which users iteratively guide agent behavior without relying solely on language. These interfaces persist and evolve across interaction cycles, transforming from transient responses into a shared, stateful interaction layer that progressively converges toward personalized, task-specific software. We formalize SaC through a human-agent-environment interaction model, derive design principles for generating and evolving agentic applications, and present a system architecture that operationalizes the paradigm. We evaluate across representative tasks of selection, exploration, and execution, demonstrating technical viability and expressive range, while identifying boundary conditions under which natural language remains preferable. By reframing interfaces as dynamically generated software artifacts, SaC opens a new design space for human-AI interaction, positioning dynamic software as a concrete and tractable research object.

Paper Structure (66 sections, 3 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Positioning and comparison of existing interaction paradigms along two dimensions: information complexity, the degree to which a task requires structured, organized information architecture rather than linear presentation; and personalization, the degree to which the interaction medium adapts to the specific user's task over time. Information browsing and textual chat occupy the low end of both axes. Generative UI components improve information complexity by producing structured output, but remain stateless and per-turn, offering no bidirectional interaction between human and agent. Traditional software achieves high information complexity through pre-built structured interfaces, but personalization is limited to fixed settings. Vibe coding achieves high personalization through artifact-oriented construction, but operates on empty containers disconnected from real task data, and requires explicit product intent from the user.
  • Figure 2: The SaC interaction cycle across four participants. Initialization (top): the user's initial query triggers App Generation, in which the agent reasons over the request, retrieves data from the environment, and constructs the initial application state $s_0$. Interaction cycle (center): the human observes $s_0$ and acts through one of two input channels, structured affordances $\Phi^s$ or the natural language channel $\Phi^{nl}$, whose intents are encoded and dispatched to the agent. The agent reasons, optionally retrieves data or executes actions against external systems, and renders an update $\Delta$ that drives the transition $s_{t+1} = s_t \oplus \Delta$. The resulting state $s_1$ is presented back to the human, closing the loop. Dashed borders denote optional operations. When the agent determines that a plain text response suffices (e.g., for simple factual queries or brief clarifications), the render step produces a text reply directly without instantiating or updating an agentic application; the cycle does not proceed.
  • Figure 3: The primary SaC application evolution illustrated through a Sydney relocation scenario (Reloco Sydney). Each panel shows the interaction trace (left, intent stack) alongside the current agentic application state (right). Cold start ($s_0$): the user's initial query is dispatched to the agent, which retrieves market and listing data and constructs an initial application comprising a property listing view, a relocation timeline, and contextually relevant affordances. First cycle ($s_0 \rightarrow s_1$): the user clicks the "Dog Parks" element ($\Phi^s$), dispatching an intent to the agent; the agent retrieves proximity data and updates the application by adding a Pet Parks tab and a Park Proximity Radar panel. Second cycle ($s_1 \rightarrow s_2$): the user submits a natural language query ($\Phi^{nl}$), "I work from home 3 days a week, any setup recommendation?", dispatching a new intent; the agent retrieves relevant information and extends the application with a WFH-focused acoustic setup section. Plain text reply: a subsequent factual query about local stores is answered directly by the agent without updating $s_2$, illustrating that the SaC cycle does not proceed when a text response suffices.
  • Figure 4: System architecture overview. The main pipeline (center) comprises three sequential stages: App Generation constructs the initial application state $s_0 = (V_0, \Phi_0, C_0)$ from a cold start; App Evolution computes successive state transitions $s_{t+1} = s_t \oplus \Delta$ in response to incoming events; Application Distribution and Reuse handles sharing and refresh of completed application states. Four shared modules (outer boxes) are invoked across stages: Intent Analysis provides modality assessment and data requirements at every stage; Environment Interaction handles data retrieval and system modification on behalf of both generation and evolution; View Render is the shared computational primitive underlying both stages, instantiating $V_0$ during generation and computing $\Delta$ during evolution, with view update strategy selection (element-level update, structural extension, or application replacement) determining the degree of structural change at each cycle; Quality Assurance enforces functional correctness and interaction quality within each pipeline stage.
  • Figure 5: Scenario 1: car rental planning. Cold start ($s_0$, left): initial application state generated from a query with three interacting constraints: provisional licence, dog, and one-way drop-off. Evolution ($s_0 \to s_1$, right): structural extension triggered by the anticipatory affordance Compare P-Plate Surcharge ($\Phi^{nl}$), appending a Young Driver Fees tab and a provider cost comparison table.
  • ...and 4 more figures
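
The state transition $s_{t+1} = s_t \oplus \Delta$ described in the captions for Figures 2 and 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `AppState`, `Delta`, `Strategy`, and `apply_delta` are hypothetical, the dictionary encoding of $V$, $\Phi$, and $C$ is an assumption, and reading $C$ as "context" is a guess, since the captions do not define it. The sketch covers the three view update strategies named in Figure 4 and treats a plain text reply as a missing $\Delta$ that leaves the state unchanged.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Strategy(Enum):
    # The three view update strategies named in the Figure 4 caption.
    ELEMENT_UPDATE = "element-level update"
    STRUCTURAL_EXTENSION = "structural extension"
    APP_REPLACEMENT = "application replacement"


@dataclass
class AppState:
    # s = (V, Phi, C); the concrete encodings below are assumptions.
    views: dict             # V: view name -> rendered content
    affordances: frozenset  # Phi: names of actionable elements
    context: dict           # C: assumed to be task context


@dataclass
class Delta:
    strategy: Strategy
    views: dict = field(default_factory=dict)
    affordances: frozenset = frozenset()
    context: dict = field(default_factory=dict)


def apply_delta(state: AppState, delta: Optional[Delta]) -> AppState:
    """Compute s_{t+1} = s_t (+) delta without mutating s_t."""
    if delta is None:
        # Plain text reply: the application state is left untouched.
        return state
    if delta.strategy is Strategy.APP_REPLACEMENT:
        # Discard the old application and start from the delta alone.
        return AppState(dict(delta.views), delta.affordances, dict(delta.context))
    if delta.strategy is Strategy.ELEMENT_UPDATE:
        # Element-level updates may only touch views that already exist.
        unknown = set(delta.views) - set(state.views)
        if unknown:
            raise KeyError(f"element update targets unknown views: {sorted(unknown)}")
    # Element update and structural extension both merge into s_t.
    return AppState(
        views={**state.views, **delta.views},
        affordances=state.affordances | delta.affordances,
        context={**state.context, **delta.context},
    )


# Cold start and first cycle, loosely following the Figure 3 scenario.
s0 = AppState(
    views={"Listings": "property listings", "Timeline": "relocation timeline"},
    affordances=frozenset({"Dog Parks"}),
    context={"city": "Sydney"},
)
s1 = apply_delta(s0, Delta(
    strategy=Strategy.STRUCTURAL_EXTENSION,
    views={"Pet Parks": "nearby parks", "Park Proximity Radar": "distances"},
))
s1_after_text_reply = apply_delta(s1, None)
```

Returning a new `AppState` at each step, instead of mutating the old one, keeps every $s_t$ available for inspection. That is one plausible design choice, not something the captions prescribe.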