Validity Is What You Need

Sebastian Benthall; Andrew Clark

TL;DR

The paper reframes Agentic AI as a SaaS-like service deployed in complex enterprise settings and offers a realist definition and design framework. It argues that practical validity hinges on application-level validation rather than solely on foundation-model capabilities. It introduces a multi-stage design and governance process grounded in mechanism design that addresses three information-theoretic validation challenges: deployment-specific information gaps, designer knowledge gaps, and stakeholder confidence, with an emphasis on end-to-end validation and guardrails. The authors contend that strong validation can reduce reliance on large foundation models by favoring smaller, interpretable components or expert systems. Goal-directed decision making is formalized in a way akin to the Bellman equation $V(x) = \max_{a \in \Gamma(x)} \{ F(x,a) + \beta V(T(x,a)) \}$. Practically, the paper outlines concrete steps for enterprise validation, from modeling the sociotechnical context to continuous monitoring, arguing that application evaluations, not just model capabilities, will determine the maturity and value of Agentic AI in real-world use cases.
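To make the Bellman equation in the TL;DR concrete, here is a minimal value-iteration sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. It reads $x$ as a state, $\Gamma(x)$ as the set of feasible actions, $F$ as a one-step payoff, $T$ as a deterministic transition, and $\beta$ as a discount factor, which is the standard reading of this notation. The three-state task and the names `feasible_actions`, `payoff`, and `transition` are hypothetical and do not come from the paper.

```python
# Value iteration for V(x) = max_{a in Gamma(x)} { F(x, a) + beta * V(T(x, a)) }.
# The toy problem below is hypothetical and chosen only for illustration.

def value_iteration(states, feasible_actions, payoff, transition,
                    beta=0.9, tol=1e-9, max_iters=10_000):
    """Apply the Bellman update repeatedly until the values stop changing."""
    values = {x: 0.0 for x in states}
    for _ in range(max_iters):
        updated = {
            x: max(payoff(x, a) + beta * values[transition(x, a)]
                   for a in feasible_actions(x))
            for x in states
        }
        largest_change = max(abs(updated[x] - values[x]) for x in states)
        values = updated
        if largest_change < tol:
            break
    return values


# Hypothetical task: states 0 -> 1 -> 2, where state 2 is an absorbing goal.
STATES = [0, 1, 2]
GOAL = 2

def feasible_actions(x):
    # Gamma(x): once at the goal, the only option is to stay.
    return ["stay"] if x == GOAL else ["stay", "advance"]

def transition(x, a):
    # T(x, a): deterministic next state.
    return min(x + 1, GOAL) if a == "advance" else x

def payoff(x, a):
    # F(x, a): reward of 1 for the step that reaches the goal, else 0.
    return 1.0 if (x != GOAL and transition(x, a) == GOAL) else 0.0


values = value_iteration(STATES, feasible_actions, payoff, transition, beta=0.9)
print(values)
```

In this toy task every quantity is small enough to check by hand, which is the kind of transparency the summary attributes to smaller, interpretable components.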

Abstract

While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist definition. Agentic AI is a software delivery mechanism, comparable to software as a service (SaaS), which puts an application to work autonomously in a complex enterprise setting. Recent advances in large language models (LLMs) as foundation models have driven excitement in Agentic AI. We note, however, that Agentic AI systems are primarily applications, not foundations, and so their success depends on validation by end users and principal stakeholders. The tools and techniques needed by the principal users to validate their applications are quite different from the tools and techniques used to evaluate foundation models. Ironically, with good validation measures in place, in many cases the foundation models can be replaced with much simpler, faster, and more interpretable models that handle core logic. When it comes to Agentic AI, validity is what you need. LLMs are one option that might achieve it.
