
MARIA: A Framework for Marginal Risk Assessment without Ground Truth in AI Systems

Jieshan Chen, Suyu Ma, Qinghua Lu, Sung Une Lee, Liming Zhu

TL;DR

MARIA introduces a ground-truth-free framework for evaluating AI adoption by measuring marginal risk, defined as $MR = \Delta R = R_{\text{new}} - R_{\text{baseline}}$, where each $R$ is a multi-dimensional risk vector, so the difference is taken dimension by dimension. It shifts the focus from absolute performance to relative differences across three evaluation dimensions (Predictability, Capability, and Interaction Dominance), enabling scalable, automated, and assumption-aware assessment. A document-evaluation case study shows that AI can reduce certain risks (e.g., inconsistency) while introducing others (e.g., prompt-injection vulnerabilities and fairness shifts), which informs responsible adoption. By pairing a structured methodology with game-based and proxy-metric evaluations, MARIA provides practical guidance for deployment and ongoing monitoring without relying on inaccessible ground-truth data.
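To make the definition $MR = R_{\text{new}} - R_{\text{baseline}}$ concrete, here is a minimal sketch that treats the risk vector as a mapping from named risk dimensions to scores (lower is better) and subtracts the incumbent's vector from the AI system's, element by element. The dimension names, the score scale, and the helpers `marginal_risk` and `summarize` are hypothetical illustrations, not the paper's implementation, and the numbers are placeholders rather than case-study results.

```python
# Minimal sketch of MR = R_new - R_baseline over a named risk vector.
# Dimension names and numbers are hypothetical placeholders, not paper results.


def marginal_risk(r_new: dict[str, float], r_baseline: dict[str, float]) -> dict[str, float]:
    """Element-wise marginal risk: MR[d] = R_new[d] - R_baseline[d]."""
    if r_new.keys() != r_baseline.keys():
        raise ValueError("both risk vectors must cover the same dimensions")
    return {d: r_new[d] - r_baseline[d] for d in r_new}


def summarize(mr: dict[str, float], tol: float = 1e-9) -> dict[str, list[str]]:
    """Group dimensions by the sign of their marginal risk (lower risk is better)."""
    groups: dict[str, list[str]] = {"reduced": [], "increased": [], "unchanged": []}
    for dim, delta in mr.items():
        if delta < -tol:
            groups["reduced"].append(dim)      # AI lowers this risk
        elif delta > tol:
            groups["increased"].append(dim)    # AI introduces or raises this risk
        else:
            groups["unchanged"].append(dim)    # difference within tolerance
    return groups


if __name__ == "__main__":
    # Hypothetical risk scores in [0, 1] for the incumbent and the AI system.
    baseline = {"inconsistency": 0.30, "prompt_injection": 0.00, "fairness_gap": 0.05}
    ai_system = {"inconsistency": 0.10, "prompt_injection": 0.20, "fairness_gap": 0.08}
    mr = marginal_risk(ai_system, baseline)
    print({d: round(v, 2) for d, v in mr.items()})
    print(summarize(mr))
```

Keeping the result as a vector rather than collapsing it into a single score preserves the trade-off the TL;DR describes: one dimension can improve while another worsens.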

Abstract

Before an AI system replaces an existing process, it must be compared with the incumbent to confirm that it improves outcomes without adding risk. Traditional evaluation relies on ground truth for both systems, but ground truth is often unavailable because outcomes are delayed or unknowable, obtaining it is costly, or data is incomplete, especially for long-standing systems deemed safe by convention. A more practical approach is to estimate not the absolute risk of each system but the difference in risk between them. We therefore propose a marginal risk assessment framework that avoids dependence on ground truth or absolute risk. It centers on three relative evaluation methodologies: predictability, capability, and interaction dominance. By shifting the focus from absolute to relative evaluation, our approach equips software teams with actionable guidance: identifying where AI improves outcomes, where it introduces new risks, and how to adopt such systems responsibly.
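As an illustration of how a relative comparison can proceed without ground truth, the sketch below assumes that the predictability dimension is proxied by self-consistency: each system scores the same inputs several times, and we measure how much its own scores vary. This proxy, the per-item population standard deviation, and the helper names `inconsistency` and `marginal_inconsistency` are assumptions chosen for illustration; the paper's own predictability metrics may differ.

```python
from statistics import mean, pstdev

# Minimal sketch, assuming predictability is proxied by self-consistency:
# each system scores the same inputs several times; no ground truth is used.


def inconsistency(repeated_scores: list[list[float]]) -> float:
    """Mean per-item population standard deviation across repeated runs."""
    if not repeated_scores or any(len(runs) < 2 for runs in repeated_scores):
        raise ValueError("need at least one item, each with two or more runs")
    return mean(pstdev(runs) for runs in repeated_scores)


def marginal_inconsistency(new_runs: list[list[float]], baseline_runs: list[list[float]]) -> float:
    """Negative values mean the new system is more self-consistent than the baseline."""
    return inconsistency(new_runs) - inconsistency(baseline_runs)


if __name__ == "__main__":
    # Hypothetical scores: two documents, each scored three times per system.
    baseline_runs = [[3.0, 5.0, 4.0], [2.0, 2.0, 4.0]]
    ai_runs = [[4.0, 4.0, 4.0], [3.0, 3.0, 3.5]]
    delta = marginal_inconsistency(ai_runs, baseline_runs)
    print(f"marginal inconsistency: {delta:+.3f}")
```

Only agreement of each system with itself enters the calculation, so neither system needs to be scored against a correct answer. Another dispersion measure (for example, the score range, or a chance-corrected agreement statistic for categorical labels) would fit the same pattern.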

Paper Structure

This paper contains 20 sections, 1 equation, and 1 figure.

Figures (1)

  • Figure 1: MARIA workflow.