Proofs as Explanations: Short Certificates for Reliable Predictions

Avrim Blum; Steve Hanneke; Chirag Pabbaraju; Donya Saless

TL;DR

The paper introduces robust certificates for explainable AI: short subsets of training data $S'$ that certify a test label $y$ for $x$ under realizability with up to $b$ corrupted points. It formalizes the robust hollow star number $s_b$ to capture worst-case certificate size, analyzes both worst-case and distribution-dependent bounds, and defines the certificate coefficient $\varepsilon_x$ to relate sample size to distributional proximity to the decision boundary. It proves that the minimum certificate size is tightly linked to $s_b$ (with recent lower bounds for halfspaces) and provides a sharp $\varepsilon_x$-dependent sample complexity, $m=O\big( (b + d\log(1/\varepsilon_x) + \log(1/\delta))/\varepsilon_x \big)$, along with approaches based on reweighting to achieve shorter certificates. The work connects robust certification with classical notions like hollow star numbers and tolerance Carathéodory, offering both theoretical limits and practical avenues for producing concise proofs of label correctness under noise, with implications for reliable, explainable predictions.
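To get a feel for how the stated sample complexity scales, the minimal sketch below evaluates the expression inside the $O(\cdot)$. The function name `certificate_sample_bound` and the constant `c` are hypothetical. The summary gives only the asymptotic form, so the leading constant (set to 1 here) and the use of natural logarithms are assumptions, and the returned number indicates growth rate only, not an actual sample size.

```python
import math


def certificate_sample_bound(b, d, eps_x, delta, c=1.0):
    """Evaluate c * (b + d*log(1/eps_x) + log(1/delta)) / eps_x.

    b      -- number of corrupted points tolerated
    d      -- VC dimension of the hypothesis class
    eps_x  -- certificate coefficient of the test point, in (0, 1]
    delta  -- failure probability, in (0, 1)
    c      -- ASSUMED leading constant; the O(.) in the text hides it
    """
    if not (0 < eps_x <= 1):
        raise ValueError("eps_x must lie in (0, 1]")
    if not (0 < delta < 1):
        raise ValueError("delta must lie in (0, 1)")
    numerator = b + d * math.log(1 / eps_x) + math.log(1 / delta)
    return c * numerator / eps_x


# Illustrative values only (not taken from the paper).
scale = certificate_sample_bound(b=2, d=10, eps_x=0.01, delta=0.05)
```

With these illustrative inputs the $d\log(1/\varepsilon_x)$ term dominates the numerator, and halving $\varepsilon_x$ more than doubles the expression because of the extra logarithmic factor.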

Abstract

We consider a model for explainable AI in which an explanation for a prediction $h(x)=y$ consists of a subset $S'$ of the training data (if it exists) such that all classifiers $h' \in H$ that make at most $b$ mistakes on $S'$ predict $h'(x)=y$. Such a set $S'$ serves as a proof that $x$ indeed has label $y$ under the assumption that (1) the target function $h^\star$ belongs to $H$, and (2) the set $S$ contains at most $b$ corrupted points. For example, if $b=0$ and $H$ is the family of linear classifiers in $\mathbb{R}^d$, and if $x$ lies inside the convex hull of the positive data points in $S$ (and hence every consistent linear classifier labels $x$ as positive), then Carathéodory's theorem states that $x$ lies inside the convex hull of $d+1$ of those points. So, a set $S'$ of size $d+1$ could be released as an explanation for a positive prediction, and would serve as a short proof of correctness of the prediction under the assumption of realizability. In this work, we consider this problem more generally, for general hypothesis classes $H$ and general values $b\geq 0$. We define the notion of the robust hollow star number of $H$ (which generalizes the standard hollow star number), and show that it precisely characterizes the worst-case size of the smallest certificate achievable, and analyze its size for natural classes. We also consider worst-case distributional bounds on certificate size, as well as distribution-dependent bounds that we show tightly control the sample size needed to get a certificate for any given test example. In particular, we define a notion of the certificate coefficient $\varepsilon_x$ of an example $x$ with respect to a data distribution $D$ and target function $h^\star$, and prove matching upper and lower bounds on sample size as a function of $\varepsilon_x$, $b$, and the VC dimension $d$ of $H$.
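The certificate definition above can be checked directly on a toy problem. The sketch below is a brute-force illustration, not the paper's algorithm: it assumes a small finite hypothesis class (one-dimensional thresholds on a grid, chosen here purely for illustration) and enumerates subsets of the training set in order of size. The names `is_certificate`, `smallest_certificate`, and `threshold` are hypothetical.

```python
from itertools import combinations


def is_certificate(hypotheses, subset, x, y, b):
    """True if every hypothesis making at most b mistakes on `subset`
    predicts label y at x (the certificate condition from the abstract)."""
    for h in hypotheses:
        mistakes = sum(1 for xi, yi in subset if h(xi) != yi)
        if mistakes <= b and h(x) != y:
            return False
    return True


def smallest_certificate(hypotheses, sample, x, y, b):
    """Brute-force search for a minimum-size certificate inside `sample`.
    Exponential in len(sample); intended only for tiny examples."""
    for size in range(len(sample) + 1):
        for subset in combinations(sample, size):
            if is_certificate(hypotheses, subset, x, y, b):
                return list(subset)
    return None  # no subset of the sample certifies (x, y)


def threshold(t):
    """ASSUMED toy class: h_t(v) = 1 if v >= t else 0."""
    return lambda v: 1 if v >= t else 0


H = [threshold(t) for t in range(11)]          # thresholds t = 0, ..., 10
S = [(1, 0), (2, 0), (4, 1), (5, 1), (8, 1)]   # (point, label) pairs

cert_b0 = smallest_certificate(H, S, x=6, y=1, b=0)
cert_b1 = smallest_certificate(H, S, x=6, y=1, b=1)
cert_b2 = smallest_certificate(H, S, x=6, y=1, b=2)
```

In this toy setting a single positive point below the test point $x=6$ certifies the positive label when $b=0$. With $b=1$ two such points are needed, since a threshold above $x$ could otherwise afford one mistake. With $b=2$ the sample contains no certificate at all, which illustrates the "if it exists" qualifier in the definition.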
