Classification with Conceptual Safeguards

Hailey Joren, Charles Marx, Berk Ustun

TL;DR

Conceptual safeguards promote safety in classification tasks with established concepts. They propagate the uncertainty in concept predictions and flag salient concepts for human review, which can improve performance and coverage in deep learning tasks.

Abstract

We propose a new approach to promote safety in classification tasks with established concepts. Our approach -- called a conceptual safeguard -- acts as a verification layer for models that predict a target outcome by first predicting the presence of intermediate concepts. Given this architecture, a safeguard ensures that a model meets a minimal level of accuracy by abstaining from uncertain predictions. In contrast to a standard selective classifier, a safeguard provides an avenue to improve coverage by allowing a human to confirm the presence of uncertain concepts on instances on which it abstains. We develop methods to build safeguards that maximize coverage without compromising safety, namely techniques to propagate the uncertainty in concept predictions and to flag salient concepts for human review. We benchmark our approach on a collection of real-world and synthetic datasets, showing that it can improve performance and coverage in deep learning tasks.
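
The two mechanisms named in the abstract, uncertainty propagation and human confirmation of concepts, can be illustrated with a minimal sketch. It assumes that concepts are independent given the input and that a front-end model maps a binary concept vector to a probability of the outcome. The names `propagate`, `confirm`, and `toy_front_end`, as well as all weights and probabilities, are hypothetical and chosen for illustration only; this is not necessarily the authors' implementation.

```python
from itertools import product
import math


def propagate(concept_probs, front_end):
    """Average the front-end output over all binary concept vectors.

    Each vector is weighted by its probability under the concept
    predictions, assuming the concepts are independent given the input
    (an assumption of this sketch).
    """
    total = 0.0
    for c in product([0, 1], repeat=len(concept_probs)):
        weight = 1.0
        for q, ci in zip(concept_probs, c):
            weight *= q if ci == 1 else 1.0 - q
        total += weight * front_end(c)
    return total


def confirm(concept_probs, index, value):
    """Replace one uncertain concept probability with a human-confirmed 0 or 1."""
    updated = list(concept_probs)
    updated[index] = float(value)
    return updated


def toy_front_end(c):
    # Hypothetical logistic front-end; the weights are invented for illustration.
    weights, bias = (2.0, 1.5, 3.0), -3.0
    z = bias + sum(w * ci for w, ci in zip(weights, c))
    return 1.0 / (1.0 + math.exp(-z))


probs = [0.6, 0.9, 0.5]  # hypothetical concept probabilities
before = propagate(probs, toy_front_end)
after = propagate(confirm(probs, 2, 1), toy_front_end)
print(f"before confirmation: {before:.3f}, after: {after:.3f}")
```

In this toy setup, confirming that the third concept is present removes its uncertainty and moves the propagated probability further from 0.5, which is how a confirmation could turn an abstention into a prediction.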

Paper Structure

This paper contains 28 sections, 2 theorems, 10 equations, 2 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Suppose that $\overline{y}$ is a calibrated probability prediction for $y$. Then any selective classifier $\varphi_\tau(\overline{y})$ that abstains when $\overline{y}$ has confidence below $1 -\tau$ achieves accuracy at least $1 - \tau$.
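
As a rough illustration of Proposition 1, the sketch below implements a selective classifier for a binary outcome and checks the accuracy bound on simulated data. It assumes that confidence is measured as $\max(\overline{y}, 1 - \overline{y})$ and that labels are drawn so that $\overline{y}$ is calibrated by construction. The names `selective_classifier` and `simulate` are hypothetical.

```python
import random

ABSTAIN = None  # stands in for the abstention symbol


def selective_classifier(y_bar, tau):
    """Predict 0 or 1 from a probability y_bar, or abstain if confidence < 1 - tau."""
    confidence = max(y_bar, 1.0 - y_bar)
    if confidence < 1.0 - tau:
        return ABSTAIN
    return 1 if y_bar >= 0.5 else 0


def simulate(tau, n=100_000, seed=0):
    """Return (accuracy on non-abstained points, coverage) on synthetic data."""
    rng = random.Random(seed)
    correct = kept = 0
    for _ in range(n):
        y_bar = rng.random()
        # Drawing y with probability y_bar makes y_bar calibrated by construction.
        y = 1 if rng.random() < y_bar else 0
        y_hat = selective_classifier(y_bar, tau)
        if y_hat is ABSTAIN:
            continue
        kept += 1
        correct += int(y_hat == y)
    return correct / kept, kept / n


accuracy, coverage = simulate(tau=0.1)
print(f"accuracy={accuracy:.3f}, coverage={coverage:.3f}")
```

With uniformly distributed predictions and $\tau = 0.1$, the classifier keeps only points with confidence of at least 0.9, so the accuracy on those points should sit above $1 - \tau = 0.9$ at the cost of abstaining on most of the data.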

Figures (2)

  • Figure 1: Conceptual safeguard to detect melanoma from an image of a skin lesion. We consider a model that estimates the probabilities of $m$ concepts: Dotted, Pigmented … IrregularVasc. Given these probabilities, a conceptual safeguard decides whether to output a prediction $\hat{y} \in \{\texttt{Melanoma}, \texttt{NoMelanoma}\}$ or to abstain, $\hat{y} = \perp$. The safeguard improves accuracy by abstaining on images that would receive a low-confidence prediction, and it measures confidence in a way that accounts for the uncertainty in concept predictions through uncertainty propagation. On the left, we show an image where a safeguard abstains because its confidence fails to meet the threshold needed to ensure high accuracy: $\textrm{Pr}(\texttt{Melanoma}) = 62\% \leq 90\%$. On the right, we show how a human expert can resolve the abstention by confirming the presence of the concepts $\texttt{Dotted}$ and $\texttt{IrregVasc}$ in the image.
  • Figure 2: We evaluate the performance of classification models that can abstain using an accuracy-coverage curve [franc2023optimal]. Given a model that outputs a probability prediction for each point, a conceptual safeguard flags points on which a model abstains based on a confidence threshold $\tau \in [0, 0.5]$, where setting $\tau = 0$ leads to 100% coverage and setting $\tau = 0.5$ leads to 0% coverage.
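
An accuracy-coverage curve of the kind described in Figure 2 can be computed with a short sketch. It ranks points by confidence, taken here as $\max(p, 1 - p)$, and reports the accuracy on the $k$ most confident points for each $k$. This sweeps coverage directly rather than the threshold $\tau$. The function name `accuracy_coverage_curve` and the toy data are hypothetical.

```python
def accuracy_coverage_curve(probs, labels):
    """Return (coverage, accuracy) pairs for the k most confident points, k = 1..n."""
    scored = sorted(
        zip(probs, labels),
        key=lambda pl: max(pl[0], 1.0 - pl[0]),  # confidence of the prediction
        reverse=True,
    )
    curve, correct = [], 0
    for k, (p, y) in enumerate(scored, start=1):
        y_hat = 1 if p >= 0.5 else 0
        correct += int(y_hat == y)
        curve.append((k / len(scored), correct / k))
    return curve


# Toy probability predictions and labels, invented for illustration.
probs = [0.95, 0.10, 0.62, 0.45, 0.80]
labels = [1, 0, 0, 1, 1]
curve = accuracy_coverage_curve(probs, labels)
for coverage, accuracy in curve:
    print(f"coverage={coverage:.1f}  accuracy={accuracy:.2f}")
```

On this toy data, accuracy stays at 100% while only the three most confident points are kept and drops once the two least confident points are included, which is the trade-off the curve is meant to expose.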

Theorems & Definitions (4)

  • Definition 1
  • Proposition 1
  • Proposition 1
  • proof