Finding Patterns in Ambiguity: Interpretable Stress Testing in the Decision Boundary

Inês Gomes, Luís F. Teixeira, Jan N. van Rijn, Carlos Soares, André Restivo, Luís Cunha, Moisés Santos

TL;DR

This work targets interpretability for deep binary classifiers by examining decision boundaries through border-focused samples. It combines GASTeN-generated borderline data with deep clustering (UMAP) and Gaussian Mixture Models to identify distinct patterns, then selects cluster medoids as prototypes and explains them with GradientSHAP. The approach is evaluated on binary MNIST and Fashion-MNIST tasks, revealing diverse, interpretable prototypes that reflect low-confidence decisions and offering a tool for model auditing and improvement. Overall, the method demonstrates potential for surfacing actionable insights about where and why models struggle, supporting responsible AI deployment and documentation via model cards and targeted data augmentation.

Abstract

The increasing use of deep learning across various domains highlights the importance of understanding the decision-making processes of these black-box models. Recent research on the decision boundaries of deep classifiers relies on synthetic instances generated in areas of low confidence, uncovering samples that challenge both models and humans. We propose a novel approach to enhance the interpretability of deep binary classifiers by selecting representative samples from the decision boundary (prototypes) and applying post-model explanation algorithms. We evaluate the effectiveness of our approach through 2D visualizations and GradientSHAP analysis. Our experiments demonstrate the potential of the proposed method, revealing distinct and compact clusters and diverse prototypes that capture essential features that lead to low-confidence decisions. By offering a more aggregated view of deep classifiers' decision boundaries, our work contributes to the responsible development and deployment of reliable machine learning systems.
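
The prototype-selection step can be illustrated with a minimal sketch. It assumes the borderline samples have already been embedded in a low-dimensional space and assigned to clusters; the function name `cluster_medoids`, the toy points, and the labels are hypothetical and are not taken from the paper. A medoid is the cluster member with the smallest total distance to the other members, so each prototype is an actual sample rather than an averaged one.

```python
from collections import defaultdict
from math import dist


def cluster_medoids(points, labels):
    """Return {cluster_label: index of that cluster's medoid}.

    Hypothetical helper: `points` are embedded samples and `labels`
    are cluster assignments produced by an earlier clustering step.
    """
    # Group sample indices by their cluster label.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    medoids = {}
    for label, idxs in members.items():
        # The medoid minimizes the summed Euclidean distance
        # to all members of its own cluster.
        medoids[label] = min(
            idxs,
            key=lambda i: sum(dist(points[i], points[j]) for j in idxs),
        )
    return medoids


# Toy 2D embedding with two made-up clusters.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (10.0, 14.0)]
labels = [0, 0, 0, 1, 1, 1]

prototypes = cluster_medoids(points, labels)
print(prototypes)
```

The pairwise computation is quadratic in cluster size, which is acceptable for a sketch; the embedding and clustering methods that would produce `points` and `labels` are left out here.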

Paper Structure (14 sections, 3 figures)

Figures (3)

  • Figure 1: Schematic overview of the proposed method for improving the decision boundary interpretability of the Model Under Test by combining synthetic image generation and deep clustering.
  • Figure 2: UMAP 2D space for MNIST 7 vs 1 and four-filter CNN. The test set is marked by stars with 1 in red, and 7 in green. Black pins indicate the prototypes or baseline positions.
  • Figure 3: Selected images and the corresponding GradientSHAP maps for MNIST 7 vs 1 and four-filter CNN. Features contributing to the classification of 1 are red, and 7 are green.