Table of Contents
Fetching ...

Unsupervised Model Diagnosis

Yinong Oliver Wang, Eileen Li, Jinqi Luo, Zhaoning Wang, Fernando De la Torre

TL;DR

The proposed Unsupervised Model Diagnosis (UMO) leverages generative models to produce semantic counterfactual explanations without any user guidance, and optimizes for the most counterfactual directions in a generative latent space.

Abstract

Ensuring model explainability and robustness is essential for reliable deployment of deep vision systems. Current methods for evaluating robustness rely on collecting and annotating extensive test sets. While this is common practice, the process is labor-intensive and expensive with no guarantee of sufficient coverage across attributes of interest. Recently, model diagnosis frameworks have emerged leveraging user inputs (e.g., text) to assess the vulnerability of the model. However, such dependence on human can introduce bias and limitation given the domain knowledge of particular users. This paper proposes Unsupervised Model Diagnosis (UMO), that leverages generative models to produce semantic counterfactual explanations without any user guidance. Given a differentiable computer vision model (i.e., the target model), UMO optimizes for the most counterfactual directions in a generative latent space. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources, such as dictionaries or language models. We validate the framework on multiple vision tasks (e.g., classification, segmentation, keypoint detection). Extensive experiments show that our unsupervised discovery of semantic directions can correctly highlight spurious correlations and visualize the failure mode of target models without any human intervention.

Unsupervised Model Diagnosis

TL;DR

The proposed Unsupervised Model Diagnosis (UMO) leverages generative models to produce semantic counterfactual explanations without any user guidance, and optimizes for the most counterfactual directions in a generative latent space.

Abstract

Ensuring model explainability and robustness is essential for reliable deployment of deep vision systems. Current methods for evaluating robustness rely on collecting and annotating extensive test sets. While this is common practice, the process is labor-intensive and expensive with no guarantee of sufficient coverage across attributes of interest. Recently, model diagnosis frameworks have emerged leveraging user inputs (e.g., text) to assess the vulnerability of the model. However, such dependence on human can introduce bias and limitation given the domain knowledge of particular users. This paper proposes Unsupervised Model Diagnosis (UMO), that leverages generative models to produce semantic counterfactual explanations without any user guidance. Given a differentiable computer vision model (i.e., the target model), UMO optimizes for the most counterfactual directions in a generative latent space. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources, such as dictionaries or language models. We validate the framework on multiple vision tasks (e.g., classification, segmentation, keypoint detection). Extensive experiments show that our unsupervised discovery of semantic directions can correctly highlight spurious correlations and visualize the failure mode of target models without any human intervention.
Paper Structure (21 sections, 7 equations, 18 figures, 8 tables, 2 algorithms)

This paper contains 21 sections, 7 equations, 18 figures, 8 tables, 2 algorithms.

Figures (18)

  • Figure 1: Overview. Given a (a) computer vision model (e.g., classifier, key-point detector, segmentation model), how can we understand the model vulnerabilities without requiring user input nor test sets? Our proposed framework UMO leverages (b) foundation toolkits (e.g., large language models (LLMs) and multi-modal foundation models (MFMs)) to perform unsupervised model diagnosis. UMO not only outputs (c) counterfactual visual explanations but also (d) top-matched counterfactual attributes.
  • Figure 2: The UMO framework. Black solid lines denote forward passes; red dashed lines denote backpropagation; and purple dotted lines denote the inference of analysis. (a) We first optimize an edit direction $\Delta s$ in the latent space of generative models that yields counterfactual images of the target model. (b) After the optimization converges, we generate the original and edited images $x$ and $\hat{x}$ and map them to the CLIP embedding space with the L2C block (in \ref{['sec:counterfactual_analysis']}). (c) Then we analyze and report the diagnosis of counterfactual attributes by matching the image embedding differences $\hat{h}-h$ with attribute candidates.
  • Figure 3: Counterfactual pairs generated by different classifiers. We study three classifiers: a perceived gender classifier biased on "smiling" (left), an eyeglasses classifier biased on "lipstick" (middle), and a perceived age classifier biased on "bangs" (right). For each classifier model, we optimize the semantic latent edits to obtain counterfactual variations (bottom row) from the original generations (top row). This figure demonstrates the capability to provide visual counterfactual explanations on the biases of these classifiers.
  • Figure 4: Top-5 discovered attributes and their similarity scores, with the planted bias highlighted in orange. For a given target classifier, the similarity score of each attribute is computed through the counterfactual analysis module. These experiments indicate that our unsupervised diagnosis pipeline is indeed capable of discovering the bias in a given model.
  • Figure 5: Discovered attributes consistent across two backbones and one prior work against the same Cat/Dog classifier. Here we performed counterfactual analysis separately on the generated counterfactual pairs from StyleGAN and Diffusion Model. We also include an analysis based on ZOOM. Dark and light blue attributes are respectively consistent across all three diagnoses and the two backbones in our framework. We observe consistency in the discovered attributes despite the generative backbone and method differences.
  • ...and 13 more figures