PXGen: A Post-hoc Explainable Method for Generative Models

Yen-Lung Huang, Ming-Hsi Weng, Hao-Tsung Yang

TL;DR

PXGen addresses the explainability gap for generative models by introducing a post-hoc framework that treats the model as a black box and explains its behavior using an Anchor set and intrinsic/extrinsic criteria. It computes feature-values per anchor, clusters anchors with statistics, and selects representative exemplars via $k$-dispersion or $k$-center, all without access to training-time data. The approach is demonstrated on a classical VAE and Soft-IntroVAE to reveal phenomena such as model delusion and aligned conception, with computational cost of $O(n^2)$ in the anchor set size. The work comprises a three-phase framework, a multi-criteria, customizable explanation process, and visualization of characteristic anchors, offering broad applicability to encoder–decoder generative models and potential impact on training, safety, and policy.

Abstract

With the rapid growth of generative AI in numerous applications, explainable AI (XAI) plays a crucial role in ensuring the responsible development and deployment of generative AI technologies. XAI has undergone notable advancements and widespread adoption in recent years, reflecting a concerted push to enhance the transparency, interpretability, and credibility of AI systems. Recent research emphasizes that a proficient XAI method should adhere to a set of criteria in two key areas. First, it should ensure the quality and fluidity of explanations, encompassing aspects like faithfulness, plausibility, completeness, and tailoring to individual needs. Second, the design of the XAI system or mechanism should cover factors such as reliability, resilience, the verifiability of its outputs, and the transparency of its algorithm. However, research in XAI for generative models remains relatively scarce, with little exploration of how such methods can effectively meet these criteria in that domain. In this work, we propose PXGen, a post-hoc explainable method for generative models. Given a model that needs to be explained, PXGen prepares two materials for the explanation: the Anchor set and the intrinsic and extrinsic criteria. These materials are customizable by users according to their purpose and requirements. Evaluating each criterion gives every anchor a set of feature values. PXGen then provides example-based explanations according to the feature values across all anchors, selecting the anchors to illustrate and visualize for users via tractable algorithms such as k-dispersion or k-center.
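The abstract names k-center as one way to pick a small set of anchors to show to the user. As a rough illustration only, the sketch below applies the standard farthest-first greedy heuristic for k-center to a list of per-anchor feature vectors. The function name `greedy_k_center`, the Euclidean distance, the choice of the first anchor as the starting point, and the toy feature values are all assumptions made for this example; the paper's actual selection procedure and distance measure may differ.

```python
import math


def greedy_k_center(features, k):
    """Pick k anchor indices with the farthest-first greedy heuristic.

    features: list of equal-length numeric tuples, one per anchor
              (hypothetical feature values, e.g. one entry per criterion).
    k:        number of representative anchors to return.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of anchors")

    # Assumption: start from the first anchor; any starting point works.
    chosen = [0]
    # Distance from every anchor to its nearest chosen anchor so far.
    nearest = [math.dist(f, features[0]) for f in features]

    while len(chosen) < k:
        # Next representative: the anchor farthest from all chosen ones.
        nxt = max(range(len(features)), key=nearest.__getitem__)
        chosen.append(nxt)
        # Refresh nearest-center distances against the new representative.
        nearest = [
            min(d, math.dist(f, features[nxt]))
            for d, f in zip(nearest, features)
        ]
    return chosen


if __name__ == "__main__":
    # Toy feature values for five anchors (made up for illustration).
    toy = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
    print(greedy_k_center(toy, 3))
```

On the toy input, the heuristic picks one anchor from each of the three visually separate clumps, which is the intended effect of a representative-selection step: near-duplicate anchors are not shown twice.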

Paper Structure (18 sections, 8 figures)

This paper contains 18 sections, 8 figures.

Figures (8)

  • Figure 1: The generative model is trained on images of the handwritten digit $0$ in MNIST. The two methods (PXGen, VAE-TracIn) are used to find the top-6 agreeable samples, i.e., representative samples of the model.
  • Figure 2: The flowchart of PXGen. In the preparation phase, three items (the model, the criteria, and the anchor set) are prepared. In the analysis phase, statistical analysis and classification among the anchors are examined to obtain groups with distinct characteristics. In the discovery phase, we identify anchors within specific groups that can provide key information to achieve the explanation.
  • Figure 3: In HILE, the phenomenon of "model delusion" is displayed, where the anchors are incorrectly decoded. (Top: original data; Bottom: reconstructed data)
  • Figure 4: In LIHE, a phenomenon of "aligned conception" between the handwritten digits "0" and "1" is observed, indicating some agreement in concept between these two types of samples. (Top: original data; Bottom: reconstructed data)
  • Figure 5: Utilizing two algorithms to identify representative anchors within HIHE.
  • ...and 3 more figures