Explainable Metric Learning for Deflating Data Bias

Emma Andrews; Prabhat Mishra

TL;DR

This work addresses the lack of interpretable similarity in deep metric learning by introducing Hierarchical Explainable Metric Learning (HEML), which decomposes images into semantic segments and learns segment-specific metrics in a bottom-up fashion. A metric tree then aggregates these local decisions into a global similarity, with the primary explainability metric $d_{SNR}$ guiding interpretations of segment contributions. Key contributions include the segmentation-based, memory-efficient framework, a lightweight alternative to saliency-driven approaches, and empirical demonstrations on CelebA, Human Parsing, and SceneParse150 showing competitive accuracy with substantially reduced GPU memory. The approach enables bias reduction by generating segment-informed samples and provides intrinsic explainability through hierarchical, segment-level decisions, suitable for developers, search engines, and AI systems seeking interpretable context.

Abstract

Image classification is an essential part of computer vision: it assigns a given input image to a specific category based on a similarity evaluation under given criteria. While promising classifiers can be obtained through deep learning models, these approaches lack explainability, as the classification results are hard to interpret in a human-understandable way. In this paper, we present an explainable metric learning framework, which constructs hierarchical levels of semantic segments of an image for better interpretability. The key methodology involves a bottom-up learning strategy, starting by training a local metric learning model for each individual segment and then combining segments to compose comprehensive metrics in a tree. Specifically, our approach enables a more human-understandable similarity measurement between two images based on the semantic segments within them, which can be utilized to generate new samples to reduce bias in a training dataset. Extensive experimental evaluation demonstrates that the proposed approach can drastically improve model accuracy compared with state-of-the-art methods.
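The bottom-up strategy described in the abstract can be illustrated with a small sketch of how individual segments might be combined into larger ones until the full image is reached. This is not the authors' implementation: the mask representation (nested lists of 0/1), the helper names `union_masks` and `build_levels`, the three-level depth, and the toy grouping are all assumptions made for illustration. In the framework, a metric model would be trained at each node; training is omitted here.

```python
def union_masks(masks):
    """Pixel-wise OR of equally sized binary masks (lists of rows of 0/1)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(cols)]
            for r in range(rows)]


def build_levels(leaf_masks, groups):
    """Build a three-level hierarchy (hypothetical layout).

    Level 0: individual segments.
    Level 1: combined segments, one per group.
    Level 2: the union of all groups, standing in for the full image.
    """
    level0 = dict(leaf_masks)
    level1 = {
        name: union_masks([leaf_masks[seg] for seg in members])
        for name, members in groups.items()
    }
    level2 = {"full": union_masks(list(level1.values()))}
    return [level0, level1, level2]


# Toy 2x2 masks; segment names follow the CelebA examples, values are made up.
leaves = {
    "hair": [[1, 1], [0, 0]],
    "skin": [[0, 0], [1, 0]],
    "nose": [[0, 0], [0, 1]],
}
# Hypothetical grouping of leaves into combined segments.
groups = {"upper": ["hair"], "face": ["skin", "nose"]}

levels = build_levels(leaves, groups)
for depth, level in enumerate(levels):
    print(depth, sorted(level))
```

Each dictionary in `levels` corresponds to one layer of the tree, so a per-node model could be attached by iterating over the layers from the leaves upward.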

Paper Structure (14 sections, 3 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Background and Related Work
Explainable Artificial Intelligence
Explainable Deep Metric Learning
Hierarchical Explainable Metric Learning
Semantic Segmentation
Bottom-Up Metric Learning
Metric Tree Construction
Experiments
Experimental Setup
Case Study 1: CelebA
Case Study 2: Human Parsing
Case Study 3: SceneParse150
Conclusion

Figures (5)

Figure 1: An overview of our proposed framework. A model is trained at each individual and combined segment until the fully reconstructed image is reached. Example segments are from CelebA [liu2015deep].
Figure 2: Overview of our proposed hierarchical explainable metric learning framework. It consists of three major tasks: semantic segmentation, bottom-up metric learning, and metric tree construction.
Figure 3: Left image: Original sample image from CelebA. Remaining images: Example segments combined with background. From left to right: hair, neck, nose, and skin segments.
Figure 4: Hierarchical visualization of two sample comparisons from the CelebA dataset. The value above each image comparison indicates the SNR distance between the two images. A value closer to 0 indicates the images are similar, and a value closer to 1 indicates the images are dissimilar. The distances are measured via an inference model using the trained Triplet-HEML model for CelebA. Segments from left to right on the bottom row: neck, cloth, hair, hat; eyes, brows; ears, ear_r, eye_g; nose, skin, lips.
Figure 5: Individual segments and the SNR distance between them, as indicated by the value above each image. From left to right: neck, r_brow, skin, l_eye, r_eye, l_lip, cloth, l_ear, r_ear, hair, nose, l_brow, u_lip.
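
The SNR distance shown above each comparison in Figures 4 and 5 is not defined on this page. A minimal sketch follows, assuming the commonly used signal-to-noise-ratio formulation in which the anchor embedding is treated as signal and the difference between the two embeddings as noise, so that the distance is the variance of the difference divided by the variance of the anchor. The function name `snr_distance` and the toy embeddings are hypothetical. The raw ratio is unbounded above, so any normalization to the 0 to 1 range described in the Figure 4 caption would be an additional step that is not stated here.

```python
from statistics import pvariance


def snr_distance(anchor, other):
    """Assumed SNR distance: var(other - anchor) / var(anchor).

    `anchor` and `other` are equal-length embedding vectors (sequences of floats).
    The measure is asymmetric: swapping the arguments changes the denominator.
    """
    if len(anchor) != len(other):
        raise ValueError("embeddings must have the same length")
    signal_var = pvariance(anchor)
    if signal_var == 0:
        raise ValueError("anchor embedding has zero variance")
    noise = [b - a for a, b in zip(anchor, other)]
    return pvariance(noise) / signal_var


# Toy embeddings (made up for illustration).
same = snr_distance([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
flipped = snr_distance([0.0, 2.0], [2.0, 0.0])
print(same, flipped)
```

Identical embeddings give a distance of 0, matching the caption's reading that values near 0 indicate similar images, while embeddings that differ strongly relative to the anchor's own variance give larger values.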
