Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM

Ioannis E. Livieris, Emmanuel Pintelas, Niki Kiriakidou, Panagiotis Pintelas

TL;DR

This work defines explainable image similarity and presents a framework that couples Siamese networks with Grad-CAM to deliver both a similarity score and visual explanations for an image pair. Factual explanations are heatmaps that identify the regions driving the decision; counterfactual explanations are produced by swapping the decision label, which reveals the regions that influence the alternative decision. The method is demonstrated on the Flowers, Skin cancer, and AirBnB datasets, with additional analysis showing that simple data preprocessing (e.g., cropping blossoms) can modestly boost performance. The approach aims to increase interpretability, trust, and practical usefulness in image similarity tasks, and the authors provide public code to facilitate adoption.

Abstract

With the proliferation of image-based applications in various domains, the need for accurate and interpretable image similarity measures has become increasingly critical. Existing image similarity models often lack transparency, making it difficult to understand why two images are considered similar. In this paper, we propose the concept of explainable image similarity, where the goal is to develop an approach capable of providing similarity scores along with visual factual and counterfactual explanations. Along this line, we present a new framework that integrates Siamese Networks and Grad-CAM to provide explainable image similarity, and we discuss the potential benefits and challenges of adopting this approach. In addition, we provide a comprehensive discussion of the factual and counterfactual explanations produced by the proposed framework to assist decision making. The proposed approach has the potential to enhance the interpretability, trustworthiness, and user acceptance of image-based systems in real-world image similarity applications. The implementation code is available at https://github.com/ioannislivieris/Grad_CAM_Siamese.git.
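To make the idea concrete, the following is a minimal NumPy sketch of how Grad-CAM might be attached to a Siamese similarity score. It is not the authors' implementation (that lives in the repository linked above). Everything in it is an illustrative assumption: the toy shared encoder (a 1x1 convolution with ReLU and global average pooling), the score (negative squared Euclidean distance between embeddings), the function names, and the choice to obtain the counterfactual map by negating the gradients, which is the counterfactual variant described for the original Grad-CAM and may differ from the label swap used in the paper.

```python
import numpy as np


def encode(image, weights):
    """Hypothetical shared (Siamese) encoder: 1x1 conv + ReLU, then global average pooling."""
    # image: (C, H, W), weights: (K, C) -> feature maps: (K, H, W)
    feature_maps = np.maximum(np.einsum("kc,chw->khw", weights, image), 0.0)
    embedding = feature_maps.mean(axis=(1, 2))  # (K,)
    return feature_maps, embedding


def similarity_score(emb_a, emb_b):
    """Assumed score: negative squared Euclidean distance (0 means identical embeddings)."""
    return -float(np.sum((emb_a - emb_b) ** 2))


def grad_cam(feature_maps, emb_self, emb_other, counterfactual=False):
    """Grad-CAM heatmap of the similarity score with respect to one branch's feature maps."""
    k, h, w = feature_maps.shape
    # Analytic gradient for this toy model:
    # d score / d A[k, i, j] = -2 * (e_self[k] - e_other[k]) / (h * w)
    per_channel = -2.0 * (emb_self - emb_other) / (h * w)
    grads = np.broadcast_to(per_channel[:, None, None], (k, h, w))
    if counterfactual:
        grads = -grads  # assumption: negated gradients give the counterfactual map
    alphas = grads.mean(axis=(1, 2))  # channel importance weights
    # Weighted sum of feature maps followed by ReLU, as in Grad-CAM
    return np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)


# Toy usage with random data standing in for a trained model and real images
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3))
image_1 = rng.random((3, 8, 8))
image_2 = rng.random((3, 8, 8))

maps_1, emb_1 = encode(image_1, weights)
maps_2, emb_2 = encode(image_2, weights)

score = similarity_score(emb_1, emb_2)
factual = grad_cam(maps_1, emb_1, emb_2)
counter = grad_cam(maps_1, emb_1, emb_2, counterfactual=True)
```

In this sketch the factual map marks locations in image 1 whose activations raise the similarity score, and the counterfactual map marks locations that lower it, so the two maps never overlap. A real model would use a deep convolutional backbone and automatic differentiation instead of the closed-form gradient.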

Paper Structure

This paper contains 11 sections, 3 equations, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Architecture of the proposed framework
  • Figure 2: Application of the proposed framework on the Flowers dataset: (a) original input image$_1$, (b) factual explanations on image$_1$, (c) counterfactual explanations on image$_1$, (d) original input image$_2$, (e) factual explanations on image$_2$, (f) counterfactual explanations on image$_2$
  • Figure 3: Application of the proposed framework on the Skin cancer dataset: (a) original input image$_1$, (b) factual explanations on image$_1$, (c) counterfactual explanations on image$_1$, (d) original input image$_2$, (e) factual explanations on image$_2$, (f) counterfactual explanations on image$_2$
  • Figure 4: Application of the proposed framework on the AirBnB dataset: (a) original input image$_1$, (b) factual explanations on image$_1$, (c) counterfactual explanations on image$_1$, (d) original input image$_2$, (e) factual explanations on image$_2$, (f) counterfactual explanations on image$_2$
  • Figure 5: (a) Original image (b) Bounding box (c) Cropped image
  • ...and 2 more figures