Sparks of Explainability: Recent Advancements in Explaining Large Vision Models

Thomas Fel

TL;DR

Sparks of Explainability investigates how to interrogate and align large vision models beyond traditional heatmaps. It integrates gradient-, region-, and concept-based explainability, introducing Sobol-based global sensitivity (for robust, scalable attribution) and EVA for formally guaranteed perturbation-based explanations. A central thread is aligning model explanations with human reasoning via model-alignment (harmonization) and the CRAFT framework for automatic concept extraction, complemented by MACO for high-quality concept visualizations. The FRSign dataset serves as a practical testbed, enabling thorough human-in-the-loop evaluations that reveal when attribution alone suffices and when concept-based explanations are essential. Collectively, the work argues for a unified toolset that combines attribution, concept extraction, and robust visualization to improve trust, interpretability, and safety of large vision models, while offering scalable pathways to apply these ideas to real-world systems and interactive demonstrations like Lens.

Abstract

This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. Initially, it evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices, which, through quasi-Monte Carlo sequences, allows a significant reduction in computation time. In addition, the EVA method offers a first formulation of attribution with formal guarantees via verified perturbation analysis. Experimental results indicate that in complex scenarios these methods do not provide sufficient understanding, particularly because they identify only "where" the model focuses without clarifying "what" it perceives. Two hypotheses are therefore examined: aligning models with human reasoning -- through the introduction of a training routine that integrates the imitation of human explanations and optimization within the space of 1-Lipschitz functions -- and adopting a conceptual explainability approach. The CRAFT method is proposed to automate the extraction of the concepts used by the model and to assess their importance, complemented by MACO, which enables their visualization. These works converge towards a unified framework, illustrated by an interactive demonstration applied to the 1000 ImageNet classes in a ResNet model.
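To make the Sobol-based idea in the abstract concrete, here is a minimal sketch of estimating total-order Sobol indices with a quasi-Monte Carlo sequence. It is an illustration under stated assumptions, not the thesis implementation: `toy_score` is a hypothetical stand-in for a model's class score under a region mask, a Halton sequence is swapped in for the low-discrepancy sequence (it needs only the standard library), and the Jansen estimator is used as one standard choice of estimator.

```python
# Toy sketch: total-order Sobol indices via a Halton (quasi-Monte Carlo)
# sequence and the Jansen estimator. All names here are illustrative.

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def halton(index, base):
    """Radical inverse of `index` in `base`: one coordinate of a Halton point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, dim):
    """First `n_points` Halton points in [0, 1)^dim (index 0 is skipped)."""
    if dim > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    return [[halton(k, PRIMES[j]) for j in range(dim)]
            for k in range(1, n_points + 1)]


def total_sobol_indices(score, dim, n_points=1024):
    """Estimate the total-order Sobol index of each of `dim` mask variables."""
    points = halton_points(n_points, 2 * dim)
    a = [p[:dim] for p in points]   # first sample matrix A
    b = [p[dim:] for p in points]   # second sample matrix B
    f_a = [score(row) for row in a]
    mean = sum(f_a) / n_points
    variance = sum((y - mean) ** 2 for y in f_a) / (n_points - 1)
    indices = []
    for i in range(dim):
        acc = 0.0
        for row_a, row_b, y_a in zip(a, b, f_a):
            mixed = list(row_a)
            mixed[i] = row_b[i]     # resample only variable i
            acc += (y_a - score(mixed)) ** 2
        # Jansen estimator: E[(f(A) - f(A with column i from B))^2] / (2 Var)
        indices.append(acc / (2 * n_points * variance))
    return indices


def toy_score(mask):
    """Hypothetical model score: region 0 matters most, region 2 is ignored."""
    return 3.0 * mask[0] + 1.0 * mask[1]


indices = total_sobol_indices(toy_score, dim=3)
```

For this linear toy score with uniform masks, the analytical total-order indices are 0.9, 0.1, and 0, so the estimate should rank region 0 first and assign region 2 no importance. In an image setting each mask variable would scale a region of the input, which this sketch does not model.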

Paper Structure

This paper contains 320 sections, 19 theorems, 139 equations, 111 figures, 22 tables, 3 algorithms.

Key Result

theorem 1

EVA provides the optimal set from step $|\bm{u}|$ to the last step. Let $\bm{u}$ denote the essential variables of $\bm{\delta}^*$. Then EVA ranks the variables in $\bm{u}$ first and provides the optimal set from step $|\bm{u}|$ to the last step.

Figures (111)

  • Figure 1: Illustration of the Black-Box Problem. Neural networks undergo training on a Training Dataset through a specific Learning Algorithm. After training, the model performs inferences using the learned parameters to make Predictions. However, the multitude of operations from Input to prediction is excessively complex for human comprehension, thus the name Black-box.
  • Figure 2: Attribution Methods. Attribution methods are the subject of the chapter on attributions. These methods aim to explain a specific prediction through heatmaps, where hotter areas indicate that a pixel matters more for the decision.
  • Figure 3: Concept Activation Vector (CAV). An example of extracting the "striped" concept using images featuring this concept and random images. A classifier in the intermediate space is used to identify the CAV as the vector orthogonal to the decision boundary. Methods for analyzing concepts are discussed in the chapter on concepts.
  • Figure 4: Illustration of Feature Visualization (FV). An example of visualization for neurons (ladybug and goldfish), for channels of a convolutional network, and for a CAV using FV. Feature visualizations are discussed in the chapter on concepts.
  • Figure 5: FRSign Dataset Statistics. Distribution of Images per Video Sequence. The FRSign dataset exhibits a general trend of having a modest number of images per sequence, despite the presence of an outlier sequence containing more than 5000 images. Interestingly, there is a notable peak in the distribution, with a significant number of sequences having around 1000 images each.
  • ...and 106 more figures
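
The CAV construction described in the Figure 3 caption (a linear classifier in an intermediate space, with the CAV taken as the vector orthogonal to its decision boundary) can be sketched as follows. This is a toy illustration under stated assumptions: the "activations" are synthetic 4-dimensional Gaussian vectors rather than real network features, the concept is assumed to lie along the first axis, and the probe is a plain logistic regression trained by gradient descent. All function names are hypothetical.

```python
import math
import random


def train_linear_probe(positives, negatives, epochs=100, lr=0.5):
    """Fit a logistic-regression probe separating concept from random activations."""
    dim = len(positives[0])
    weights, bias = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in data:
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-logit)) - label
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def concept_activation_vector(positives, negatives):
    """Unit normal of the probe's decision boundary, pointing toward the concept."""
    weights, _ = train_linear_probe(positives, negatives)
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


# Synthetic activations: concept examples are shifted along axis 0 (assumption).
rng = random.Random(0)
random_acts = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(100)]
concept_acts = [[rng.gauss(0.0, 0.5) + (2.0 if j == 0 else 0.0) for j in range(4)]
                for _ in range(100)]

cav = concept_activation_vector(concept_acts, random_acts)
```

Because the weight vector of a linear classifier is normal to its decision boundary, normalizing it gives a direction in activation space that points from random examples toward concept examples. In this toy setup that direction should be close to the first axis.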

Theorems & Definitions (57)

  • definition 1: Empirical risk
  • definition 2: ERM learning algorithm
  • definition 3: SGD
  • definition 4: Neuron
  • definition 5: Fully Connected Feedforward Neural Network (FCNN)
  • definition 6: Convolution Operation
  • definition 7: Residual Connection
  • definition 8: Batch Normalization
  • definition 9
  • definition 10: Attribution Method
  • ...and 47 more