Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
Thomas Fel
TL;DR
Sparks of Explainability investigates how to interrogate and align large vision models beyond traditional heatmaps. It combines gradient-, region-, and concept-based explainability, introducing Sobol-based global sensitivity analysis (for robust, scalable attribution) and EVA for perturbation-based explanations with formal guarantees. A central thread is aligning model explanations with human reasoning via model alignment (harmonization) and the CRAFT framework for automatic concept extraction, complemented by MACO for high-quality concept visualizations. The FRSign dataset serves as a practical testbed, enabling human-in-the-loop evaluations that reveal when attribution alone suffices and when concept-based explanations are essential. Collectively, the work argues for a unified toolset that combines attribution, concept extraction, and robust visualization to improve the trustworthiness, interpretability, and safety of large vision models, while offering scalable ways to apply these ideas to real-world systems and interactive demonstrations such as Lens.
Abstract
This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. It first evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an attribution approach based on Sobol indices, which uses quasi-Monte Carlo sequences to significantly reduce computation time. In addition, the EVA method offers a first formulation of attribution with formal guarantees via verified perturbation analysis. Experimental results indicate that in complex scenarios these methods do not provide sufficient understanding, particularly because they identify only "where" the model focuses without clarifying "what" it perceives. Two hypotheses are therefore examined. The first is aligning models with human reasoning, through a training routine that integrates the imitation of human explanations and optimization within the space of 1-Lipschitz functions. The second is adopting a concept-based explainability approach. The CRAFT method is proposed to automate the extraction of the concepts used by the model and to assess their importance, complemented by MACO, which enables their visualization. These contributions converge towards a unified framework, illustrated by an interactive demonstration covering the 1000 ImageNet classes of a ResNet model.
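To make the Sobol-index idea concrete, the following is a minimal sketch of estimating total-order Sobol indices with a quasi-Monte Carlo (scrambled Sobol) sequence. It is an illustration under stated assumptions, not the thesis's implementation: the function name `total_sobol_indices`, the toy function `toy_model`, and the choice of the Jansen estimator are all assumptions made here. In an attribution setting, each input dimension would plausibly correspond to a perturbation mask value for an image region and `f` to the model's score on the perturbed image; here a simple linear function stands in for the model.

```python
import numpy as np
from scipy.stats import qmc


def total_sobol_indices(f, d, n=1024, seed=0):
    """Estimate total-order Sobol indices of f on [0, 1]^d.

    Assumption: uses the Jansen estimator, one common choice.
    f maps an (n, d) array to an (n,) array of scalar outputs.
    """
    # Draw one 2d-dimensional QMC sample and split it into two
    # matrices A and B of shape (n, d). n is a power of two, which
    # preserves the balance properties of the Sobol sequence.
    sampler = qmc.Sobol(d=2 * d, scramble=True, seed=seed)
    samples = sampler.random(n)
    A, B = samples[:, :d], samples[:, d:]

    f_A = f(A)
    f_B = f(B)
    variance = np.var(np.concatenate([f_A, f_B]))

    indices = np.empty(d)
    for i in range(d):
        # Resample only dimension i, keeping all other dimensions fixed.
        A_Bi = A.copy()
        A_Bi[:, i] = B[:, i]
        # Jansen estimator: half the mean squared output change,
        # normalized by the output variance.
        indices[i] = np.mean((f_A - f(A_Bi)) ** 2) / (2.0 * variance)
    return indices


def toy_model(x):
    # Hypothetical stand-in for a model score: the first input
    # dominates, the third has no influence at all.
    return 4.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]


sobol_scores = total_sobol_indices(toy_model, d=3)
print(sobol_scores)
```

For this additive toy function the analytic total indices are 16/17, 1/17, and 0, so the estimate should rank the first input highest and assign exactly zero to the third. The cost is `n * (d + 2)` evaluations of `f`, which is why a low-discrepancy sequence that converges with fewer samples matters when `f` is a large vision model.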
