Extremal Contours: Gradient-driven contours for compact visual attribution

Reza Karimzadeh, Albert Alonso, Frans Zdyb, Julius B. Kirkegaard, Bulat Ibragimov

TL;DR

This work introduces a training-free explanation method that replaces dense perturbation masks with smooth, single, star-convex contours parameterized by a truncated Fourier series. The approach optimizes an extremal preserve/delete objective using classifier gradients, with adaptive area and spectral regularization to ensure compact, stable, and topologically simple explanations. It achieves competitive fidelity with substantially reduced parameter counts, provides explicit area control for fidelity–area analysis, and extends naturally to multiple contours for multi-object attribution, showing strong robustness and favorable performance on both supervised and self-supervised vision models. The framework offers a practical, interpretable alternative to dense masks, with clear pathways to medical imaging applications and future enhancements for more complex topologies.
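As a rough sketch of what such a parameterization could look like, a star-convex contour around a center $c$ can be described by a radius function with $K$ Fourier modes. The notation here is an assumption for illustration: only $c$ appears in the paper's figure captions, while $K$, $a_k$, $b_k$, $r$, and $\Omega$ are hypothetical names, and the paper's exact formulation may differ.

$$
r(\theta) = a_0 + \sum_{k=1}^{K} \left( a_k \cos k\theta + b_k \sin k\theta \right), \qquad r(\theta) > 0 \ \text{for all } \theta,
$$

$$
\Omega = \left\{\, c + \rho\,(\cos\theta, \sin\theta) \;:\; 0 \le \rho \le r(\theta),\ \theta \in [0, 2\pi) \,\right\}.
$$

Every point of $\Omega$ is connected to $c$ by a straight segment that stays inside $\Omega$, which is what makes the region star-convex and therefore a single, simply connected piece. Under this sketch the free parameters are the two coordinates of $c$, the mean radius $a_0$, and the $2K$ coefficients $a_k, b_k$, for a total of $2K + 3$ numbers rather than one value per pixel.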

Abstract

Faithful yet compact explanations for vision models remain a challenge: commonly used dense perturbation masks are often fragmented and overfitted, and need careful post-processing. Here, we present a training-free explanation method that replaces dense masks with smooth, tunable contours. A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using the classifier gradients. The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude, and yields stable boundary updates without cleanup. Restricting solutions to low-dimensional, smooth contours makes the method robust to adversarial masking artifacts. On ImageNet classifiers, it matches the extremal fidelity of dense masks while producing compact, interpretable regions with improved run-to-run consistency. Explicit area control also enables importance contour maps, yielding transparent fidelity-area profiles. Finally, we extend the approach to multiple contours and show how it can localize multiple objects within the same framework. Across benchmarks, the method achieves higher relevance mass and lower complexity than gradient- and perturbation-based baselines, with especially strong gains on self-supervised DINO models, where it improves relevance mass by over 15% and maintains positive faithfulness correlations.
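To make the idea of a Fourier-parameterized star-convex mask concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`radius`, `soft_mask`, `area_fraction`, `analytic_area`), the sigmoid-style soft boundary, and the `sharpness` parameter are all assumptions introduced for illustration. The closed-form area is the standard polar-coordinates identity $\tfrac{1}{2}\int_0^{2\pi} r(\theta)^2\,d\theta$ applied to a Fourier series, and it holds only while $r(\theta) > 0$ everywhere.

```python
import math

def radius(theta, a0, a, b):
    """Truncated Fourier series: a0 + sum_k (a_k cos(k*theta) + b_k sin(k*theta))."""
    r = a0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        r += ak * math.cos(k * theta) + bk * math.sin(k * theta)
    return r

def soft_mask(size, center, a0, a, b, sharpness=2.0):
    """Rasterize the region {rho <= r(theta)} around `center` as a soft mask in [0, 1].

    The soft boundary (a logistic function of r(theta) - rho) is an
    illustrative choice, written with tanh for numerical stability.
    """
    cx, cy = center
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x + 0.5 - cx, y + 0.5 - cy   # pixel centre relative to c
            rho = math.hypot(dx, dy)
            theta = math.atan2(dy, dx)
            z = sharpness * (radius(theta, a0, a, b) - rho)
            row.append(0.5 * (1.0 + math.tanh(0.5 * z)))
        mask.append(row)
    return mask

def area_fraction(mask):
    """Fraction of the image covered by the (soft) mask."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))

def analytic_area(a0, a, b):
    """Area enclosed by the contour, valid while r(theta) > 0 for all theta."""
    return math.pi * a0 ** 2 + 0.5 * math.pi * sum(
        ak * ak + bk * bk for ak, bk in zip(a, b)
    )

# Hypothetical example: a 64x64 image with a mildly non-circular contour (K = 2).
size = 64
a0, a, b = 16.0, [3.0, 0.0], [0.0, 2.0]
mask = soft_mask(size, (32.0, 32.0), a0, a, b)
print("rasterized area fraction:", area_fraction(mask))
print("analytic area fraction:  ", analytic_area(a0, a, b) / size ** 2)
```

Because the area is an explicit function of the coefficients, a target area can be imposed on the contour directly rather than estimated from a thresholded dense mask, which is one way to read the abstract's "explicit area control".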

Paper Structure

This paper contains 19 sections, 13 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Comparison of explanation methods on ImageNet validation images for the DINO [caron2021emerging] model. While gradient-based maps (e.g., Gradient SHAP [lundberg2017unified], Integrated Gradients [sundararajan2017axiomaticattributiondeepnetworks], Grad-CAM++ [selvarajuGradCAMVisualExplanations2020, chattopadhay2018grad]) and dense perturbation masks (Smooth Mask [fong2019understandingdeepnetworksextremal]) typically produce diffuse or sometimes fragmented attributions, our parameterization yields a single smooth, simply connected contour (Extremal Contour) that encloses the object of interest, highlighting a different representational paradigm for explainability.
  • Figure 2: Qualitative results on ImageNet images. Each column: input image, our optimized mask (red: initial contour, blue: optimized contour), preserve variant, and deletion variant. Our method highlights compact star-convex regions that preserve predictions while their deletion strongly suppresses them.
  • Figure 3: Robustness of the method. (Top) Red circles denote different initial positions $c$ of the contour, while the blue contours are the final optimized masks, overlaid. (Bottom) Effect of the spectral regularizer on contour complexity (color coded). Large $\lambda_r$ enforces smooth, near-circular masks, while lower values permit higher-frequency modes yet converge to the same location. Each trajectory is optimized independently; they are shown together for visualization.
  • Figure 4: Area-fidelity trade-off. (Left) Single closed contours at target areas $\alpha^\ast\in\{0.1,\ldots,0.7\}$ (small to large). Taken together, the contours resemble a contour map of faithfulness as a function of the image region made available. (Right) Target class probability as a function of the targeted area $\alpha^\ast$. Solid lines show the preserve variant, whereas dashed lines show the deletion variant. Dotted lines show the average embedding preservation of randomly sampled circular masks.
  • Figure 5: Examples of multiple contour optimization with $N{=}2$ (top left, top right, and bottom left) and $N{=}4$ (bottom right). Starting from the initial contours (red), the optimized contours (blue) adapt to distinct salient objects within the image. The method encircles the regions that drive the classification. The bottom-right panel shows a failure case where the contours are not able to cover the salient objects in the image.
  • ...and 1 more figures
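
The Figure 3 caption describes how a large $\lambda_r$ pushes the contour toward a near-circular shape while smaller values allow higher-frequency modes. One common way to obtain that behavior is a frequency-weighted quadratic penalty on the Fourier coefficients. The sketch below assumes that form; the paper's actual regularizer is not given in this summary, so the function names, the `power` exponent, and the learning rate `lr` are hypothetical.

```python
def spectral_penalty(a, b, lam_r, power=2):
    """lam_r * sum_k k**power * (a_k**2 + b_k**2).

    The mean radius (mode 0) is left unpenalized, so only deviations
    from a circle are discouraged. This functional form is an assumption.
    """
    return lam_r * sum(
        k ** power * (ak * ak + bk * bk)
        for k, (ak, bk) in enumerate(zip(a, b), start=1)
    )

def penalty_gradient_step(a, b, lam_r, lr=0.01, power=2):
    """One gradient-descent step on the penalty alone (no classifier term)."""
    new_a = [ak - lr * 2.0 * lam_r * k ** power * ak for k, ak in enumerate(a, start=1)]
    new_b = [bk - lr * 2.0 * lam_r * k ** power * bk for k, bk in enumerate(b, start=1)]
    return new_a, new_b

# Start with equal energy in modes 1..4 and compare a weak and a strong regularizer.
a = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 0.0]
for lam_r in (0.1, 1.0):
    a_next, _ = penalty_gradient_step(a, b, lam_r)
    print(lam_r, [round(v, 3) for v in a_next])
```

In this sketch each step shrinks mode $k$ by a factor $1 - 2\,\mathrm{lr}\,\lambda_r k^{p}$, where $p$ is `power`, so higher-frequency coefficients decay faster and a larger $\lambda_r$ flattens the contour toward a circle more aggressively.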