Interpreting Low-level Vision Models with Causal Effect Maps

Jinfan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong

TL;DR

This work introduces Causal Effect Map (CEM), a model- and task-agnostic framework that uses low-level vision (LV) interventions to quantify how input patches causally affect the restoration of a region of interest (ROI) in LV tasks. By comparing restoration outcomes under do-interventions on individual patches and averaging over many interventions, CEM reveals both positive and negative causal effects of patches, offering interpretability that goes beyond correlation. Across super-resolution (SR), denoising (DN), and deraining (DR), CEM shows that larger receptive fields or global mechanisms do not universally improve performance, and that multi-task training can bias models toward local information, with practical implications for designing general LV systems. The authors also propose an acceleration strategy to make CEM computationally feasible and provide an ablation study validating its robustness to datasets, patch sizes, and sampling schemes.
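The TL;DR describes CEM as a loop: intervene on one input patch, measure how well the ROI is restored, and average over repeated interventions. The sketch below illustrates that loop under stated assumptions that are not taken from the paper or from the authors' code: PSNR inside the ROI stands in for the measurement of the ROI, additive Gaussian noise stands in for an LV intervention, and the effect is taken as "baseline score minus mean intervened score", so a positive value means the patch helps restoration. The names `roi_psnr`, `causal_effect_map`, `box3`, and `add_noise` are hypothetical, and a 3x3 box filter plays the role of the restoration network.

```python
import numpy as np

def roi_psnr(pred, target, roi_mask):
    """PSNR (dB) computed only inside the ROI; assumes pixel values roughly in [0, 1]."""
    mse = np.mean((pred[roi_mask] - target[roi_mask]) ** 2)
    return 10.0 * np.log10(1.0 / max(mse, 1e-12))

def causal_effect_map(model, x, target, roi_mask, patch, intervene, T=8, seed=0):
    """Hypothetical sketch of a per-patch causal effect map.

    effect[i, j] = M(model(x)) - mean_t M(model(x with patch (i, j) intervened))
    Positive: corrupting the patch hurts the ROI, so the patch helps restoration.
    Negative: corrupting the patch improves the ROI, so the patch hurts restoration.
    """
    rng = np.random.default_rng(seed)
    base = roi_psnr(model(x), target, roi_mask)
    rows, cols = x.shape[0] // patch, x.shape[1] // patch
    cem = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * patch, (i + 1) * patch), slice(j * patch, (j + 1) * patch))
            scores = []
            for _ in range(T):  # average the outcome over T random interventions
                x_do = x.copy()
                x_do[sl] = intervene(x_do[sl], rng)
                scores.append(roi_psnr(model(x_do), target, roi_mask))
            cem[i, j] = base - np.mean(scores)
    return cem

def box3(img):
    """Toy 'restoration model': 3x3 mean filter with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def add_noise(vals, r):
    """Hypothetical intervention: perturb the patch with Gaussian noise."""
    return vals + r.normal(0.0, 0.3, vals.shape)

rng = np.random.default_rng(1)
clean = rng.random((16, 16))
noisy = clean + rng.normal(0.0, 0.1, clean.shape)
roi = np.zeros((16, 16), dtype=bool)
roi[6:10, 6:10] = True  # 4x4 region of interest in the middle of the image

cem = causal_effect_map(box3, noisy, clean, roi, patch=4, intervene=add_noise)
print(np.round(cem, 3))
```

Because the toy model only looks at a 3x3 neighborhood, patches far from the ROI get an effect of exactly zero, while patches overlapping the ROI's receptive field get nonzero effects. A real network with a large or global receptive field would spread nonzero effects much further, which is the kind of behavior CEM is meant to expose.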

Abstract

Deep neural networks have significantly improved the performance of low-level vision tasks but have also made these models harder to interpret. A deeper understanding of such models benefits both network design and practical reliability. To address this challenge, we introduce causality theory to interpret low-level vision models and propose a model-/task-agnostic method called Causal Effect Map (CEM). With CEM, we can visualize and quantify input-output relationships in terms of both positive and negative effects. After analyzing various low-level vision tasks with CEM, we have reached several interesting insights: (1) Using more information from the input image (e.g., a larger receptive field) does NOT always yield positive outcomes. (2) Attempting to incorporate mechanisms with a global receptive field (e.g., channel attention) into image denoising may prove futile. (3) Integrating multiple tasks to train a general model could encourage the network to prioritize local information over global context. Grounded in causal effect theory, the proposed diagnostic tool can refresh our common understanding and bring deeper insight into low-level vision models. Code is available at https://github.com/J-FHu/CEM.

Paper Structure

This paper contains 19 sections, 5 equations, 17 figures, 3 tables, and 1 algorithm.

Figures (17)

  • Figure 1: CEMs of SwinIR for the image super-resolution, denoising, and deraining tasks. Patches with positive causal effects, patches with negative causal effects, and the ROI are indicated in red, blue, and green, respectively. The pie chart records the percentage of patches with each type of causal effect, and the colorbar shows the effect range.
  • Figure 2: (a) Correlations between a, b, c and R, where $\mathcal{M}_\textit{R}$ denotes a measurement of R. Intervening on a ($\textit{do}(a=x)$), b ($\textit{do}(b=x)$), and c ($\textit{do}(c=x)$) reveals that (b) variable a has no causal relationship with R, (c) variable b has a negative causal effect, and (d) variable c has a positive causal effect. Black undirected edges connect correlated variables; blue and red directed edges denote negative and positive causal relationships, respectively.
  • Figure 3: (a) Examining the relationship between input patches a, b, c and the reconstruction of ROI R. (b) The LAM of RNAN; red dots represent correlation. (c) The CEM of RNAN; patches with negative and positive effects are highlighted in blue and red, respectively.
  • Figure 4: The patch in the original image (a) does not differ significantly from its blur intervention in (b), whereas assigning zero values in (c) makes the patch deviate from its neighbors.
  • Figure 5: (a) Intervening on all patches $T$ times to obtain a CEM. (b) Dividing the patches into two sets $\mathcal{U}$ and $\mathcal{S}$ in the coarse stage. (c) Refining the causal effects of set $\mathcal{S}$ in the fine stage. The patches are sampled according to the probability density $\mathcal{D}$, which is discussed further in Sec. \ref{A-accel}.
  • ...and 12 more figures
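Figure 5 contrasts intervening on every patch $T$ times with a coarse stage that splits patches into sets $\mathcal{U}$ and $\mathcal{S}$ and a fine stage that refines only $\mathcal{S}$. The sketch below shows one way such a two-stage budget could be allocated; it is not the authors' algorithm. It assumes that $\mathcal{S}$ holds the patches with the largest coarse |effect| and $\mathcal{U}$ the rest, and it does not model the sampling density $\mathcal{D}$. The names `coarse_to_fine`, `noisy_estimate`, `t_coarse`, `t_fine`, and `keep`, as well as the budget values, are hypothetical; `noisy_estimate` simulates running `t` interventions as "true effect plus sampling noise".

```python
import numpy as np

def coarse_to_fine(estimate, n_patches, t_coarse=2, t_fine=16, keep=0.25):
    """Hypothetical two-stage budget allocation for a causal effect map.

    estimate(i, t) -> mean causal effect of patch i over t random interventions.
    Coarse stage: every patch gets t_coarse interventions.
    Fine stage: only the `keep` fraction with the largest |coarse effect|
    (assumed here to play the role of set S) gets t_fine more interventions;
    the remaining patches (set U) keep their coarse estimate.
    """
    coarse = np.array([estimate(i, t_coarse) for i in range(n_patches)])
    k = max(1, int(round(keep * n_patches)))
    selected = np.argsort(-np.abs(coarse))[:k]  # indices of the k largest |effects|
    refined = coarse.copy()
    for i in selected:
        fine = estimate(i, t_fine)
        # Pool coarse and fine samples, weighting each by its number of interventions
        refined[i] = (coarse[i] * t_coarse + fine * t_fine) / (t_coarse + t_fine)
    calls = n_patches * t_coarse + k * t_fine  # total interventions spent
    return refined, set(selected.tolist()), calls

rng = np.random.default_rng(0)
true_effect = np.zeros(64)
true_effect[[3, 17, 40]] = [2.0, -1.5, 1.0]  # a few influential patches

def noisy_estimate(i, t):
    """Stand-in for t interventions: true effect plus noise shrinking as 1/sqrt(t)."""
    return true_effect[i] + rng.normal(0.0, 0.2) / np.sqrt(t)

cem, S, calls = coarse_to_fine(noisy_estimate, n_patches=64)
exhaustive = 64 * (2 + 16)  # every patch receiving the full budget
print(calls, exhaustive)
```

With these toy settings the two-stage scheme spends 384 interventions instead of 1152, while the few influential patches still receive the full budget. The trade-off is that a patch whose coarse estimate is unluckily small never gets refined, which is presumably why the choice of sampling scheme matters and is ablated in the paper.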

Theorems & Definitions (1)

  • Definition 4.1: LV Intervention
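Definition 4.1 is only named here, but Figure 4 explains two failure modes an LV intervention should avoid: blurring a smooth patch barely changes it, while setting it to zero makes it stand out from its neighbors. The sketch below quantifies both effects on a toy image. It does not implement the paper's LV intervention; the linear-ramp image, the box blur, and the helper names `box_blur` and `intervention_stats` are illustrative assumptions.

```python
import numpy as np

def box_blur(patch, k=5):
    """Box blur of odd width k, using edge padding so the output keeps the patch shape."""
    r = k // 2
    padded = np.pad(patch, r, mode="edge")
    h, w = patch.shape
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)) / (k * k)

def intervention_stats(img, top, left, size, new_patch):
    """Return (mean |change| inside the patch, mean |jump| across the patch border).

    Assumes the patch does not touch the image border.
    """
    old = img[top:top + size, left:left + size]
    out = img.copy()
    out[top:top + size, left:left + size] = new_patch
    rows, cols = slice(top, top + size), slice(left, left + size)
    jumps = np.concatenate([
        out[rows, left] - out[rows, left - 1],                # left edge
        out[rows, left + size - 1] - out[rows, left + size],  # right edge
        out[top, cols] - out[top - 1, cols],                  # top edge
        out[top + size - 1, cols] - out[top + size, cols],    # bottom edge
    ])
    return float(np.mean(np.abs(new_patch - old))), float(np.mean(np.abs(jumps)))

# Smooth toy image: a linear ramp with values in [0, 1].
img = np.add.outer(np.arange(32.0), np.arange(32.0)) / 62.0
top, left, size = 8, 8, 8
patch = img[top:top + size, left:left + size]

blur_change, blur_jump = intervention_stats(img, top, left, size, box_blur(patch))
zero_change, zero_jump = intervention_stats(img, top, left, size, np.zeros_like(patch))
print(f"blur: change={blur_change:.3f}, border jump={blur_jump:.3f}")
print(f"zero: change={zero_change:.3f}, border jump={zero_jump:.3f}")
```

On this smooth ramp the blur changes the patch by well under one gray level step on average, so it is a weak intervention, while zeroing creates a border jump more than ten times larger than the blur's, producing an out-of-distribution discontinuity. This mirrors the two problems illustrated in Figure 4.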