InDeed: Interpretable image deep decomposition with guaranteed generalizability

Sihan Wang; Shangqi Gao; Fuping Wu; Xiahai Zhuang

TL;DR

InDeed addresses the challenge of interpretable and generalizable deep image decomposition by marrying hierarchical Bayesian modeling with deep inference in a modular architecture. It introduces a three-step framework: hierarchical decomposition of $Y$ into $L$, $S$, and $N$ with $L=AB^T$; variational inference split into two sub-problems (one closed-form, one learned via a neural network $f_\theta$); and a modular DNN that mirrors the probabilistic graph. A PAC-Bayesian generalization bound motivates test-time adaptation (InDeedAG/InDeedOAG), enabling robust performance under distribution shifts, demonstrated on image denoising and unsupervised anomaly detection with strong OOD generalization and interpretable intermediate outputs. Key contributions include closed-form leaf updates, a VI-based objective with interpretable decomposed terms, a modular architecture aligned with the HB model, and an active generalization framework that adapts efficiently at test time. The work provides practical impact for deploying interpretable, generalizable image decomposition models in real-world tasks such as denoising and defect detection, with a clear pathway for extending to other decomposition priors and transfer across tasks.
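To make the decomposition above concrete, the following is a minimal NumPy sketch of the forward (generative) direction only. It assumes the components combine additively, $Y = L + S + N$, with $L = AB^T$ low-rank, $S$ sparse, and $N$ Gaussian noise. The additive form, the matrix sizes, the rank `r`, and the helper name `compose_observation` are illustrative assumptions, not details taken from the paper, and the sketch does not implement InDeed's inference.

```python
import numpy as np


def compose_observation(A, B, S, N):
    """Compose Y from its parts, assuming the additive model Y = A @ B.T + S + N.

    Hypothetical helper for illustration; not the authors' code.
    """
    L = A @ B.T          # low-rank component, rank at most A.shape[1]
    return L + S + N, L


rng = np.random.default_rng(0)
h, w, r = 32, 24, 3      # illustrative image size and rank (assumed values)

A = rng.standard_normal((h, r))
B = rng.standard_normal((w, r))

# Sparse component: a handful of large entries on an otherwise zero map.
S = np.zeros((h, w))
S[rng.integers(0, h, 10), rng.integers(0, w, 10)] = 5.0

# Noise component: small i.i.d. Gaussian perturbation.
N = 0.1 * rng.standard_normal((h, w))

Y, L = compose_observation(A, B, S, N)
rank_L = np.linalg.matrix_rank(L)
residual = Y - L - S     # equals N when the decomposition is exact
```

Inference runs in the opposite direction: given only $Y$, estimate $A$, $B$, and $S$, so that what remains is attributed to $N$.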

Abstract

Image decomposition aims to analyze an image into elementary components, which is essential for numerous downstream tasks and, by nature, lends a degree of interpretability to the analysis. Deep learning can be powerful for such tasks, but, surprisingly, combining it with image decomposition while focusing on interpretability and generalizability is rarely explored. In this work, we introduce a novel framework for interpretable deep image decomposition, combining hierarchical Bayesian modeling and deep learning to create an architecture-modularized and model-generalizable deep neural network (DNN). The proposed framework includes three steps: (1) hierarchical Bayesian modeling of image decomposition, (2) transforming the inference problem into optimization tasks, and (3) deep inference via a modularized Bayesian DNN. We further establish a theoretical connection between the loss function and the generalization error bound, which inspires a new test-time adaptation approach for out-of-distribution scenarios. We instantiated the framework on two downstream tasks, i.e., image denoising and unsupervised anomaly detection, and the results demonstrated improved generalizability as well as interpretability of our methods. The source code will be released upon the acceptance of this paper.

Paper Structure (36 sections, 1 theorem, 44 equations, 9 figures, 7 tables)

Introduction
Related works
Image decomposition
Image denoising
Unsupervised anomaly detection
Methodology
The three-step framework: A summary
Modeling of image decomposition
Formulating inference into two sub-problems
Deep inference via neural network $f_\theta$
Interpretation
Hierarchical Bayesian modeling of decomposition
Convert inference into two optimizations via VI
Closed-form solutions for sub-problem 1
The objective function for sub-problem 2
...and 21 more sections

Key Result

Theorem 1

Let $f_{\theta}$ be any $K_{\theta}$-Lipschitz-continuous function, and let $\delta \in (0,1)$. Then the following inequality holds with probability at least $1-\delta$: Here, $d(p_1,p_2)$ refers to the discrepancy between two distributions $p_1$ and $p_2$, as defined in mbacke2024pac-vae; $C$ is a constant given the prior $p(\mathcal{Z})$; $K=c(K_\theta)$ is determined by $K_{\theta}$.
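For reference, the Lipschitz condition assumed of $f_\theta$ in the theorem is the standard one. The choice of norm $\|\cdot\|$ and the domain $\mathcal{X}$ are not specified in this summary, so both are left generic here as an assumption:

$$
\| f_\theta(x) - f_\theta(x') \| \le K_\theta \, \| x - x' \| \quad \text{for all } x, x' \in \mathcal{X}.
$$

A smaller $K_\theta$ means the network's output changes less under a given input perturbation, which is why the constant $K = c(K_\theta)$ enters the bound.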

Figures (9)

Figure 1: The proposed three-step framework for establishing architecture-modularized and interpretable DNN. Each subfigure illustrates the corresponding step, with Step 3 being developed under the guidance of Steps 1 and 2. Note that the variable $N$ in (a) is marked with a dotted circle, signifying that no inference is required. Function $\mathcal{F}_{X}(\cdot)$ in (b) and (c) denotes closed-form solution(s) w.r.t. variable(s) $X$.
Figure 2: The PGM and architecture of InDeed. (a) illustrates the PGM with observed variables $\{Y,U\}$ and unobserved variables $\mathcal{Z}= \{A,B,S,\boldsymbol{\gamma}, \Omega, \Lambda \}$. No inference is required for the variables $L$ and $N$, shown in dotted circles. (b) shows the architecture. Given an observation $Y$, the posteriors of the middle-level variables, i.e., $A$, $B$, and $S$, are first inferred via neural networks, and then the expectations of the leaf-level variables, i.e., $\mu_{\boldsymbol{\gamma}}$, $\mu_{\Omega}$, and $\mu_{\Lambda}$, are given by the closed-form solutions for $\boldsymbol{\gamma}$, $\Omega$, and $\Lambda$, respectively. The shaded parameters are detached values during training.
Figure 3: Visualization of color image denoising under $\sigma=50$. Three typical examples from McMaster are shown (cropped patches of size $100 \times 100$, marked by red boxes). Zoom in on the electronic version for more details.
Figure 4: Examples from MVTecAD and the anomaly map overlay. Note that the color of original images may look different when they are overlaid with the anomaly maps, due to the mixture of colors. GT: ground truth segmentation map.
Figure 5: Visual comparisons between InDeed and InDeedAG on two cases from SIDD (cropped patches of size $100 \times 100$). Note that "InDeed on SynSIDD" serves as a reference, representing predictions on synthetic in-distribution images with AWGN ($\sigma=15$). InDeedAG achieves better decomposition results than InDeed, particularly in the noise components, leading to better PSNR and SSIM.
...and 4 more figures

Theorems & Definitions (1)

Theorem 1
