CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing

Jianfei Li, Ines Rosellon-Inclan, Gitta Kutyniok, Jean-Luc Starck

TL;DR

This paper introduces the Conformal Hallucination Estimation Metric (CHEM), an approach for quantifying and understanding hallucination artifacts in image reconstruction models. CHEM uses conformalized quantile regression to assess hallucination levels in a distribution-free manner, with the aim of making computer vision models more trustworthy.

Abstract

U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, challenges have emerged, as these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a novel approach for quantifying and comprehending hallucination artifacts to ensure trustworthy computer vision models. Our method, termed the Conformal Hallucination Estimation Metric (CHEM), is applicable to any image reconstruction model, enabling efficient identification and quantification of hallucination artifacts. It offers two key advantages: it leverages wavelet and shearlet representations to efficiently extract hallucinations of image features and uses conformalized quantile regression to assess hallucination levels in a distribution-free manner. Furthermore, from an approximation theoretical perspective, we explore the reasons why U-shaped networks are prone to hallucinations. We test the proposed approach on the CANDELS astronomical image dataset with models such as U-Net, SwinUNet, and Learnlets, and provide new perspectives on hallucination from different aspects in deep learning-based image processing.
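The abstract names conformalized quantile regression as the distribution-free component of CHEM. As a rough illustration of that general technique only (not the authors' implementation), the sketch below shows the standard calibration step: given lower and upper quantile predictions on a held-out calibration set, it computes a single offset that widens the predicted band. The function names `cqr_offset` and `cqr_interval`, and the idea that the targets are scalar hallucination magnitudes, are assumptions made for this example.

```python
import math


def cqr_offset(lower, upper, targets, alpha):
    """Conformal offset for quantile predictions on a calibration set.

    lower, upper: predicted lower/upper quantiles (hypothetical inputs).
    targets: observed values for the same calibration samples.
    alpha: target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    n = len(targets)
    # Conformity score: distance by which each target falls outside its
    # predicted band (negative when the target lies inside the band).
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, targets)
    )
    # Finite-sample rank ceil((n + 1) * (1 - alpha)) of the sorted scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: the band is unbounded.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo, hi, offset):
    """Widen (or shrink, if offset < 0) a predicted band by the offset."""
    return lo - offset, hi + offset


# Toy calibration data, chosen only to exercise the functions.
cal_lower = [0.0] * 9
cal_upper = [1.0] * 9
cal_targets = [0.5, 1.2, -0.1, 0.9, 1.5, 0.2, 0.7, 1.1, 0.4]

offset = cqr_offset(cal_lower, cal_upper, cal_targets, alpha=0.2)
band = cqr_interval(0.0, 1.0, offset)
```

Under exchangeability of calibration and test samples, a band corrected this way covers a new target with probability at least $1-\alpha$, which is what "distribution-free" refers to here. How CHEM defines the regression targets from wavelet or shearlet coefficients is specified in the paper itself and is not reproduced in this sketch.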

Paper Structure

This paper contains 2 sections, 13 theorems, 88 equations, 11 figures, 1 table, 1 algorithm.

Table of Contents

  1. Acknowledgement

Key Result

Proposition 3

Let $\theta \in \mathbb{R}_{+}$ and $\mathcal{D}\cup\{ (X_m,Y_m) \}_{m=1}^M$ be independent and identically distributed random variables. Then, with probability at least $1-\delta$, the following inequality holds. In addition, under the same condition, the inequality holds with probability at least $1-\delta$.

Figures (11)

  • Figure 1: An example of hallucinations in astronomical image deconvolution obtained from U-Net predictions.
  • Figure 2: U-shaped network architectures. The foundational components, namely the encoder stages, decoder stages, and the bottleneck, can be built from convolutional layers, attention layers, or transformer architectures.
  • Figure 3: Quantifying hallucinations of a U-Net trained with $\ell_2$ loss on astronomical images using different dictionaries. Visually, db8 and shearlets provide a clearer representation of the hallucinations present in the prediction. The reconstructions correspond to the fine-scale coefficients of these dictionaries with high CHEM values. For details, please refer to Section \ref{sec: Experiments} (Evaluation Details).
  • Figure 4: Quantifying hallucinations of U-shaped networks trained with different loss functions using db8. The predicted images are gathered in the first row, while the second row displays CHEM concerning $\operatorname{H}^{\theta}(\Phi)_j$ for high-resolution coefficients relative to $j$. The U-Net trained with $\ell_2$ loss exhibits distinct texture hallucinations, which are identified by CHEM. In contrast, SUNet and Learnlets do not display noticeable hallucinations, consistent with the results in Figures \ref{fig:Varying Dictionaries: Pyramids}, \ref{fig:FWHM-analysis}, and \ref{fig:MSE vs Hallucination Index}. For more information, please see Section \ref{sec: Experiments}.
  • Figure 5: MSE/CHEM-FWHM curves under different dictionaries. This figure illustrates the effect of the chosen representation.
  • ...and 6 more figures

Theorems & Definitions (31)

  • Definition 1
  • Remark 2
  • Proposition 3
  • Remark 4
  • Theorem 7
  • Theorem 8
  • Proof of Proposition \ref{prop:hoeffding}
  • Definition 9: Multichannel Convolutional Blocks
  • Definition 10: U-shaped networks
  • Remark 11
  • ...and 21 more