Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection

Jian Shi, Pengyi Zhang, Ni Zhang, Hakim Ghazzai, Peter Wonka

TL;DR

DIA tackles the challenge of detecting subtle, fine-grained anomalies in medical images by pairing a dissolving mechanism based on diffusion models with an amplifying, self-supervised contrastive framework. The dissolving transformations suppress instance-specific fine-grained features, while the amplifying framework, through a multi-branch contrastive loss and an auxiliary shift classifier, emphasizes those features to improve discriminability. The method demonstrates substantial improvements over baselines across six medical datasets, achieving state-of-the-art performance on fine-grained anomaly detection without requiring labeled anomalous data. This approach offers a data-domain-aware pathway to robust medical anomaly detection with potential for extension to supervised settings and finer-grained tasks, albeit at the cost of training diffusion models per dataset.

Abstract

Medical images often contain critical fine-grained features, such as tumors or hemorrhages, that are crucial for diagnosis yet potentially too subtle for conventional methods to detect. In this paper, we introduce DIA (dissolving is amplifying), a fine-grained anomaly detection framework for medical images. First, we introduce dissolving transformations: we use a generative diffusion model as a dedicated feature-aware denoiser, so that applying diffusion to medical images in a certain manner can remove or diminish fine-grained discriminative features. Second, we introduce an amplifying framework based on contrastive learning that learns a semantically meaningful representation of medical images in a self-supervised manner, with a focus on fine-grained features. The amplifying framework contrasts additional pairs of images with and without dissolving transformations applied, and thereby emphasizes the dissolved fine-grained features. DIA improves medical anomaly detection performance by around 18.40% AUC over the baseline method and achieves overall state-of-the-art results against other benchmark methods. Our code is available at https://github.com/shijianjian/DIA.git.

Paper Structure (33 sections, 9 equations, 11 figures, 14 tables)


Figures (11)

  • Figure 1: Dissolving Transformations. The panels show how the fine-grained features are dissolved (removed or suppressed). This effect becomes stronger as the time step $t$ increases from left to right. In the extreme case (the rightmost panel), different input images become very similar or almost identical, depending on the dataset. We show results for four datasets from top to bottom.
  • Figure 2: An overview of the DIA framework as applied to the Kvasir-polyp dataset. (I) With a pretrained diffusion model, we perform feature-aware dissolving transformations on an image $x$. This process estimates the denoised version $x_0$ of $x$ at a given time step $t$, resulting in a feature-dissolved image $\hat{x}$. As $t$ increases, $\hat{x}$ progressively loses its fine-grained discriminative features. (II) Given a batch of images, we generate transformed versions with augmentations and dissolving transformations. We form positive and negative pairs as described in the section on the contrastive loss. Our framework learns fine-grained features in particular by contrasting original images with their feature-dissolved counterparts.
  • Figure 3: Visualization of the target similarity matrix ($K=2$ with two samples in a batch). The white, blue, and lavender blocks denote the excluded, positive, and negative pairs, respectively. The red area contains the newly introduced negative pairs with dissolving transformations.
  • Figure 4: Dissolving Transformations using different diffusion models. $C$ and $M$ denote whether the dissolving transformation uses a diffusion model trained on CIFAR10 or on the corresponding dataset, respectively.
  • Figure 5: Heuristic alternatives to dissolving transformations with various kernel sizes. Compared with median blur, Gaussian blur preserves more image semantics.
  • ...and 6 more figures
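
The dissolving step described in the Figure 2 caption (estimating $x_0$ from $x$ at time step $t$) can be sketched with the standard DDPM one-step estimate, $\hat{x} = (x - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x, t)) / \sqrt{\bar{\alpha}_t}$. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the function names `make_alpha_bar` and `dissolve`, and the choice to treat the input image itself as the noisy sample at step $t$ are all illustrative, and `eps_model` stands in for a trained noise predictor.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def dissolve(x, t, eps_model, alpha_bar):
    """One-step estimate of x_0, treating x as the noisy sample at step t.

    eps_model(x, t) is a hypothetical noise predictor that returns an
    array with the same shape as x.
    """
    a = alpha_bar[t]
    eps = eps_model(x, t)
    return (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)

# Sanity check with an oracle predictor: if x_t was produced by the forward
# process and the predictor returns the exact noise, x_0 is recovered.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
t = 500
x0 = rng.random((8, 8))
noise = rng.standard_normal((8, 8))
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
x_hat = dissolve(x_t, t, lambda x, step: noise, alpha_bar)
```

With a real, imperfect noise predictor applied to a clean image, larger values of `t` attribute more of the image content to noise, which is consistent with the caption's observation that fine-grained features fade as $t$ increases.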