Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection

YeongHyeon Park, Myung Jin Kim, Hyeong Seok Kim

TL;DR

This work tackles false positives in medical anomaly detection with general-purpose visual-language models. It introduces Contrastive Language Prompting (CLAP), which uses positive prompts to guide attention toward potential lesions and negative prompts to suppress attention on normal regions, producing an attention map $A_{\textit{CLAP}} = A_{\textit{positive}} - A_{\textit{negative}}$. The approach is paired with a reconstruction-by-inpainting unsupervised anomaly detector that obfuscates high-attention regions and scores the reconstruction error with multi-scale gradient magnitude similarity (MSGMS) to decide whether disease is present. Experiments on the BMAD dataset show that CLAP improves anomaly detection across multiple anatomies and outperforms baselines such as DINO and PLP, with particular strength on small or irregular lesions. Future work includes automating language-prompt generation for practical clinical deployment.
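The subtraction in the TL;DR can be illustrated with a minimal sketch. It assumes the two attention maps are already available as equally sized 2D grids of floats; the function name `clap_attention` and the optional `clip_negative` flag are hypothetical and not taken from the paper, which only states the difference $A_{\textit{positive}} - A_{\textit{negative}}$.

```python
def clap_attention(a_positive, a_negative, clip_negative=False):
    """Element-wise A_positive - A_negative over two equally sized 2D grids.

    clip_negative is an illustrative assumption (not from the paper): when
    True, values below zero are set to 0.0 so the result stays non-negative.
    """
    if len(a_positive) != len(a_negative):
        raise ValueError("attention maps must have the same number of rows")
    result = []
    for row_pos, row_neg in zip(a_positive, a_negative):
        if len(row_pos) != len(row_neg):
            raise ValueError("attention maps must have the same number of columns")
        # Attention that the negative prompt also assigns is subtracted away.
        row = [p - n for p, n in zip(row_pos, row_neg)]
        if clip_negative:
            row = [max(0.0, value) for value in row]
        result.append(row)
    return result


# Toy 2x2 maps: the top-left cell is attended only by the positive prompt,
# while the bottom-left cell is attended more strongly by the negative prompt.
a_pos = [[1.0, 0.5], [0.75, 0.25]]
a_neg = [[0.25, 0.5], [1.0, 0.0]]

a_clap = clap_attention(a_pos, a_neg)
a_clap_clipped = clap_attention(a_pos, a_neg, clip_negative=True)
```

In this toy case the top-left cell keeps most of its attention, whereas cells that the negative prompt also highlights drop to zero or below, which is the false-positive suppression the method relies on.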

Abstract

A pre-trained visual-language model, contrastive language-image pre-training (CLIP), successfully accomplishes various downstream tasks with text prompts, such as finding images or localizing regions within an image. Despite CLIP's strong multi-modal capabilities, it remains limited in specialized environments such as medical applications. To address this, many CLIP variants (e.g., BiomedCLIP and MedCLIP-SAMv2) have emerged, but false positives on normal regions persist. We therefore pursue a simple yet important goal: reducing false positives in medical anomaly detection. We introduce a Contrastive LAnguage Prompting (CLAP) method that leverages both positive and negative text prompts. This straightforward approach identifies potential lesion regions through visual attention to the positive prompts in the given image. To reduce false positives, we attenuate attention on normal regions using negative prompts. Extensive experiments on the BMAD dataset, which includes six biomedical benchmarks, demonstrate that the CLAP method enhances anomaly detection performance. Our future plans include developing an automated fine-prompting method for more practical usage.

Paper Structure

This paper contains 10 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Attention maps generated with BiomedCLIP [zhang2023biomedclip]. $A_{\textit{positive}}$ and $A_{\textit{negative}}$ are the attention maps obtained using only positive or only negative prompts, respectively. $A_{\textit{CLAP}}$ shows the result of our proposal, dubbed Contrastive LAnguage Prompting (CLAP). CLAP leverages both positive and negative prompts; the negative prompts attenuate false-positive attention on normal regions.
  • Figure 2: Schematic diagram of our method. Existing prompting methods use positive prompts only, so the false-positive attention issue remains. In comparison, our method CLAP suppresses false positives by additionally exploiting negative prompts, as shown in (a). After obtaining the CLAP attention map, we employ the existing UAD model EAR [park2024visual], replacing only the saliency map used for mosaic obfuscation with the attention map from CLAP, as shown in (b).
  • Figure 3: Attention results of a visual-only model and a visual-language model. The visual-only model, DINO [caron2021emerging], produces effective visual saliency attention in the ordinary domain but falls short in the medical domain. When our method CLAP is applied to the visual-language model BiomedCLIP [zhang2023biomedclip], false-positive attention is successfully removed.
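
The role of the attention map in Figure 2(b) can be sketched as follows. This is only a rough illustration under stated assumptions, not EAR's actual implementation: the function name `mosaic_obfuscate`, the fixed `block` size, the `threshold`, and the choice to pixelate a block by replacing it with its mean intensity are all hypothetical. The sketch takes a grayscale image and an attention map of the same size and hides every block whose attention is high, which is the region a reconstruction model would then have to recover.

```python
def mosaic_obfuscate(image, attention, block=2, threshold=0.5):
    """Pixelate blocks of `image` whose attention reaches `threshold`.

    image, attention: equally sized 2D lists of floats.
    block, threshold: illustrative assumptions, not values from the paper.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            # Skip blocks the attention map considers normal.
            if max(attention[r][c] for r, c in cells) < threshold:
                continue
            # Replace the whole block with its mean intensity (mosaic effect).
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


# Toy 2x4 image: only the right-hand 2x2 block has high attention.
image = [[0.0, 1.0, 0.25, 0.5], [1.0, 0.0, 0.75, 1.0]]
attention = [[0.0, 0.0, 0.9, 0.0], [0.0, 0.0, 0.0, 0.0]]

obfuscated = mosaic_obfuscate(image, attention, block=2, threshold=0.5)
```

Here the left block is left intact while the right block is flattened to its mean, so a cleaner attention map (fewer false positives on normal tissue) directly means fewer normal regions are hidden from the reconstruction model.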