Language Models Meet Anomaly Detection for Better Interpretability and Generalizability

Jun Li; Su Hwan Kim; Philip Müller; Lina Felsner; Daniel Rueckert; Benedikt Wiestler; Julia A. Schnabel; Cosmin I. Bercea

TL;DR

This paper tackles the lack of interpretability and generalization in unsupervised anomaly detection for medical imaging by introducing a multi-image VQA-UAD framework that jointly leverages original MRIs, anomaly maps, and pseudo-healthy reconstructions. Central to the approach is the Knowledge Querying Transformer (KQ-Former), a transformer-based module initialized with medical knowledge (BioBERT) that aligns visual and textual modalities even with limited data. The authors provide a new brain MRI VQA-UAD dataset, develop a strong multi-image VQA baseline, and show that KQ-Former improves closed-question accuracy and open-question BLEU-4 while achieving favorable NLI entailment/contradiction metrics; moreover, anomaly maps significantly boost open-set anomaly detection, enhancing LM generalizability to unseen conditions. Overall, the work demonstrates a synergistic benefit: language models make anomaly maps interpretable, and anomaly maps improve the generalizability of language models in open-set medical anomaly detection, with practical implications for clinical decision support.

Abstract

This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model's generalizability to previously unseen medical conditions. The code and dataset are available at https://github.com/compai-lab/miccai-2024-junli?tab=readme-ov-file
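To make the reported metrics concrete, the following is a minimal sketch of how closed-question accuracy and NLI entailment/contradiction ratios could be computed from per-sample outputs. It is not the authors' evaluation code: the function names `closed_accuracy` and `nli_ratios`, the exact-match criterion, and the label strings are assumptions chosen for illustration.

```python
from collections import Counter


def closed_accuracy(predictions, targets):
    """Fraction of closed questions whose predicted class equals the target.

    Assumes exact-match scoring over class labels (an assumption, not
    taken from the paper).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def nli_ratios(labels):
    """Share of answer pairs that an NLI model labels as each class.

    `labels` holds one label per (predicted answer, ground-truth answer)
    pair; the three label names used here are assumed.
    """
    names = ("entailment", "neutral", "contradiction")
    counts = Counter(label.lower() for label in labels)
    total = sum(counts[name] for name in names)
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: counts[name] / total for name in names}


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    print(closed_accuracy(["edema", "mild", "tumor"], ["edema", "severe", "tumor"]))
    print(nli_ratios(["Entailment", "Neutral", "entailment", "contradiction"]))
```

Under this reading, a higher entailment ratio and a lower contradiction ratio both indicate open-question answers that agree more closely with the reference text.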

Paper Structure (10 sections, 1 equation, 6 figures, 3 tables)

Introduction
Methods
Multi-Image VQA Baseline
Knowledge Q-Former
Experiments
Results
Language Models Enhance the Explainability of Anomaly Maps
Anomaly Maps Improve Generalizability of Language Models
Conclusion
Acknowledgments

Figures (6)

Figure 1: Category distribution of unseen anomalies. These unseen anomalies are dural thickening, white matter lesion, sinus opacification, encephalomalacia, intraventricular substance, and absent septum pellucidum.
Figure 2: An overview of our novel framework for VQA-UAD: (a) the multi-image VQA baseline; (b) multi-image feature fusion strategies; (c) the KQ-Former module.
Figure 2: Visualization examples from different NLI models. In some instances, the models disagree, indicating that their judgments may still deviate from human assessment. For example, in the first case, the KQF framework predicts "Sulci are unremarkable in size" and the ground truth is "Width of left lateral ventricle is within normal range". The BART and DeBERTa models classify this pair as "Neutral", while mDeBERTa and RoBERTa classify it as "Entailment".
Figure 3: Left: Distribution of anomaly categories. Right: Definitions of closed and open questions. For the closed questions, the blue text indicates the answer type, with the count of each type in parentheses. For more details, please refer to the supplementary material. Some questions are simplified here due to space constraints.
Figure 4: Visualization examples of the KQ-Former module with concatenation strategy. Each example includes from left to right: the original image, anomaly map, and PH reconstruction. CQ and OQ represent closed and open questions, respectively.
...and 1 more figure
