LoGex: Improved tail detection of extremely rare histopathology classes via guided diffusion

Maximilian Mueller, Matthias Hein

TL;DR

This paper significantly improves the OOD detection performance on a challenging histopathological task with only ten samples per tail class without losing classification accuracy on the head classes.

Abstract

In realistic medical settings, data are often inherently long-tailed: most samples are concentrated in a few head classes, while a long tail of rare classes usually contains only a few samples each. This distribution poses a significant challenge because rare conditions are critical to detect yet difficult to classify with so little data. In this paper, rather than attempting to classify the rare classes, we aim to reliably detect them as out-of-distribution (OOD) data. We leverage low-rank adaptation (LoRA) and diffusion guidance to generate targeted synthetic data for the detection problem. We significantly improve the OOD detection performance on a challenging histopathological task with only ten samples per tail class without losing classification accuracy on the head classes.
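
For readers unfamiliar with LoRA, the following is a minimal NumPy sketch of the general idea: a frozen weight matrix is augmented with a trainable low-rank product, so only a small number of parameters are updated. The sizes, the scaling factor `alpha`, and the helper name `lora_forward` are illustrative assumptions. The abstract does not state which layers of the diffusion model are adapted or which rank is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not values from the paper).
d_out, d_in, rank, alpha = 8, 16, 2, 4.0

W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero-initialised


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen layer output plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(3, d_in))

# With B = 0 the adapted layer reproduces the pretrained layer exactly,
# so fine-tuning starts from the original model.
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), x @ W.T)

full_params = W.size            # parameters touched by full fine-tuning
lora_params = A.size + B.size   # parameters touched by the low-rank update
print(f"full: {full_params}, LoRA: {lora_params}")
```

Because only `A` and `B` are trained, the number of updated parameters grows with the rank rather than with the full weight size, which is one reason this kind of adaptation is attractive when only a handful of tail samples are available.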

Paper Structure (15 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Workflow of LoGex: (1) We train an auxiliary classifier on the head and tail classes of the long-tailed dataset. (2) We adapt a general-purpose diffusion model to the histopathology domain by applying LoRA fine-tuning only on the tail samples. (3) We generate synthetic tail samples with the DiG-IN guidance from [Augustin2023AnalyzingAE]. (4) We retrain a classifier after adding the synthetically generated tail samples to the training dataset.
  • Figure 2: Guidance matters: We show samples from the training set (first row), samples generated with LoRA (second row), and samples generated with LoGex (third row), together with the predictions of a classifier trained on the original dataset. With LoGex, the synthetic images are more often classified as the desired class.
  • Figure 3: Ablation on the number of synthetic images per tail class: Adding more than 100 samples leads to a slight increase in the false positive rate (FPR), but the method still outperforms the baselines. We hypothesize that the relative importance of the natural tail samples decreases when too many synthetic images are used.
  • Figure 4: Class distribution of the training dataset.
  • Figure 5: Samples from all classes: We show samples from the training dataset (first row), samples generated with LoRA (second row), and samples generated with LoGex (third row). For each sample, we report the corresponding prediction of a classifier trained on the original dataset from [Kriegsmann23Skincancer] (achieving a tail accuracy of 96.1%). With LoGex, the synthetic images are more often classified as the desired class.
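
The caption of Figure 3 reports an FPR for the detection task. As a reminder of how such a number is commonly computed in OOD detection, here is a minimal sketch. It assumes the frequent convention of treating in-distribution (head) samples as positives and fixing the true positive rate at 95%. The exact operating point and scoring function used in the paper are not stated in this excerpt, and the function name `fpr_at_tpr` is hypothetical.

```python
import numpy as np


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as in-distribution at a fixed TPR.

    Scores measure "in-distribution-ness": higher means the sample looks more
    like a head class. This is an assumed convention, not the paper's definition.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold chosen so that roughly a fraction `tpr` of ID scores lie at or above it.
    threshold = np.quantile(id_scores, 1.0 - tpr)
    # An OOD (tail) sample counts as a false positive if it passes the threshold.
    return float(np.mean(ood_scores >= threshold))


# Toy scores, made up for illustration only.
id_scores = np.linspace(0.5, 1.0, 101)
ood_scores = np.array([0.1, 0.2, 0.6, 0.9])
print(fpr_at_tpr(id_scores, ood_scores))
```

A lower value means fewer tail samples slip through as head-class predictions when most head samples are still accepted.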