Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification

Vanshali Sharma, Debesh Jha, M. K. Bhuyan, Pradip K. Das, Ulas Bagci

TL;DR

PathoPolyp-Diff is a novel model that generates text-controlled synthetic images with diverse characteristics in terms of pathology, imaging modalities, and quality. It introduces cross-class label learning, which lets the model learn features from other classes and reduces the burdensome task of data annotation.

Abstract

Pathologic diagnosis is a critical phase in deciding the optimal treatment procedure for colorectal cancer (CRC). Colonic polyps, precursors to CRC, can be pathologically classified into two major types: adenomatous and hyperplastic. For precise classification and early diagnosis of such polyps, colonoscopy has been widely adopted, paired with various imaging techniques, including narrow band imaging (NBI) and white light imaging (WLI). However, existing classification techniques mainly rely on a single imaging modality and show limited performance due to data scarcity. Recently, generative artificial intelligence has gained prominence in overcoming such issues. Additionally, various generation-controlling mechanisms using text prompts and images have been introduced to obtain visually appealing and desired outcomes. However, such mechanisms require class labels to make the model respond efficiently to the provided control input. In the colonoscopy domain, such controlling mechanisms are rarely explored; in particular, text-prompt control is completely uninvestigated. Moreover, class-wise labels for diverse sets of images are expensive to obtain and often unavailable, which limits such explorations. Therefore, we develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics in terms of pathology, imaging modalities, and quality. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation. Experiments on a publicly available dataset show an improvement of up to 7.91% in balanced accuracy. Moreover, cross-class label learning achieves a statistically significant improvement of up to 18.33% in balanced accuracy during video-level analysis. The code is available at https://github.com/Vanshali/PathoPolyp-Diff.

Paper Structure

This paper contains 19 sections, 5 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Sample images depicting (a) an adenomatous polyp in WLI, (b) a hyperplastic polyp in WLI, (c) an adenomatous polyp in NBI, and (d) a hyperplastic polyp in NBI. Panels (e)-(h) show low-quality frames with artefacts, namely ghost colors, motion blur, low illumination, and fecal depositions, respectively, and (i) shows a good-quality polyp image.
  • Figure 2: Overview of the proposed framework. It consists of two stages and uses various text conditioning to control the generation process. In Stage-II, some undersampled data from Stage-I is used for a smoother learning process. Also, the first block of U-Net is kept locked in the second stage. The performance of the proposed model is validated using a classification process which uses a combination of real and synthetic images in different proportions.
  • Figure 3: Flowchart depicting the different combinations of text prompts and cross-class labels used to generate images. The yellow nodes represent levels 1-3, while the nodes in another color represent levels 4-5. The solid arrows denote the labels already present in the dataset, whereas the dashed arrows represent the labels learnt from other classes (cross-class labels). Each number on a solid/dashed line represents the combination of strings used to form tokens for the text prompts used in training/inference. For instance, following number '8', we obtain the text prompt "colonoscopy image with a hyperplastic polyp, narrow band imaging, good quality, clear", where "good quality, clear" are indirectly inferred tokens and the others are already present in the training annotations.
  • Figure 4: Iteration-wise two-dimensional t-SNE embeddings to visualize the data points pertaining to synthetic and real polyp/non-polyp images.
  • Figure 5: Sample generated images from iterations (a) 1K polyp, (b) 1K non-polyp, (c) 8K polyp, (d) 8K non-polyp, (e) 10K polyp, and (f) 10K non-polyp.
  • ...and 7 more figures
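
The Figure 3 caption describes text prompts as combinations of label strings. The following is a minimal Python sketch of how such a prompt could be assembled; it is not taken from the authors' code. The function name `build_prompt`, its parameters, the article selection ("a"/"an"), and the fallback head "colonoscopy image" are assumptions made for illustration. Only the example prompt for path '8' comes from the caption.

```python
from typing import Optional, Sequence


def build_prompt(pathology: Optional[str] = None,
                 modality: Optional[str] = None,
                 quality_tokens: Sequence[str] = ()) -> str:
    """Join label strings into one comma-separated text prompt.

    Hypothetical helper: the token order mirrors the example prompt
    in the Figure 3 caption; everything else is an assumption.
    """
    if pathology:
        # Assumed grammar rule: "an" before a vowel, "a" otherwise.
        article = "an" if pathology[0].lower() in "aeiou" else "a"
        head = f"colonoscopy image with {article} {pathology} polyp"
    else:
        # Assumed fallback when no pathology label is supplied.
        head = "colonoscopy image"

    parts = [head]
    if modality:
        parts.append(modality)
    # Quality tokens may come from cross-class labels (dashed arrows).
    parts.extend(quality_tokens)
    return ", ".join(parts)


# Path '8' from the Figure 3 caption.
prompt = build_prompt("hyperplastic", "narrow band imaging",
                      ("good quality", "clear"))
print(prompt)
```

In this sketch the pathology and modality arguments stand in for labels already present in the annotations, while `quality_tokens` stands in for the indirectly inferred tokens; keeping them as separate arguments makes it easy to omit any level of the flowchart.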