Introducing SDICE: An Index for Assessing Diversity of Synthetic Medical Datasets

Mohammed Talha Alam; Raza Imam; Mohammad Areeb Qazi; Asim Ukaye; Karthik Nandakumar

TL;DR

The proposed SDICE index measures the distance between the similarity score distributions of original and synthetic images, where the similarity scores are estimated using a pre-trained contrastive encoder, to provide a consistent metric that can be easily compared across domains.

Abstract

Advancements in generative modeling are pushing the state of the art in synthetic medical image generation. These synthetic images can serve as an effective data augmentation method to aid the development of more accurate machine learning models for medical image analysis. While the fidelity of these synthetic images has progressively increased, their diversity remains understudied. In this work, we propose the SDICE index, which is based on the characterization of similarity distributions induced by a contrastive encoder. Given a synthetic dataset and a reference dataset of real images, the SDICE index measures the distance between the similarity score distributions of original and synthetic images, where the similarity scores are estimated using a pre-trained contrastive encoder. This distance is then normalized using an exponential function to provide a consistent metric that can be easily compared across domains. Experiments conducted on the MIMIC-chest X-ray and ImageNet datasets demonstrate the effectiveness of the SDICE index in assessing synthetic medical dataset diversity.
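The pipeline described in the abstract (similarity scores from an encoder, a distance between the real and synthetic score distributions, then exponential normalization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that embeddings from some contrastive encoder are already available as plain lists of floats, that similarity is cosine similarity, that the distance is a Fisher-style ratio (squared mean difference over summed variances), and that the normalization is `exp(-alpha * distance)`. The function names, the `alpha` parameter, and the random toy embeddings are all hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import fmean, pvariance


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Similarity scores for every unordered pair within one dataset."""
    return [cosine(u, v) for u, v in combinations(embeddings, 2)]


def fisher_ratio(scores_a, scores_b, eps=1e-12):
    """Assumed distance: squared mean gap divided by the summed variances."""
    gap = fmean(scores_a) - fmean(scores_b)
    return gap * gap / (pvariance(scores_a) + pvariance(scores_b) + eps)


def normalized_index(distance, alpha=1.0):
    """Assumed normalization: maps a distance in [0, inf) to (0, 1]."""
    return math.exp(-alpha * distance)


def make_embeddings(rng, n, dim, center=None, spread=1.0):
    """Toy stand-in for encoder outputs (hypothetical, for illustration only)."""
    center = center or [0.0] * dim
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


rng = random.Random(0)
dim = 16
real = make_embeddings(rng, 40, dim)
diverse_synth = make_embeddings(rng, 40, dim)
# A "collapsed" synthetic set: near-copies of a single embedding.
collapsed_synth = make_embeddings(rng, 40, dim, center=[1.0] * dim, spread=0.01)

real_scores = pairwise_similarities(real)
index_same = normalized_index(fisher_ratio(real_scores, real_scores))
index_diverse = normalized_index(
    fisher_ratio(real_scores, pairwise_similarities(diverse_synth)))
index_collapsed = normalized_index(
    fisher_ratio(real_scores, pairwise_similarities(collapsed_synth)))
```

Under these assumptions, a synthetic set whose similarity scores are distributed like those of the real set scores close to 1, while a set of near-duplicates scores close to 0. Because the exponential bounds the value in (0, 1], scores computed on different datasets share one scale.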

Paper Structure (19 sections, 6 equations, 14 figures, 5 tables)

Introduction
Proposed SDICE Index
Generic SDICE Index
Practical Implementation of SDICE Index
Experimental Results
Diversity evaluation
Comparison with SSIM and FID
Ablation Studies
Impact of number of samples on $\gamma_{intra}$
Impact of different prompts on $\gamma_{intra}$
Parameter Sensitivity Analysis
Conclusion
Supplementary Material
Additional Empirical Studies on the FairFace Dataset
Impact of Feature Extractor
...and 4 more sections

Figures (14)

Figure 1: F-ratio between the similarity score distribution of real and synthetic datasets serves as a good indication of the diversity within the synthetic dataset.
Figure 2: Overview of the proposed SDICE index. We input the real and synthetic dataset to the contrastive pretrained encoder to obtain similarity score distributions. The F-ratio between the two distributions after exponential normalization can be used to assess the diversity of the synthetic dataset.
Figure 3: Qualitative analysis of distribution change across cases
Figure 4: Distribution variation and overlap between real and synthetic samples
Figure 5: Qualitative analysis of distribution differences between the real and synthetic samples in terms of individual classes. (a) and (b) depict the class-wise distributions for the MIMIC-CXR dataset, while (c) and (d) illustrate the same for the 14 classes of ImageNet.
...and 9 more figures
