FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz

TL;DR

The paper tackles the scarcity of ground-truth reasoning annotations in medical image AI by introducing FunnyNodules, a fully parameterized synthetic dataset with six controllable visual attributes and an attribute-based target rule, plus per-sample ROI masks. This framework enables systematic, model-agnostic evaluation of attribute reasoning, explanation trustworthiness, and prototype-based explanations, all under controlled, scalable conditions. It defines metrics such as the Within-1-Accuracy and a Trust Index to quantify alignment between target predictions and attribute explanations, and demonstrates how attribute-level attention and prototypes can be assessed. While not a substitute for real patient data, FunnyNodules provides a versatile, reproducible platform to study reasoning, robustness, and explanation quality in medical AI and can be extended to explore background effects, imbalance, and model-specific behaviors.
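The Within-1-Accuracy mentioned above can be illustrated with a minimal sketch. It assumes the common definition for ordinal attribute scores, where a prediction counts as correct if it deviates from the ground truth by at most one level; the paper's exact formulation is not given in this summary, and the function name `within_one_accuracy` is hypothetical.

```python
import numpy as np


def within_one_accuracy(y_true, y_pred):
    """Fraction of predictions within +/-1 ordinal level of the ground truth.

    Assumption: attribute scores are integer-valued ordinal levels. This
    follows the commonly used definition, not necessarily the paper's.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # A prediction is counted as correct when the absolute error is at most 1.
    return float(np.mean(np.abs(y_true - y_pred) <= 1))


# Illustrative values only: absolute errors are 0, 1, 2, 0.
example_score = within_one_accuracy([1, 2, 3, 5], [1, 3, 5, 5])
```

In this toy example three of the four predictions fall within one level, so `example_score` is 0.75.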

Abstract

Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract, lung nodule-like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. Target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute-target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and beyond. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The controlled generative framework allows the generation of images differing in exactly one attribute, which facilitates analyzing how attribute changes influence the target class (blue line).
  • Figure 2: The sensitivity of the models’ target predictions to varying attributes reflects whether the target rule was captured correctly, which is mostly the case except for the complex notion of roundness.
  • Figure 3: FunnyNodules allows in-depth evaluation of complex decision rules, such as correlated attributes. For example, the effect of roundness on the target depends on the presence of an internal structure. This conditional relation was captured correctly for only one value, internal structure = 0, indicating a general weakness in handling correlated rules across all tested models.
  • Figure 4: Attribute ROIs. Ground-truth masks are created during image generation and enable evaluation of attention in attribute prediction.
  • Figure 5: Histogram of 500 randomly generated FunnyNodules images.
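
The ground-truth attribute ROI masks referred to in Figure 4 make it possible to score how well a model's attention matches the region that defines an attribute. The sketch below shows one simple option, the share of attention mass that falls inside the mask. This is an illustrative choice and not necessarily the measure used in the paper; the function name `attention_in_roi` and the array layout (a 2D attention map and a binary mask of the same shape) are assumptions.

```python
import numpy as np


def attention_in_roi(attention, roi_mask):
    """Share of non-negative attention mass lying inside a binary ROI mask.

    Assumptions: `attention` is a 2D saliency/attention map and `roi_mask`
    is a ground-truth mask of the same shape. Returns a value in [0, 1].
    """
    attention = np.asarray(attention, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    if attention.shape != roi_mask.shape:
        raise ValueError("attention and roi_mask must have the same shape")
    # Negative attribution values are clipped so they do not cancel mass.
    attention = np.clip(attention, 0.0, None)
    total = attention.sum()
    if total == 0:
        return 0.0  # no attention anywhere, so nothing falls inside the ROI
    return float(attention[roi_mask].sum() / total)


# Illustrative 2x2 example: 2 of 4 units of attention lie inside the ROI.
toy_attention = [[0.0, 1.0], [1.0, 2.0]]
toy_mask = [[False, False], [False, True]]
toy_share = attention_in_roi(toy_attention, toy_mask)
```

A value near 1 means the attention is concentrated on the attribute's region, while a value near the mask's area fraction suggests attention that is no better than uniform.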