Intuitions of Machine Learning Researchers about Transfer Learning for Medical Image Classification

Yucheng Lu, Hubert Dariusz Zając, Veronika Cheplygina, Amelia Jiménez-Sánchez

TL;DR

This study investigates how machine learning researchers intuitively select source datasets for transfer learning in medical imaging, a process that can impact model generalization and patient outcomes. Using a task-based, mixed-methods survey with 15 practitioners across two medical imaging case studies (H&E tissue classification and CheXpert chest X-ray multi-label classification), the authors examine preferences among ImageNet-1K, RadImageNet, and Ecoset as pretraining sources. Findings show that decisions are task-dependent and shaped by community practices, dataset properties, and perceived similarity along semantic, visual, and embedding dimensions, though similarity does not always predict performance. The work highlights ambiguity in terms like domain gap and calls for clearer definitions and HCI tools to make source dataset selection more explicit, rigorous, and transferable to practice in medical imaging transfer learning.

Abstract

Transfer learning is crucial for medical imaging, yet the selection of source datasets, which can impact the generalizability of algorithms and thus patient outcomes, often relies on researchers' intuition rather than systematic principles. This study investigates these decisions through a task-based survey with machine learning practitioners. Unlike prior work that benchmarks models and experimental setups, we take a human-centered HCI perspective on how practitioners select source datasets. Our findings indicate that choices are task-dependent and influenced by community practices, dataset properties, and similarity, whether computational (data embeddings) or perceived (visual or semantic). However, similarity ratings and expected performance are not always aligned, challenging the traditional "more similar is better" view. Participants often used ambiguous terminology, which suggests a need for clearer definitions and HCI tools to make them explicit and usable. By clarifying these heuristics, this work provides practical insights for more systematic source selection in transfer learning.

Paper Structure

This paper contains 27 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Screenshot of our interactive dataset browser.
  • Figure 2: Areas of research expertise among participants.
  • Figure 3: Participants' willingness to use different source datasets. (a) Case study 1: H&E patch classification. (b) Case study 2: chest X-ray classification.
  • Figure 4: Participants' subjective assessment of the expected fine-tuning performance for each source dataset. (a) Case study 1: H&E patch classification. (b) Case study 2: chest X-ray classification.
  • Figure 5: Ratings of expected pretraining effects for a successful fine-tuning outcome, given on a 5-point scale (1 = very poor, 5 = very good). (a) Case study 1: H&E patch classification. (b) Case study 2: chest X-ray classification.