Rethinking Perceptual Metrics for Medical Image Translation

Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski

TL;DR

The paper questions whether common perceptual metrics are suitable for evaluating medical image translation, and shows that they correlate poorly with segmentation-based measures on two clinically relevant tasks. Comparing four translation models and a segmentation-conditioned variant, it finds that FID and related metrics often fail to reflect anatomical fidelity, while the pixel-level SWD metric may be useful only in the narrower setting of subtle intra-modality translation. The work cautions against relying on FID for model selection in medical image translation and calls for further research into task- and anatomy-aware metrics that better capture anatomical integrity and downstream utility in clinical contexts.

Abstract

Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other hand, task-agnostic metrics are attractive, such as the network feature-based perceptual metrics (e.g., FID) that are common to image translation in general computer vision. In this paper, we investigate evaluation metrics for medical image translation on two tasks (GE breast MRI to Siemens breast MRI and lumbar spine MRI to CT), tested on various state-of-the-art translation methods. We show that perceptual metrics do not generally correlate with segmentation metrics because they extend poorly to the anatomical constraints of this sub-field, with FID being especially inconsistent. However, we find that the lesser-used pixel-level SWD metric may be useful for subtle intra-modality translation. Our results demonstrate the need for further research into helpful metrics for medical image translation.
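
As a rough illustration of the comparison the abstract describes, the sketch below computes the absolute correlation between a perceptual metric and a segmentation metric across a set of translation models. It assumes Pearson correlation, which may not be the coefficient the authors use, and the function name `abs_pearson` and all input values are hypothetical placeholders rather than results from the paper.

```python
import math


def abs_pearson(perceptual_scores, segmentation_scores):
    """Absolute Pearson correlation between two equal-length score lists.

    Each position corresponds to one translation model: its perceptual
    metric value (e.g., an FID-like score) and its segmentation metric
    value (e.g., a Dice-like score). Assumption: Pearson correlation is
    used here purely for illustration.
    """
    n = len(perceptual_scores)
    if n != len(segmentation_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")

    mean_x = sum(perceptual_scores) / n
    mean_y = sum(segmentation_scores) / n

    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(perceptual_scores, segmentation_scores))
    var_x = sum((x - mean_x) ** 2 for x in perceptual_scores)
    var_y = sum((y - mean_y) ** 2 for y in segmentation_scores)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")

    return abs(cov / math.sqrt(var_x * var_y))


# Placeholder values only (NOT taken from the paper): one entry per model.
fid_like = [10.0, 20.0, 30.0, 40.0]
dice_like = [0.9, 0.8, 0.7, 0.6]
print(abs_pearson(fid_like, dice_like))
```

Taking the absolute value puts metrics where lower is better (such as FID) and metrics where higher is better (such as Dice) on the same scale, so a value near 1 indicates that the perceptual metric tracks the segmentation metric and a value near 0 indicates that it does not.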

Paper Structure (7 sections, 2 figures, 1 table)

This paper contains 7 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example translations for each model.
  • Figure 2: Absolute correlation of perceptual metrics with segmentation metrics.