MedCycle: Unpaired Medical Report Generation via Cycle-Consistency

Elad Hirsch, Gefen Dawidowicz, Ayellet Tal

TL;DR

This work introduces an approach to unpaired medical report generation that does not require a consistent labeling schema for images and reports, which improves data accessibility and allows otherwise incompatible datasets to be used. It learns cycle-consistent mapping functions from image embeddings to report embeddings and uses them to generate coherent reports.

Abstract

Generating medical reports for X-ray images presents a significant challenge, particularly in unpaired scenarios where access to paired image-report data for training is unavailable. Previous works have typically learned a joint embedding space for images and reports, necessitating a specific labeling schema for both. We introduce an innovative approach that eliminates the need for consistent labeling schemas, thereby enhancing data accessibility and enabling the use of incompatible datasets. This approach is based on cycle-consistent mapping functions that transform image embeddings into report embeddings, coupled with report auto-encoding for medical report generation. Our model and objectives consider intricate local details and the overarching semantic context within images and reports. This approach facilitates the learning of effective mapping functions, resulting in the generation of coherent reports. It outperforms state-of-the-art results in unpaired chest X-ray report generation, demonstrating improvements in both language and clinical metrics.

Paper Structure (11 sections, 5 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Unpaired medical report generation. (a) Two unpaired datasets are available: chest X-ray images and chest X-ray reports. (b) Our model learns cycle-consistent mappings between image and report embedding spaces ($I2R$ & $R2I$), facilitated by cross-modality alignment through the use of pseudo-reports, as well as report auto-encoding. Report generation is executed by decoding transformed image representations into reports during inference.
  • Figure 2: Method. For each image $i$ from a dataset $\mathcal{D}_I$, a preprocessing step generates a corresponding pseudo-report denoted as $\phi(i)$ (a), which conveys essential image information in textual form. An image encoder encodes each image $i$ into $z_i$ (b1). Simultaneously, a report encoder encodes reports from a report dataset $\mathcal{D}_R$ (b2), as well as pseudo-reports. These encoded representations comprise both local and aggregated global features, by employing self-attention $SA$. Two mapping functions are trained to transform image representations into report representations ($I2R$), and vice versa ($R2I$) (c). Subsequently, a decoder (d) utilizes the encoded reports (excluding pseudo-reports) to output a report, aiming to reconstruct the initial report. For improved generalization, dropout masks a portion of the local representations. During inference, an input image is encoded (b1), followed by mapping to the report space (c). The transformed representation is then decoded to generate a report (d).
  • Figure 3: Training objectives. Our training involves four distinct objectives, each corresponding to a different loss. The auto-encoding loss (yellow) focuses on accurately reconstructing the input report. The cycle loss (violet) ensures cycle consistency in the $I2R$ and $R2I$ mappings. The adversarial loss (red) ensures that the representations exhibit the same distribution before and after the mapping. Lastly, the cross-modal loss (orange) aims to constrain the mapping by ensuring that pseudo-reports containing information related to an input image and the corresponding image have similar global representations.
  • Figure 4: Qualitative evaluation. Our model-generated report (c) contains similar information to the ground-truth report (b). It indicates the lung clarity, the cardiomediastinal silhouette's state, the appearance of median sternotomy wires, and rules out pleural effusion & pneumothorax.
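
The cycle and cross-modal objectives described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the mappings $I2R$ and $R2I$ are stood in for by plain linear maps, the cycle loss uses an L1 distance, and the cross-modal loss uses a cosine distance between global representations. The captions do not specify these choices, and the auto-encoding and adversarial losses are left out.

```python
import numpy as np

# Hypothetical linear stand-ins for the learned mappings I2R and R2I.
def i2r(z_img, w_i2r):
    """Map image embeddings into the report embedding space."""
    return z_img @ w_i2r

def r2i(z_rep, w_r2i):
    """Map report embeddings into the image embedding space."""
    return z_rep @ w_r2i

def cycle_loss(z_img, z_rep, w_i2r, w_r2i):
    """Assumed L1 cycle-consistency: image -> report -> image, and
    report -> image -> report should return to the starting embedding."""
    img_back = r2i(i2r(z_img, w_i2r), w_r2i)
    rep_back = i2r(r2i(z_rep, w_r2i), w_i2r)
    return float(np.abs(img_back - z_img).mean() + np.abs(rep_back - z_rep).mean())

def cross_modal_loss(z_img_global, z_pseudo_global, w_i2r, eps=1e-8):
    """Assumed cosine distance between the mapped global image feature and
    the global feature of the matching pseudo-report."""
    mapped = i2r(z_img_global, w_i2r)
    num = (mapped * z_pseudo_global).sum(axis=-1)
    den = np.linalg.norm(mapped, axis=-1) * np.linalg.norm(z_pseudo_global, axis=-1) + eps
    return float((1.0 - num / den).mean())

# Toy data: unpaired image and report embeddings of a shared, made-up size.
rng = np.random.default_rng(0)
dim = 4
z_img = rng.normal(size=(3, dim))   # 3 image embeddings
z_rep = rng.normal(size=(5, dim))   # 5 report embeddings, not paired with the images
identity = np.eye(dim)
random_w = rng.normal(size=(dim, dim))

loss_identity = cycle_loss(z_img, z_rep, identity, identity)
loss_random = cycle_loss(z_img, z_rep, random_w, random_w)
aligned = cross_modal_loss(z_img, z_img, identity)
```

With identity mappings both cycles return exactly to their inputs, so the cycle term vanishes, while arbitrary mappings are penalized. The image and report batches have different sizes because the cycle term never compares an image to a report directly, which is why it can be computed on unpaired data.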