Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion

Xin Li, Kaixiang Yang, Qiang Li, Zhiwei Wang

TL;DR

CA3D-Diff tackles the challenging problem of bidirectional mammogram view translation between CC and MLO projections by fusing a diffusion-based generator with anatomical priors. It introduces a column-aware cross-attention mechanism and an implicit 3D structure reconstruction module that back-projects 2D latents into a coarse 3D volume to guide cross-view synthesis based on projection geometry. The approach achieves superior visual fidelity and structural consistency over state-of-the-art methods on VinDr-Mammo, and downstream screening tasks show that synthesized views can meaningfully augment single-view analyses. This work advances anatomically informed mammography synthesis, with practical implications for missing-view recovery, data augmentation, and cross-view representation learning in CAD systems.

Abstract

Dual-view mammography, including craniocaudal (CC) and mediolateral oblique (MLO) projections, offers complementary anatomical views crucial for breast cancer diagnosis. However, in real-world clinical workflows, one view may be missing, corrupted, or degraded due to acquisition errors or compression artifacts, limiting the effectiveness of downstream analysis. View-to-view translation can help recover missing views and improve lesion alignment. Unlike translation between natural images, this task is highly challenging in mammography due to large non-rigid deformations and severe tissue overlap in X-ray projections, which obscure pixel-level correspondences. In this paper, we propose Column-Aware and Implicit 3D Diffusion (CA3D-Diff), a novel bidirectional mammogram view translation framework based on a conditional diffusion model. To address cross-view structural misalignment, we first design a column-aware cross-attention mechanism that leverages the geometric property that anatomically corresponding regions tend to lie in similar column positions across views. A Gaussian-decayed bias is applied to emphasize local column-wise correlations while suppressing distant mismatches. Furthermore, we introduce an implicit 3D structure reconstruction module that back-projects noisy 2D latents into a coarse 3D feature volume based on breast-view projection geometry. The reconstructed 3D structure is refined and injected into the denoising UNet to guide cross-view generation with enhanced anatomical awareness. Extensive experiments demonstrate that CA3D-Diff achieves superior performance in bidirectional tasks, outperforming state-of-the-art methods in visual fidelity and structural consistency. Furthermore, the synthesized views effectively improve single-view malignancy classification in screening settings, demonstrating the practical value of our method in real-world diagnostics.
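
To make the column-aware cross-attention idea more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the Gaussian-decayed bias is added to the attention logits as the log of a Gaussian in the column offset, that both views are tokenized on the same h x w grid in row-major order, and that queries come from the target view while keys and values come from the source view. The function names `column_bias` and `column_aware_attention` and the parameter `sigma` are hypothetical; the abstract does not specify the exact form of the bias.

```python
import numpy as np


def column_bias(h, w, sigma):
    """Additive bias of shape (h*w, h*w) for row-major flattened h x w token grids.

    Assumption: the bias is the log of a Gaussian in the column offset, so it is
    0 for tokens in the same column and increasingly negative for distant columns.
    """
    cols = np.tile(np.arange(w), h)           # column index of each flattened token
    diff = cols[:, None] - cols[None, :]      # query column minus key column
    return -(diff ** 2) / (2.0 * sigma ** 2)


def column_aware_attention(q, k, v, h, w, sigma=2.0):
    """Single-head cross-attention with the column bias added to the logits.

    q: (h*w, d) target-view queries; k: (h*w, d) and v: (h*w, d_v) source-view
    keys and values. Returns the attended values and the attention weights.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + column_bias(h, w, sigma)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage on random features for a 4 x 6 latent grid with 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(24, 8))
k = rng.normal(size=(24, 8))
v = rng.normal(size=(24, 8))
out, weights = column_aware_attention(q, k, v, h=4, w=6, sigma=1.5)
```

A smaller `sigma` restricts each query to keys in nearly the same column, while a large `sigma` recovers ordinary cross-attention; how the paper sets or learns this width is not stated in the abstract.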

Paper Structure

This paper contains 21 sections, 12 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of the proposed CA3D-Diff framework for bidirectional mammogram view translation.
  • Figure 2: Schematic illustration of mammogram projection and back-projection. (a) The 3D breast volume is projected from two distinct viewpoints to produce the CC and MLO 2D mammogram images. The indices $i$ and $j$ indicate the positions of two 3D slices that appear as strip-like regions in the 2D projections. (b) The 3D slice is reconstructed by back-projecting the strip regions extracted from both views.
  • Figure 3: Qualitative comparison on the VinDr-Mammo dataset for $\mathrm{CC} \rightarrow \mathrm{MLO}$ translation. The first and second columns show the input CC and ground-truth MLO images, respectively. Yellow arrows mark anatomical differences, where our CA3D-Diff produces more accurate results than other SOTA methods.
  • Figure 4: Qualitative results on the VinDr-Mammo dataset for $\mathrm{MLO} \rightarrow \mathrm{CC}$ translation. The first column shows input MLO images, and the second column presents the ground-truth CC images. Yellow arrows mark anatomical differences, where our method demonstrates improved structural fidelity.
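
The back-projection described in the Figure 2 caption can be illustrated with a deliberately simplified NumPy sketch. It is a toy under strong assumptions and not the paper's module: both views are treated as parallel projections along two orthogonal axes (the real MLO view is oblique), the two latent maps share the column axis, and each 3D slice is formed by smearing the matching column strips from both views across the slice and averaging them. The function name `back_project` and the averaging rule are hypothetical.

```python
import numpy as np


def back_project(cc, mlo):
    """Back-project two 2D feature maps into a coarse 3D feature volume.

    cc:  (C, H_cc, W) latent of the CC view.
    mlo: (C, H_mlo, W) latent of the MLO view.
    Returns a volume of shape (C, W, H_cc, H_mlo): one (H_cc, H_mlo) slice per
    shared column index, built from the strip at that column in each view.
    """
    if cc.shape[0] != mlo.shape[0] or cc.shape[2] != mlo.shape[2]:
        raise ValueError("views must share channel count and column count")
    cc_strips = np.moveaxis(cc, 2, 1)     # (C, W, H_cc): one strip per column
    mlo_strips = np.moveaxis(mlo, 2, 1)   # (C, W, H_mlo)
    # Smear each strip along the axis it was projected over, then average.
    return 0.5 * (cc_strips[:, :, :, None] + mlo_strips[:, :, None, :])


# Usage: 4-channel latents with 6 shared columns.
rng = np.random.default_rng(0)
cc = rng.normal(size=(4, 8, 6))
mlo = rng.normal(size=(4, 5, 6))
volume = back_project(cc, mlo)   # shape (4, 6, 8, 5)
```

In this toy, every voxel simply receives the mean of the two strip values that project onto it, so the volume carries no information beyond the two inputs; according to the abstract, the paper refines the reconstructed structure before injecting it into the denoising UNet.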