Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling

Abril Corona-Figueroa, Hubert P. H. Shum, Chris G. Willcocks

TL;DR

This work tackles 2D X-ray to 3D CT-like reconstruction under limited data by reframing the task as a 3D-to-3D generative problem. It preserves 2D information by repeating and concatenating $N$ views into a high-channel volume and uses a Swin UNETR-based mapper together with neural optimal transport, regularized by the de-biased Sinkhorn divergence, to maintain fidelity to the inputs without heavy latent encoding. The approach demonstrates strong cross-view correlation, achieving competitive reconstructions when trained on a single dataset and generalizing to six datasets, including out-of-distribution samples, with fast convergence (~$2{,}000$ iterations, ~28 hours). This method is fast, data-efficient, and robust to view variations, offering practical potential for clinical CT reconstruction while acknowledging remaining blur due to intrinsic uncertainty, which could be mitigated by iterative alignment or diffusion-based refinement in future work.

Abstract

This paper investigates a 2D to 3D image translation method with a straightforward technique, enabling correlated 2D X-ray to 3D CT-like reconstruction. We observe that existing approaches, which integrate information across multiple 2D views in the latent space, lose valuable signal information during latent encoding. Instead, we simply repeat and concatenate the 2D views into higher-channel 3D volumes and approach the 3D reconstruction challenge as a straightforward 3D to 3D generative modeling problem, sidestepping several complex modeling issues. This method enables the reconstructed 3D volume to retain valuable information from the 2D inputs, which are passed between channel states in a Swin UNETR backbone. Our approach applies neural optimal transport, which is fast and stable to train, effectively integrating signal information across multiple views without the requirement for precise alignment; it produces non-collapsed reconstructions that are highly faithful to the 2D views, even after limited training. We demonstrate correlated results, both qualitatively and quantitatively, having trained our model on a single dataset and evaluated its generalization ability across six datasets, including out-of-distribution samples.
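To make the "repeat and concatenate" idea concrete, here is a minimal NumPy sketch (not the authors' code): each 2D view is tiled along a new depth axis so it becomes a 3D volume, and the tiled views are then stacked as channels of a single input volume for the 3D-to-3D mapper. Shapes and the `repeat_and_concatenate` name are illustrative assumptions.

```python
import numpy as np

def repeat_and_concatenate(views, depth):
    """Tile each 2D view along a new depth axis, then stack the
    tiled views as channels of one higher-channel 3D volume.

    views: list of N arrays, each of shape (H, W)
    depth: number of slices D in the target volume
    returns: array of shape (N, D, H, W)
    """
    volumes = [np.repeat(v[np.newaxis, :, :], depth, axis=0)  # (D, H, W)
               for v in views]
    return np.stack(volumes, axis=0)                          # (N, D, H, W)

# Two hypothetical 128x128 X-ray views -> one 2-channel 128-deep volume
xrays = [np.random.rand(128, 128).astype(np.float32) for _ in range(2)]
vol = repeat_and_concatenate(xrays, depth=128)
print(vol.shape)  # (2, 128, 128, 128)
```

Because every depth slice of a channel is an exact copy of the corresponding 2D view, no input signal is discarded before the 3D network sees it, in contrast to approaches that first compress the views into a latent code.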

Paper Structure

This paper contains 24 sections, 10 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: (a) Previous approaches focus on 2D to 3D mapping, often employing asymmetric architectures and compressed latent encoding. (b) In contrast, we propose 3D to 3D mapping from repeated and concatenated inputs, enabling faster training with highly correlated outputs without latent compression, even with small datasets (a few hundred images).
  • Figure 2: Proposed 2D to 3D image translation approach. (a) We learn the mapping between 2D inputs and their corresponding 3D representation, adapting a Swin UNEt TRansformer (Swin UNETR) $g_\varphi$ [hatamizadeh2021:swin] for image translation. (b) Optimization is based on the dual optimal transport regularization between networks $g_\varphi$ and $d_\phi$, comparing data points from the two image spaces using the Sinkhorn divergence $\mathrm{S}_\varepsilon$ on the activations of a feature extractor $f$.
  • Figure 3: Effect of reformulating 2D-3D mapping into 3D-3D.
  • Figure 4: Example projections from 3D CT volumes generated from inputs of one, two, four, and eight X-rays, obtained with our 3D-3D translation approach with a Swin UNETR backbone. We use test instances from the LIDC-IDRI dataset [armato2011:lidc].
  • Figure 5: CT projections from generated 3D volumes on various out-of-distribution lung datasets. Model weights were selected from iteration 5,000 on the LIDC-IDRI dataset [armato2011:lidc]. GT projections are displayed in odd rows, while our model's outputs are shown in even rows.
  • ...and 4 more figures
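The TL;DR and Figure 2 both refer to the de-biased Sinkhorn divergence $\mathrm{S}_\varepsilon$. As a hedged illustration (not the authors' implementation), the sketch below computes the entropy-regularized OT cost between two point clouds with uniform weights via log-domain Sinkhorn iterations, and de-biases it with the standard correction $\mathrm{S}_\varepsilon(x, y) = \mathrm{OT}_\varepsilon(x, y) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(x, x) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(y, y)$; the function names and hyperparameters are assumptions for this example.

```python
import numpy as np

def _logsumexp(z, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost OT_eps between point clouds x (n, d)
    and y (m, d) with uniform weights, via log-domain Sinkhorn."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    n, m = C.shape
    log_a, log_b = -np.log(n), -np.log(m)               # uniform marginals
    f, g = np.zeros(n), np.zeros(m)                     # dual potentials
    for _ in range(n_iters):
        # alternating soft-min updates of the potentials
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b, axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a, axis=0)
    # primal cost <P, C>, with the plan P recovered from the potentials
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a + log_b
    return (np.exp(log_P) * C).sum()

def sinkhorn_divergence(x, y, eps=0.1):
    """De-biased Sinkhorn divergence:
    S_eps(x, y) = OT_eps(x, y) - 0.5*OT_eps(x, x) - 0.5*OT_eps(y, y)."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))
```

The de-biasing terms cancel the entropic bias so that the divergence is (approximately) zero for identical distributions and positive otherwise, which is what makes it a usable training loss on feature activations.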