Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models

Kun Huang; Xiao Ma; Yuhan Zhang; Na Su; Songtao Yuan; Yong Liu; Qiang Chen; Huazhu Fu

TL;DR

This work tackles the challenge of generating high-resolution OCT volumes under constrained memory resources. It introduces Cascaded Amortized Latent Diffusion Models (CA-LDM), which combine non-holistic autoencoders (NHAE) to map volumes into a compact latent space and cascaded diffusion processes that first synthesize a global 3D latent representation and then refine high-resolution details via slice-wise diffusion. The method achieves 512^3 volume synthesis, outperforms existing approaches in both global fidelity and fine-grained details, and demonstrates practical benefits for downstream segmentation tasks when augmented with synthetic data. This approach enables scalable, high-fidelity OCT data synthesis that can bolster medical imaging analyses while mitigating data scarcity and privacy concerns.

Abstract

Optical coherence tomography (OCT) image analysis plays an important role in ophthalmology. Current successful analysis models rely on large datasets, which can be challenging to obtain for certain tasks. Using deep generative models to create realistic data is a promising approach. However, due to limitations in hardware resources, it is still difficult to synthesize high-resolution OCT volumes. In this paper, we introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way. First, we propose non-holistic autoencoders to efficiently build a bidirectional mapping between the high-resolution volume space and a low-resolution latent space. In tandem with the autoencoders, we propose cascaded diffusion processes that synthesize high-resolution OCT volumes through global-to-local refinement, amortizing the memory and computational demands. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods. Moreover, performance gains on two downstream fine-grained segmentation tasks demonstrate the benefit of the proposed method in training deep learning models for medical imaging tasks. The code is publicly available at: https://github.com/nicetomeetu21/CA-LDM.
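
To give a rough sense of why working in a compact latent space and decoding slice by slice can ease memory pressure, the following back-of-the-envelope sketch compares raw tensor sizes. It is only an illustration: the 512^3 volume size comes from the summary above, while the float32 storage, the downsampling factor of 8, and the 4 latent channels are hypothetical values chosen for the arithmetic, not figures reported by the paper. Activation and model memory, which dominate in practice, are ignored.

```python
def tensor_mib(shape, bytes_per_value=4):
    """Size in MiB of a dense tensor of the given shape (float32 by default)."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value / 2**20


# Full-resolution volume size taken from the summary (512^3 voxels).
full_volume = (512, 512, 512)

# ASSUMPTION: hypothetical downsampling factor and latent channel count,
# picked only to make the comparison concrete.
factor = 8
latent_channels = 4
latent_3d = (latent_channels, 512 // factor, 512 // factor, 512 // factor)

# One 2D slice of the full-resolution volume.
single_slice = (512, 512)

volume_mib = tensor_mib(full_volume)
latent_mib = tensor_mib(latent_3d)
slice_mib = tensor_mib(single_slice)

print(f"full volume : {volume_mib:.1f} MiB")
print(f"3D latent   : {latent_mib:.1f} MiB (hypothetical shape {latent_3d})")
print(f"single slice: {slice_mib:.1f} MiB")
```

Under these assumed numbers, the global 3D stage only ever holds a small latent tensor, and the high-resolution stage only needs a slice (or a few slices) at a time rather than the whole volume.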

Paper Structure (17 sections, 1 equation, 7 figures, 2 tables)

Introduction
Methodology
Non-holistic Autoencoders
Thumbnail image encoding:
Latent uniaxial super-resolution:
Slice-wise image decoding:
Efficient multi-slice decoder:
Cascaded Diffusion Processes
Experiments and Results
Dataset:
Implementation details:
Metrics:
Comparison experiments:
Ablation study:
Benefits for downstream tasks:
...and 2 more sections

Figures (7)

Figure 1: (a) Overview of the proposed method. The sizes of images and latent representations are noted. (b) Architecture of the multi-slice decoder. It consists of 2D residual blocks with 3D adaptors of different scales. (c) Detailed architecture of the residual block and the 3D adaptor. $k,c,h,w$ represent the batch size, channels, height and width of a batch of 2D features. $\alpha$ is a learnable mixing factor.
Figure 2: Visual comparison of synthetic OCT. Each sample provides a mean projection of the whole volume and 2D images of intra-slice and inter-slice directions in the middle of the volume.
Figure 3: Peak memory usage during inference time with respect to the resolution of the synthesized volume for each model. These models are standard versions without parameter adjustments.
Figure 4: Projections of the ablation methods. Each group of samples corresponds to the same latent representations synthesized by $Diff_{3D}$.
Figure A1: High-resolution synthetic samples of CA-LDM. The first three rows correspond to the samples in Fig. 2. The last two rows are samples with obvious pathological features.
...and 2 more figures
