Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models
Karan Jain, Mohammad Nayeem Teli
TL;DR
The paper tackles the memory and energy demands of diffusion models for high-dimensional medical data. It introduces the Invertible Diffusion Model (IDM), a hybrid architecture that combines invertible U-Nets with invertible attention within a diffusion framework to enable memory-efficient training on a single GPU. On BraTS2020 3D MRI data, IDM achieves up to about 15% peak memory reduction while delivering competitive PSNR and MAE, albeit with some trade-offs in SSIM, and demonstrates favorable energy efficiency despite higher FLOPs. This work advances scalable, sustainable diffusion modeling for 3D medical imaging and opens pathways to deeper, more expressive invertible architectures in high-dimensional spaces.
Abstract
Diffusion models have recently achieved state-of-the-art performance on many image generation tasks. However, most models require significant computational resources to do so. This becomes especially apparent in medical image synthesis, owing to the 3D nature of medical datasets such as CT scans, MRIs, and electron microscopy volumes. In this paper we propose a novel architecture for memory-efficient training of diffusion models on high-dimensional medical datasets using a single GPU. The proposed model is built from an invertible UNet architecture with invertible attention modules. This leads to two contributions: (1) memory-efficient denoising diffusion models whose memory usage is independent of the dimensionality of the dataset, and (2) reduced energy usage during training. While the new model can be applied to a multitude of image generation tasks, we showcase its memory efficiency on the 3D BraTS2020 dataset, where it achieves up to a 15% decrease in peak memory consumption during training while maintaining image quality comparable to SOTA.
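To illustrate why invertibility can save memory, the following is a minimal sketch of a generic additive coupling block (in the style of reversible residual networks), not the IDM block itself. The sub-networks `f` and `g`, the split into two halves `x1` and `x2`, and all numeric values are assumptions made for illustration. Because the inputs can be recomputed from the outputs, the block's input activations do not have to be stored for the backward pass.

```python
import math

def f(x):
    # Hypothetical sub-network F; it does not need to be invertible itself.
    return [math.tanh(0.5 * v) for v in x]

def g(x):
    # Hypothetical sub-network G; again an arbitrary function.
    return [0.25 * v * v for v in x]

def forward(x1, x2):
    # Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the coupling in reverse order to recover the inputs,
    # so the inputs need not be cached during the forward pass.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Toy "activations" split into two halves (illustrative values only).
x1, x2 = [0.1, -0.7, 2.0], [1.5, 0.3, -0.4]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# Reconstruction error should be at floating-point rounding level.
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_err < 1e-12)
```

In a real training setup the same idea would be applied to tensors inside a custom backward pass, trading extra recomputation (higher FLOPs) for lower peak memory, which is consistent with the trade-off described in the TL;DR.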
