CAVM: Conditional Autoregressive Vision Model for Contrast-Enhanced Brain Tumor MRI Synthesis

Lujun Gui, Chuyang Ye, Tianyi Yan

TL;DR

CAVM synthesizes contrast-enhanced brain MRI without administering gadolinium by recasting the problem as progressive dose escalation. It introduces a decomposition tokenizer and a dose-variant autoregression built on LLaMA-style Transformers with a staircase self-attention mask, coupled with a Swin UNETR–based decoder, to generate lower-dose ($y_{LD}$), higher-dose ($y_{HD}$), and standard-dose ($y_{SD}$) images from non-contrast inputs. Training combines autoencoding and image-to-image tasks, followed by autoregression optimization. On BraSyn-2023, the authors report better tumor-region synthesis and improved downstream segmentation than state-of-the-art baselines. The approach points toward safer, dose-guided contrast synthesis and sets the stage for exploring output-state sampling in future work.

Abstract

Contrast-enhanced magnetic resonance imaging (MRI) is pivotal in the pipeline of brain tumor segmentation and analysis. Gadolinium-based contrast agents, as the most commonly used contrast agents, are expensive and may have potential side effects, and it is desired to obtain contrast-enhanced brain tumor MRI scans without the actual use of contrast agents. Deep learning methods have been applied to synthesize virtual contrast-enhanced MRI scans from non-contrast images. However, as this synthesis problem is inherently ill-posed, these methods fall short in producing high-quality results. In this work, we propose Conditional Autoregressive Vision Model (CAVM) for improving the synthesis of contrast-enhanced brain tumor MRI. As the enhancement of image intensity grows with a higher dose of contrast agents, we assume that it is less challenging to synthesize a virtual image with a lower dose, where the difference between the contrast-enhanced and non-contrast images is smaller. Thus, CAVM gradually increases the contrast agent dosage and produces higher-dose images based on previous lower-dose ones until the final desired dose is achieved. Inspired by the resemblance between the gradual dose increase and the Chain-of-Thought approach in natural language processing, CAVM uses an autoregressive strategy with a decomposition tokenizer and a decoder. Specifically, the tokenizer is applied to obtain a more compact image representation for computational efficiency, and it decomposes the image into dose-variant and dose-invariant tokens. Then, a masked self-attention mechanism is developed for autoregression that gradually increases the dose of the virtual image based on the dose-variant tokens. Finally, the updated dose-variant tokens corresponding to the desired dose are decoded together with dose-invariant tokens to produce the final contrast-enhanced MRI.
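The masked self-attention described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not give the exact mask layout, so the sketch assumes that the "staircase" mask is block-causal, meaning every token of a dose stage may attend to all tokens of the same and earlier stages but to none of a later stage. The function names, the number of stages, and the number of tokens per stage are hypothetical.

```python
import numpy as np


def staircase_mask(tokens_per_stage: int, num_stages: int) -> np.ndarray:
    """Boolean (T, T) mask; True means the query row may attend to the key column.

    Assumption: block-causal layout, one block of tokens per dose stage.
    """
    # Stage index of every token, e.g. [0, 0, 1, 1, 2, 2] for 2 tokens x 3 stages.
    stage = np.repeat(np.arange(num_stages), tokens_per_stage)
    # A query at stage i may see keys at stages <= i.
    return stage[:, None] >= stage[None, :]


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block disallowed positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Hypothetical sizes: 3 dose stages, 2 dose-variant tokens each.
mask = staircase_mask(tokens_per_stage=2, num_stages=3)
print(mask.astype(int))
```

Printed as integers, the mask shows the step pattern that presumably gives the "staircase" its name: each block of rows sees one more block of columns than the block before it. Under this assumption, changing later-stage tokens cannot affect the outputs of earlier stages, which is what lets the dose be increased one stage at a time.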

Paper Structure

This paper contains 15 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The overall architecture of CAVM. The Decomposition Tokenizer comprises the two encoders on the left. The Dose-variant Autoregression is implemented by a LLaMA-style Transformer and the two encoders at the bottom right. The decoder, at the top right of the diagram, decodes all output images from the image tokens during the autoregressive process.
  • Figure 2: Four examples of real T1Gd images and synthesized results. For CAVM, from left to right the image order is lower-dose, higher-dose, and standard-dose. Note the highlighted tumor region for comparison.