SuperFormer: Volumetric Transformer Architectures for MRI Super-Resolution

Cristhian Forigua, Maria Escobar, Pablo Arbelaez

TL;DR

The paper addresses MRI super-resolution by leveraging volumetric transformers to exploit 3D context in MR volumes. It proposes SuperFormer, a 3D Swin Transformer-based architecture with separate feature and volume embeddings, 3D shifted-window local self-attention, and 3D relative position encoding to reconstruct high-resolution MRIs from low-resolution inputs. Experiments on the Human Connectome Project dataset show that volumetric transformers outperform 3D CNNs and 2D transformer baselines, with multi-domain embeddings delivering additional gains. The authors provide public code, pretrained models, and a medical SR toolbox to accelerate research in 3D MRI super-resolution.

Abstract

This paper presents a novel framework for processing volumetric medical information using Visual Transformers (ViTs). First, we extend the state-of-the-art Swin Transformer model to the 3D medical domain. Second, we propose a new approach for processing volumetric information and encoding position in ViTs for 3D applications. We instantiate the proposed framework and present SuperFormer, a volumetric transformer-based approach for Magnetic Resonance Imaging (MRI) Super-Resolution. Our method leverages the 3D information of the MRI domain and uses a local self-attention mechanism with a 3D relative positional encoding to recover anatomical details. In addition, our approach takes advantage of multi-domain information from the volume and feature domains and fuses them to reconstruct the high-resolution MRI. We perform an extensive validation on the Human Connectome Project dataset and demonstrate the superiority of volumetric transformers over 3D CNN-based methods. Our code and pretrained models are available at https://github.com/BCV-Uniandes/SuperFormer.

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of our method. SuperFormer encodes features and volume embeddings for deep feature extraction through volumetric transformers and combines the multi-domain representations to reconstruct the super-resolved volume.
  • Figure 2: 3D shifted window for computing self-attention in the Deep Feature Extraction, shown for a $2\times2\times2$ 3D token and an $8\times8\times8$ window size.
  • Figure 3: Qualitative comparison of our method against CNN and 2D transformer-based methods on the axial, coronal and sagittal anatomical axes.
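
The 3D shifted-window idea named in Figure 2 can be illustrated with a small sketch. The code below is not taken from the SuperFormer repository. It assumes a cubic token grid, a cubic window, and a cyclic shift of half the window size (the convention used by the original 2D Swin Transformer), and the grid and window sizes are hypothetical values chosen to keep the example small. It only shows which tokens end up in the same attention window before and after the shift; the attention computation and the masking of wrapped-around tokens are omitted.

```python
from itertools import product


def window_id(coord, grid, window, shift=0):
    """Return the (d, h, w) index of the window that contains a token.

    coord  : token position (d, h, w) in the token grid
    grid   : token grid size (D, H, W), each divisible by `window`
    window : cubic window side length, measured in tokens
    shift  : cyclic shift applied before partitioning (0 = regular windows)
    """
    # Cyclically shift each coordinate, then integer-divide by the window size.
    return tuple(((c - shift) % g) // window for c, g in zip(coord, grid))


def partition(grid, window, shift=0):
    """Group every token coordinate of the grid by its window index."""
    windows = {}
    for coord in product(*(range(g) for g in grid)):
        key = window_id(coord, grid, window, shift)
        windows.setdefault(key, []).append(coord)
    return windows


# Hypothetical sizes for illustration only (not the paper's configuration).
grid, window = (8, 8, 8), 4
regular = partition(grid, window)
shifted = partition(grid, window, shift=window // 2)

# Tokens (3, 3, 3) and (4, 4, 4) sit on opposite sides of a regular window
# boundary, but fall into the same window once the grid is shifted.
same_before = window_id((3, 3, 3), grid, window) == window_id((4, 4, 4), grid, window)
same_after = (window_id((3, 3, 3), grid, window, window // 2)
              == window_id((4, 4, 4), grid, window, window // 2))
print(len(regular), len(shifted), same_before, same_after)
```

Alternating the regular and shifted partitions between consecutive layers is what lets local self-attention exchange information across window boundaries while keeping the cost of each attention call bounded by the window size.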