Resolution Invariant Autoencoder

Ashay Patel, Michela Antonelli, Sebastien Ourselin, M. Jorge Cardoso

TL;DR

The paper tackles resolution variability in medical imaging with a resolution-invariant autoencoder that fixes the latent resolution through learned variable resampling, using a per-layer down/up factor $d_i = (l_i/r_i)^{1/n}$. It adds a latent consistency loss and an SSIM-based information-loss mechanism to quantify and propagate the uncertainty introduced by downsampling, so that performance holds across arbitrary input and output resolutions. The approach is validated on three tasks (arbitrary-resolution whole-body CT super-resolution, ADNI Alzheimer's classification, and multi-resolution generative modelling with a Latent Diffusion framework), with competitive results and meaningful uncertainty quantification. Because images no longer need pre-processing to normalize resolution and cross-resolution analysis happens in a single latent space, the method offers practical benefits for multi-resolution medical imaging pipelines and data efficiency. Open-source code is provided.
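The per-layer factor can be made concrete with a little arithmetic. The sketch below is illustrative only: this summary does not define the symbols, so it assumes $l_i$ is the target latent voxel spacing along axis $i$, $r_i$ is the input voxel spacing along that axis, and $n$ is the number of resizing layers. The function names are hypothetical and are not taken from the authors' code.

```python
def per_layer_factor(latent_spacing, input_spacing, n_layers):
    """Return d_i = (l_i / r_i) ** (1 / n) for every spatial axis i.

    Assumption: spacings are voxel sizes (e.g. in mm), so a larger
    value means a coarser grid.
    """
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    return tuple(
        (l / r) ** (1.0 / n_layers)
        for l, r in zip(latent_spacing, input_spacing)
    )


def spacing_after_layers(input_spacing, factors, n_layers):
    """Spacing reached after applying each axis factor n_layers times."""
    return tuple(r * d ** n_layers for r, d in zip(input_spacing, factors))


# Anisotropic input (1 mm, 2 mm, 4 mm) mapped to an 8 mm isotropic latent
# through 3 resizing layers.
factors = per_layer_factor((8.0, 8.0, 8.0), (1.0, 2.0, 4.0), n_layers=3)
latent = spacing_after_layers((1.0, 2.0, 4.0), factors, n_layers=3)
```

Under these assumptions, the 1 mm axis gets a factor of 2 per layer, which matches the conventional fixed downsampling, while the coarser axes get smaller, non-integer factors so that all three reach the same latent spacing.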

Abstract

Deep learning has significantly advanced medical imaging analysis, yet variations in image resolution remain an overlooked challenge. Most methods address this by resampling images, leading to either information loss or computational inefficiencies. While solutions exist for specific tasks, no unified approach has been proposed. We introduce a resolution-invariant autoencoder that adapts spatial resizing at each layer in the network via a learned variable resizing process, replacing fixed spatial down/upsampling at the traditional factor of 2. This ensures a consistent latent space resolution, regardless of input or output resolution. Our model enables various downstream tasks to be performed on an image latent whilst maintaining performance across different resolutions, overcoming the shortfalls of traditional methods. We demonstrate its effectiveness in uncertainty-aware super-resolution, classification, and generative modelling tasks and show how our method outperforms conventional baselines with minimal performance loss across resolutions.
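To illustrate how variable resizing can yield a fixed latent size, here is a minimal 1D sketch. It swaps the paper's learned resizing block for plain linear interpolation, and it works with sample counts over a shared field of view rather than voxel spacings. All names (`resize_linear`, `encode_sizes`, `encode`) are hypothetical and only illustrate the idea.

```python
def resize_linear(signal, new_len):
    """Resample a 1D list to new_len samples, keeping the endpoints aligned."""
    if len(signal) < 2 or new_len < 2:
        raise ValueError("need at least 2 samples in and out")
    scale = (len(signal) - 1) / (new_len - 1)
    out = []
    for j in range(new_len):
        pos = j * scale
        lo = min(int(pos), len(signal) - 2)  # left neighbour index
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def encode_sizes(input_len, latent_len, n_layers):
    """Feature-map length after each layer, using one factor per input."""
    d = (input_len / latent_len) ** (1.0 / n_layers)
    sizes = [round(input_len / d ** k) for k in range(1, n_layers + 1)]
    sizes[-1] = latent_len  # guard against floating-point rounding
    return sizes


def encode(signal, latent_len, n_layers=3):
    """Apply n_layers variable resizes (no learned weights in this sketch)."""
    for size in encode_sizes(len(signal), latent_len, n_layers):
        signal = resize_linear(signal, size)
    return signal


# The same ramp sampled at two resolutions lands on the same 8-sample latent.
high_res = encode([i / 63 for i in range(64)], latent_len=8)
low_res = encode([i / 31 for i in range(32)], latent_len=8)
```

The 64-sample input is reduced by a factor of 2 at each of the three layers, while the 32-sample input uses a factor of about 1.59. Both arrive at the same latent length without any resampling of the input beforehand.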

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of (left) the AE architecture, consisting of 3 layers of residual blocks followed by variable resizing blocks along the encoder and decoder, and (right) the architecture of the learnable variable resizing block
  • Figure 2: Training pipeline indicating flow of data during training of the resolution invariant AE
  • Figure 3: Qualitative results showing (left) the original ground-truth image, (top) the low-resolution input image (resampled to the same output size via linear interpolation), (2nd row) the super-resolved image using the U-Net approach, (3rd row) the super-resolved image using our proposed method, and (4th row) the uncertainty for our approach.