Resolution Invariant Autoencoder
Ashay Patel, Michela Antonelli, Sebastien Ourselin, M. Jorge Cardoso
TL;DR
The paper tackles resolution variability in medical imaging with a resolution-invariant autoencoder that fixes the latent resolution through learned variable resampling, using a per-layer down/up-sampling factor $d_i = (l_i/r_i)^{1/n}$. It adds a latent consistency loss and an SSIM-based information-loss mechanism to quantify and propagate uncertainty when downsampling, enabling robust performance across arbitrary input and output resolutions. The approach is validated on three tasks (arbitrary-resolution whole-body CT super-resolution, ADNI Alzheimer's classification, and multi-resolution generative modelling with a Latent Diffusion framework), demonstrating competitive results and meaningful uncertainty quantification. By removing the need to pre-process images to a common resolution and enabling cross-resolution analysis in a single latent space, the method offers practical benefits for multi-resolution medical imaging pipelines and data efficiency; open-source code is provided.
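To make the per-layer factor concrete, the following minimal sketch evaluates $d_i = (l_i/r_i)^{1/n}$ for one spatial axis. The summary does not define the symbols, so the reading used here is an assumption: $r_i$ is taken as the input voxel spacing along axis $i$ (in mm), $l_i$ as the fixed latent spacing along that axis, and $n$ as the number of resampling layers. The function names `per_layer_factor` and `spacing_schedule` are hypothetical and are not taken from the authors' code.

```python
from typing import List


def per_layer_factor(input_spacing: float, latent_spacing: float, n_layers: int) -> float:
    """Per-layer resampling factor d = (l / r) ** (1 / n) for one axis.

    Assumption: spacings are voxel sizes in mm, so d > 1 means each
    layer coarsens the grid (downsampling) and d < 1 means it refines it.
    """
    if input_spacing <= 0 or latent_spacing <= 0:
        raise ValueError("spacings must be positive")
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    return (latent_spacing / input_spacing) ** (1.0 / n_layers)


def spacing_schedule(input_spacing: float, latent_spacing: float, n_layers: int) -> List[float]:
    """Voxel spacing after 0, 1, ..., n layers when every layer scales by d."""
    d = per_layer_factor(input_spacing, latent_spacing, n_layers)
    return [input_spacing * d ** k for k in range(n_layers + 1)]


# Illustrative values only: two inputs with different spacings, one shared latent spacing.
for r in (1.0, 1.5):
    d = per_layer_factor(r, latent_spacing=8.0, n_layers=3)
    schedule = spacing_schedule(r, latent_spacing=8.0, n_layers=3)
    print(f"input {r} mm -> factor {d:.3f}, spacings {[round(s, 3) for s in schedule]}")
```

Under this reading, a 1 mm input with an 8 mm latent and three layers gives a factor of approximately 2, matching the conventional fixed factor, while a 1.5 mm input gets a smaller non-integer factor and still ends at the same latent spacing after three layers.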
Abstract
Deep learning has significantly advanced medical imaging analysis, yet variations in image resolution remain an overlooked challenge. Most methods address this by resampling images, leading to either information loss or computational inefficiencies. While solutions exist for specific tasks, no unified approach has been proposed. We introduce a resolution-invariant autoencoder that adapts spatial resizing at each layer in the network via a learned variable resizing process, replacing fixed spatial down/upsampling at the traditional factor of 2. This ensures a consistent latent space resolution, regardless of input or output resolution. Our model enables various downstream tasks to be performed on an image latent whilst maintaining performance across different resolutions, overcoming the shortfalls of traditional methods. We demonstrate its effectiveness in uncertainty-aware super-resolution, classification, and generative modelling tasks and show how our method outperforms conventional baselines with minimal performance loss across resolutions.
