Learning from spatially inhomogenous data: resolution-adaptive convolutions for multiple sclerosis lesion segmentation

Ivan Diaz, Florin Scherer, Yanik Berli, Roland Wiest, Helly Hammer, Robert Hoepner, Alejandro Leon Betancourt, Piotr Radojewski, Richard McKinley

TL;DR

The paper tackles the problem of segmenting MS lesions in MRI when image resolution varies across vendors and protocols, a setting in which resampling to a common isovoxel grid can degrade fidelity. It introduces resolution-adaptive convolutions based on spherical-harmonic kernels within the e3nn framework: kernels are defined in physical space as $W(\vec{r}) = R(|\vec{r}|) Y_l^m(\hat{\vec{r}})$, so they can operate at arbitrary resolutions without resampling the image. Empirical results show the resolution-adaptive network generalizes to unseen resolutions and often outperforms standard U-Nets, especially when trained on mixed-resolution data, though resampling can still be advantageous in certain cross-resolution scenarios. The proposed method offers a robust alternative for clinical MRI workflows with heterogeneous data, reducing interpolation artifacts while maintaining competitive segmentation performance; code is released at the referenced repository.

Abstract

In the setting of clinical imaging, differences between vendors, hospitals and sequences can yield highly inhomogeneous imaging data. In MRI in particular, voxel dimension, slice spacing and acquisition plane can vary substantially. For clinical applications, therefore, algorithms must be trained to handle data with various voxel resolutions. The usual strategy to deal with heterogeneity of resolution is harmonization: resampling imaging data to a common (usually isovoxel) resolution. This can lead to loss of fidelity arising from interpolation artifacts out-of-plane and downsampling in-plane. We present in this paper a network architecture designed to learn directly from spatially heterogeneous data, without resampling: a segmentation network based on the e3nn framework that leverages a spherical harmonic, rather than voxel-grid, parameterization of convolutional kernels, with a fixed physical radius. Networks based on these kernels can be resampled to their input voxel dimensions. We trained and tested our network on a publicly available dataset assembled from three centres, and on an in-house dataset of Multiple Sclerosis cases with a high degree of spatial inhomogeneity. We compared our approach to a standard U-Net with two strategies for handling inhomogeneous data: training directly on the data without resampling, and resampling to a common resolution of 1 mm isovoxels. We show that our network is able to learn from various combinations of voxel sizes and outperforms classical U-Nets on 2D testing cases and most 3D testing cases. This shows an ability to generalize well when tested on image resolutions not seen during training. Our code can be found at: http://github.com/SCAN-NRAD/e3nn_U-Net.
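
As a rough illustration of what "resampling the kernel to the input voxel dimensions" can mean, the sketch below samples a kernel of the form $W(\vec{r}) = R(|\vec{r}|) Y_l^m(\hat{\vec{r}})$ on voxel grids with different spacings while keeping the physical radius fixed. It is not the authors' implementation and does not use e3nn: the function name `sample_kernel`, the 3 mm radius, the Gaussian radial profile and the restriction to $l \in \{0, 1\}$ are all assumptions made for this example.

```python
import numpy as np

def sample_kernel(voxel_size_mm, radius_mm=3.0, l=0, m=0):
    """Sample W(r) = R(|r|) * Y_l^m(r_hat) on a grid with the given voxel spacing.

    Hypothetical helper: radius_mm, the Gaussian radial profile and the
    support for l in {0, 1} only are illustrative assumptions.
    """
    spacing = np.asarray(voxel_size_mm, dtype=float)
    # Voxels needed on each side of the centre to cover the physical radius.
    half = np.floor(radius_mm / spacing).astype(int)
    # Physical coordinates (in mm) of the voxel centres along each axis.
    axes = [np.arange(-h, h + 1) * s for h, s in zip(half, spacing)]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)

    # Assumed radial profile R(|r|): a Gaussian truncated at the physical radius.
    radial = np.exp(-0.5 * (r / (radius_mm / 2.0)) ** 2) * (r <= radius_mm)

    if l == 0:
        # Y_0^0 is constant.
        angular = np.full_like(r, 0.5 / np.sqrt(np.pi))
    elif l == 1:
        # Real l=1 harmonics are proportional to y/r, z/r, x/r for m = -1, 0, 1.
        component = {-1: y, 0: z, 1: x}[m]
        safe_r = np.where(r > 0, r, 1.0)  # avoid 0/0 at the centre voxel
        angular = np.sqrt(3.0 / (4.0 * np.pi)) * component / safe_r
    else:
        raise NotImplementedError("this sketch only covers l = 0 and l = 1")
    return radial * angular

# Same physical kernel, two different voxel grids (spacings in mm).
iso = sample_kernel((1.0, 1.0, 1.0))
aniso = sample_kernel((0.35, 0.35, 4.8))
print(iso.shape)    # (7, 7, 7)
print(aniso.shape)  # (17, 17, 1)
```

With the assumed 3 mm radius, the 4.8 mm slice spacing leaves only one voxel along the slice axis, so the sampled kernel becomes effectively two-dimensional there, while the fine in-plane spacing yields a denser 17 × 17 stencil. The learnable content (here just the choice of radial profile and $l, m$) is shared across both grids.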

Paper Structure

This paper contains 19 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Distribution of 2D and 3D acquisitions from our in-house data set. The size of the markers is proportional to the number of cases available.
  • Figure 2: Comparison of segmentations of a 2D FLAIR image (coronal acquisition, 0.35 mm in-plane resolution, slice spacing 4.8 mm), using models trained on both 3D and 2D data. Blue pixels denote true positives, green false negatives and red false positives. (a) U-Net native, (b) U-Net $@1\mathrm{mm}$, (c) Resolution-adaptive network. The network operating at native resolution fails to segment large areas at the center of the lesion. The network trained on and applied to resampled data performs better but cannot take advantage of the very high in-plane resolution, which allows the Resolution-adaptive network to better follow the lesion contours.