UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi

TL;DR

UKBOB introduces the largest MRI segmentation dataset to date, built from UK Biobank data: 51,761 full-body MRIs and over 1.37 billion 2D masks across 72 organs. Labels are generated automatically with TotalVibe Segmentator and quality-controlled with the Specialized Organ Label Filter (SOLF). To contend with residual label noise, the authors propose Entropy Test-Time Adaptation (ETTA), and they validate the labels against the manually annotated UKBOB-manual subset and through zero-shot generalization to related abdominal datasets. They train Swin-BOB, a Swin-UNetr-based foundation model, which reports state-of-the-art results on the BRATS (0.4% improvement) and BTCV (1.3% improvement) benchmarks and demonstrates zero-shot transfer to AMOS and BTCV. Code and pre-trained models are available, and the filtered labels are planned for release with the UK Biobank.

Abstract

In medical imaging, the primary challenge is collecting large-scale labeled data due to privacy concerns, logistics, and high labeling costs. In this work, we present the UK Biobank Organs and Bones (UKBOB), the largest labeled dataset of body organs, comprising 51,761 MRI 3D samples (equivalent to 17.9 million 2D images) and more than 1.37 billion 2D segmentation masks of 72 organs, all based on the UK Biobank MRI dataset. We utilize automatic labeling, introduce an automated label cleaning pipeline with organ-specific filters, and manually annotate a subset of 300 MRIs with 11 abdominal classes to validate the quality (referred to as UKBOB-manual). This approach allows for scaling up the dataset collection while maintaining confidence in the labels. We further confirm the validity of the labels by demonstrating zero-shot generalization of trained models on the filtered UKBOB to other small labeled datasets from similar domains (e.g., abdominal MRI). To further mitigate the effect of noisy labels, we propose a novel method called Entropy Test-time Adaptation (ETTA) to refine the segmentation output. We use UKBOB to train a foundation model, Swin-BOB, for 3D medical image segmentation based on the Swin-UNetr architecture, achieving state-of-the-art results in several benchmarks in 3D medical imaging, including the BRATS brain MRI tumor challenge (with a 0.4% improvement) and the BTCV abdominal CT scan benchmark (with a 1.3% improvement). The pre-trained models and the code are available at https://emmanuelleb985.github.io/ukbob , and the filtered labels will be made available with the UK Biobank.
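
As a rough illustration of the quantity that an entropy-based test-time method such as ETTA builds on, the following Python sketch computes a per-voxel entropy map from class logits. It is not the authors' implementation: the function names (softmax, voxel_entropy, entropy_map, mean_entropy) and the flat list-of-voxels layout are assumptions made for this example, and the abstract does not specify the exact objective or update rule that ETTA uses.

```python
import math


def softmax(logits):
    """Convert one voxel's class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def voxel_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one voxel's class distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def entropy_map(logit_volume):
    """Entropy per voxel.

    Assumption for this sketch: the volume is flattened into a list of
    voxels, each holding a list of class logits.
    """
    return [voxel_entropy(softmax(logits)) for logits in logit_volume]


def mean_entropy(logit_volume):
    """Scalar summary of the map; a candidate signal to minimize at test time."""
    emap = entropy_map(logit_volume)
    return sum(emap) / len(emap)


# Hypothetical two-class example: one ambiguous voxel, one confident voxel.
volume = [[0.0, 0.0], [8.0, -8.0]]
emap = entropy_map(volume)
print(emap[0] > emap[1])  # the ambiguous voxel has the higher entropy
```

High-entropy voxels mark uncertain predictions. In entropy-minimization approaches to test-time adaptation such as Tent, the mean of such a map is minimized with respect to the parameters of the normalization layers; whether ETTA follows exactly that update is not stated in the abstract.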

Paper Structure

This paper contains 28 sections, 4 equations, 16 figures, 13 tables, 1 algorithm.

Figures (16)

  • Figure 1: UKBOB Size and Diversity. Our proposed UK Biobank Organs and Bones (UKBOB) is the largest labeled medical imaging dataset for segmentation, comprising 51,761 MRI 3D samples of body organs (17.9 M 2D images) and more than 1.37 billion 2D masks of 72 organs. Left: we show label examples from UKBOB in axial, coronal, and sagittal views. Right: we plot the size (number of 2D images) and diversity (number of classes) of UKBOB compared to other medical imaging datasets. The size of the bubbles indicates 2D image resolution. This new scale in dataset size and diversity should unlock a new wave of applications and methods in the computer vision and medical imaging communities.
  • Figure 2: Accuracy of UKBOB Labels. An example of segmentation labels in UKBOB is shown in the sagittal view. The labels include "spine" (in purple), which we can compare to previously collected hand labels of the spine Bourigault23 (in red). The newly collected labels match the manual spine labels with a total Dice score of 81.1% on a set of 250 manually annotated test samples, indicating accurate labels.
  • Figure 3: UKBOB-Manual. We collect manual labels for 300 samples of UKBOB for 11 abdominal organs totaling 3,000 images. UKBOB-manual acts as manual validation for the large UKBOB. Examples of axial slices are shown here.
  • Figure 4: Specialized Organ Label Filter (SOLF). SOLF integrates sphericity, eccentricity, and normalized volume to statistically filter out inaccurate organ labels. From left to right, the panels display examples of low sphericity (0.21), high sphericity (0.95), low eccentricity (0.14), and high eccentricity (0.87).
  • Figure 5: Entropy Test-Time Adaptation for Image Segmentation. We use a test-time entropy map to refine the batch norm layer of the network for robust segmentation output. This module is agnostic to the architecture of the deep neural network, so it can be used with any segmentation network to increase consistency and robustness, especially when trained with noisy labels.
  • ...and 11 more figures
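
The shape descriptors named in the Figure 4 caption (sphericity and eccentricity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SOLF implementation: sphericity uses the standard Wadell definition, eccentricity uses the ellipse definition from major and minor axis lengths, and the z-score rule with its threshold z_max is a hypothetical stand-in for whatever statistical test the paper applies. All function names are invented for this example.

```python
import math
from statistics import mean, stdev


def sphericity(volume, surface_area):
    """Wadell sphericity: 1.0 for a perfect sphere, lower for irregular shapes."""
    return (math.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


def eccentricity(major_axis, minor_axis):
    """Ellipse eccentricity: 0.0 for a circle, approaching 1.0 when elongated."""
    return math.sqrt(1 - (minor_axis / major_axis) ** 2)


def flag_outliers(values, z_max=2.0):
    """Flag values more than z_max sample standard deviations from the mean.

    Assumption: a plain z-score test; the threshold is illustrative only.
    """
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) > z_max * sigma for v in values]


# Hypothetical sphericity values for one organ across nine scans; the last
# value (0.21) mimics the low-sphericity example quoted in the caption.
scores = [0.90, 0.92, 0.91, 0.89, 0.93, 0.90, 0.91, 0.92, 0.21]
print(flag_outliers(scores))
```

The same outlier test could be applied per organ to eccentricity and to normalized volume, with a label rejected when any descriptor is flagged. How SOLF actually combines the three descriptors is not specified in the caption.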