nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation

Fabian Isensee; Tassilo Wald; Constantin Ulrich; Michael Baumgartner; Saikat Roy; Klaus Maier-Hein; Paul F. Jaeger

TL;DR

This paper interrogates the recent push toward novel architectures for 3D medical image segmentation by applying a rigorous, standardized validation protocol. It conducts a large-scale benchmark across CNN-, Transformer-, and Mamba-based methods within the nnU-Net framework, using fixed hardware budgets and a diverse, carefully selected dataset suite. The key finding is that CNN-based U-Net variants, configured and scaled appropriately, continue to outperform Transformer- and Mamba-based approaches, with Auto3DSeg underperforming relative to nnU-Net. The work emphasizes a cultural shift toward rigorous validation, proposes standardized baselines and dataset suitability criteria, and provides practical guidance to reduce validation bias in future 3D segmentation research.
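
The TL;DR mentions dataset suitability criteria. The sketch below shows one simple way to operationalize that idea, offered as an illustration and not as the paper's exact criterion. It asks whether the spread of scores *between* methods exceeds the run-to-run noise *within* each method. Several details are assumptions for this example: the function name `suitability_ratio`, the use of mean Dice scores from repeated training runs (seeds), and the between/within standard-deviation ratio itself.

```python
from statistics import mean, stdev
from typing import Mapping, Sequence


def suitability_ratio(scores: Mapping[str, Sequence[float]]) -> float:
    """Hypothetical dataset-suitability score (illustrative assumption).

    `scores` maps each method name to its scores (e.g. mean Dice) from
    repeated runs with different seeds. The result is the standard deviation
    of the per-method means divided by the average within-method standard
    deviation. Values well above 1 suggest the dataset separates methods
    more strongly than seed noise does; values near or below 1 suggest
    that rankings on this dataset may be noise.
    """
    if len(scores) < 2:
        raise ValueError("need at least two methods to compare")
    if any(len(runs) < 2 for runs in scores.values()):
        raise ValueError("need at least two runs per method to estimate noise")

    # Spread of the methods' average performance.
    between = stdev(mean(runs) for runs in scores.values())
    # Average run-to-run spread of each individual method.
    within = mean(stdev(runs) for runs in scores.values())

    if within == 0:
        return float("inf")
    return between / within
```

For example, two methods whose averages differ by 0.1 Dice but whose seeds vary by only 0.02 give a ratio of about 5. The same 0.01 gap between averages, with seeds varying by 0.1, gives about 0.1, which marks a dataset on which that comparison says little.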

Abstract

The release of nnU-Net marked a paradigm shift in 3D medical image segmentation, demonstrating that a properly configured U-Net architecture could still achieve state-of-the-art results. Despite this, the pursuit of novel architectures, along with accompanying claims of superior performance over the U-Net baseline, continued. In this study, we demonstrate that many of these recent claims fail to hold up when scrutinized for common validation shortcomings, such as inadequate baselines, insufficient datasets, and neglected computational resources. By carefully avoiding these pitfalls, we conduct a comprehensive benchmark of current segmentation methods, including CNN-based, Transformer-based, and Mamba-based approaches. Contrary to current beliefs, we find that the recipe for state-of-the-art performance is 1) employing CNN-based U-Net models, including ResNet and ConvNeXt variants, 2) using the nnU-Net framework, and 3) scaling models to modern hardware resources. These results indicate an ongoing innovation bias towards novel architectures in the field and underscore the need for more stringent validation standards in the quest for scientific progress.
