TTT-Unet: Enhancing U-Net with Test-Time Training Layers for Biomedical Image Segmentation

Rong Zhou, Zhengqing Yuan, Zhiling Yan, Weixiang Sun, Kai Zhang, Yiwei Li, Yanfang Ye, Xiang Li, Lifang He, Lichao Sun

TL;DR

The paper addresses long-range dependency modeling in biomedical image segmentation by integrating Test-Time Training (TTT) layers into U-Net, yielding TTT-Unet. A TTT layer treats its hidden state as a small trainable model that is updated with a self-supervised objective, so the network adapts during inference without any change to the training objective. The authors report consistent, state-of-the-art improvements on 3D abdominal CT/MRI, 2D endoscopy, and microscopy segmentation datasets, with better generalization and boundary precision at a nontrivial but manageable computational cost. The work offers a practical, adaptable framework for clinical image analysis, and the code is public for replication and extension.

Abstract

Biomedical image segmentation is crucial for accurately diagnosing and analyzing various diseases. However, Convolutional Neural Networks (CNNs) and Transformers, the most commonly used architectures for this task, struggle to capture long-range dependencies effectively because of the inherent locality of CNNs and the computational complexity of Transformers. To address this limitation, we introduce TTT-Unet, a novel framework that integrates Test-Time Training (TTT) layers into the traditional U-Net architecture for biomedical image segmentation. TTT-Unet dynamically adjusts model parameters at test time, enhancing the model's ability to capture both local and long-range features. We evaluate TTT-Unet on multiple medical imaging datasets, including 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results demonstrate that TTT-Unet consistently outperforms state-of-the-art CNN-based and Transformer-based segmentation models across all tasks. The code is available at https://github.com/rongzhou7/TTT-Unet.
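
To make the idea of a hidden state that is itself a trainable model more concrete, here is a minimal NumPy sketch of a linear TTT-style layer. It is an illustration under stated assumptions, not the authors' implementation: the summary does not specify the self-supervised objective, so the sketch assumes a per-token reconstruction loss between two projections of the input, and the names `ttt_linear`, `theta_k`, `theta_v`, `theta_q`, and `lr` are hypothetical.

```python
import numpy as np


def ttt_linear(tokens, theta_k, theta_v, theta_q, lr=0.1):
    """Naive sequential sketch of a linear TTT-style layer (assumed form).

    tokens:  (T, d) array, one row per token.
    theta_*: (d, d) projection matrices; learned in a real model, fixed here.
    Returns the (T, d) outputs and the per-token self-supervised losses.
    """
    d = tokens.shape[1]
    # The hidden state is the weight matrix of an inner linear model.
    W = np.zeros((d, d))
    outputs, losses = [], []
    for x in tokens:
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        # Assumed self-supervised loss: ||W k - v||^2 for the current token.
        err = W @ k - v
        losses.append(float(err @ err))
        # One gradient step on that loss; this is the "training" at test time.
        W = W - lr * 2.0 * np.outer(err, k)
        # The output is produced by the freshly updated inner model.
        outputs.append(W @ q)
    return np.stack(outputs), losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    seq = rng.normal(size=(6, d))
    eye = np.eye(d)
    out, losses = ttt_linear(seq, eye, eye, eye, lr=0.05)
    print(out.shape, losses)
```

The update runs identically during training and inference, which is why the outer training objective does not need to change. The step size `lr` has to be small relative to the squared norm of the projected tokens, otherwise the inner update can overshoot.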

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: (a) The overall framework of TTT-Unet. (b) TTT Building Block. (c) TTT Layer.
  • Figure 2: Visualization results of TTT-Unet on the Abdomen MRI dataset. The first row shows the original images, the middle row shows the ground truth, and the bottom row shows the TTT-Unet predictions.
  • Figure 3: Visualization results of TTT-Unet on the Microscopy and Endoscopy datasets. The first and second rows show the original images and TTT-Unet predictions on the Microscopy dataset, respectively. The third and fourth rows show the original images and TTT-Unet predictions on the Endoscopy dataset.