ToNNO: Tomographic Reconstruction of a Neural Network's Output for Weakly Supervised Segmentation of 3D Medical Images

Marius Schmidt-Mengin; Alexis Benichoux; Shibeshih Belachew; Nikos Komodakis; Nikos Paragios

TL;DR

ToNNO introduces a tomographic reconstruction framework that converts a 2D encoder's slice-wise predictions into a 3D heatmap for weakly supervised segmentation of 3D medical images, using the inverse Radon transform $\mathcal{R}^{-1}$. A 2D classifier is trained on randomly oriented slices with volume-level labels, and the resulting slice logits are aggregated across angles via tomography to yield dense 3D segmentations; Averaged CAM and Tomographic CAM further enhance CAM-based baselines. Across four large datasets, ToNNO outperforms GradCAM and LayerCAM, with Tomographic CAM providing the strongest improvements in many cases, while preserving compatibility with pretrained 2D networks. The approach highlights a scalable path to high-resolution 3D segmentation using 2D architectures and suggests future work on iterative reconstruction and broader pretraining strategies.

Abstract

Annotating large numbers of 3D medical images for training segmentation models is time-consuming. The goal of weakly supervised semantic segmentation is to train segmentation models without using any ground truth segmentation masks. Our work addresses the case where only image-level categorical labels, indicating the presence or absence of a particular region of interest (such as tumours or lesions), are available. Most existing methods rely on class activation mapping (CAM). We propose a novel approach, ToNNO, which is based on the Tomographic reconstruction of a Neural Network's Output. Our technique extracts stacks of slices at different angles from the input 3D volume, feeds these slices to a 2D encoder, and applies the inverse Radon transform to reconstruct a 3D heatmap of the encoder's predictions. This generic method makes it possible to perform dense prediction tasks on 3D volumes using any 2D image encoder. We apply it to weakly supervised medical image segmentation by training the 2D encoder to output high values for slices containing the regions of interest. We test it on four large-scale medical image datasets, where it outperforms 2D CAM methods. We then extend ToNNO by combining tomographic reconstruction with CAM methods, proposing Averaged CAM and Tomographic CAM, which obtain even better results.
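The pipeline described in the abstract (extract slices at several angles, score each slice, invert with the Radon transform) can be illustrated in one dimension lower, with a 2D image standing in for the volume and 1D rows of pixels standing in for slices. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names (`extract_slices`, `score_sinogram`, `filtered_backprojection`, `toy_score`) are hypothetical, the inversion is a plain NumPy filtered backprojection with nearest-neighbour sampling, and `toy_score` (the maximum intensity of a slice) is a stand-in for a trained 2D classifier.

```python
import numpy as np


def extract_slices(image, theta):
    """Return an (n, n) array whose k-th row is the 1D slice at offset k for angle theta."""
    n = image.shape[0]
    c = (n - 1) / 2.0
    offsets = np.arange(n) - c
    # s: signed distance of the slice from the centre, t: position along the slice.
    s, t = np.meshgrid(offsets, offsets, indexing="ij")
    x = np.rint(c + s * np.cos(theta) - t * np.sin(theta)).astype(int)
    y = np.rint(c + s * np.sin(theta) + t * np.cos(theta)).astype(int)
    inside = (x >= 0) & (x < n) & (y >= 0) & (y < n)
    slices = np.zeros((n, n))
    slices[inside] = image[y[inside], x[inside]]  # nearest-neighbour sampling
    return slices


def score_sinogram(image, thetas, score):
    """Apply `score` (slices -> one value per slice) at every angle; shape (n, n_angles)."""
    return np.stack([score(extract_slices(image, th)) for th in thetas], axis=1)


def filtered_backprojection(sinogram, thetas):
    """Approximate inverse Radon transform: ramp filter, then smear back over the grid."""
    n = sinogram.shape[0]
    c = (n - 1) / 2.0
    ramp = np.abs(np.fft.fftfreq(n))[:, None]
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * ramp, axis=0))
    ys, xs = np.mgrid[0:n, 0:n]
    recon = np.zeros((n, n))
    for k, theta in enumerate(thetas):
        # Offset of the slice passing through each pixel at this angle.
        s = c + (xs - c) * np.cos(theta) + (ys - c) * np.sin(theta)
        idx = np.clip(np.rint(s).astype(int), 0, n - 1)
        recon += filtered[idx, k]
    return recon * np.pi / len(thetas)


def toy_score(slices):
    """Hypothetical stand-in for a classifier: high when a slice crosses the bright region."""
    return slices.max(axis=1)


# Toy "volume": a single bright disc playing the role of a lesion.
n = 64
ys, xs = np.mgrid[0:n, 0:n]
volume = ((xs - 24) ** 2 + (ys - 40) ** 2 <= 6 ** 2).astype(float)
thetas = np.linspace(0.0, np.pi, 90, endpoint=False)

# Summing each slice approximates the Radon transform, so inversion recovers the image.
recon = filtered_backprojection(
    score_sinogram(volume, thetas, lambda s: s.sum(axis=1)), thetas
)
# Replacing the sum by a per-slice score yields a localisation heatmap instead.
heatmap = filtered_backprojection(score_sinogram(volume, thetas, toy_score), thetas)
```

In the setting the abstract describes, the scoring function would be the trained 2D encoder applied to 2D slices of a 3D volume; the toy score here only shows why slice-level outputs collected over many angles carry enough information to localise the region of interest.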

Paper Structure (40 sections, 14 equations, 133 figures, 9 tables)

This paper contains 40 sections, 14 equations, 133 figures, 9 tables.

Introduction
Related Work
Weakly supervised semantic segmentation.
3D medical image segmentation.
Tomography.
Method
Intuition
Radon transform and its inverse
Implementation
ToNNO
Averaged CAM and Tomographic CAM
Hyperparameters
Tomographic reconstruction
Classifier choice
Experiments
...and 25 more sections

Figures (133)

Figure 1: Our method, which produces high-resolution segmentations without using any ground truth segmentation masks, is completely orthogonal to class activation mapping methods such as GradCAM (Selvaraju et al., 2017) or LayerCAM (Jiang et al., 2021).
Figure 2: Visualisation of logits produced by the 2D classifier $g_\theta$
Figure 3: Overview of ToNNO. In this schematic, 3D volumes are represented as 2D images and 2D slices as 1D rows of pixels. First, slices are extracted from the input volume (left). By summing the pixels of each slice (top), we can approximate the Radon transform of the input volume $V$; applying the inverse Radon transform $\mathcal{R}^{-1}$ then reconstructs $V$. The idea of this work is to replace the sum with a trained neural network before applying the inverse Radon transform (bottom).
Figure 4: Averaged
Figure 5: Tomographic
...and 128 more figures
