Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT

Matan Atad, Alexander W. Marka, Lisa Steinhelfer, Anna Curto-Vilalta, Yannik Leonhardt, Sarah C. Foreman, Anna-Sophia Walburga Dietrich, Robert Graf, Alexandra S. Gersing, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke, Hendrik Möller

TL;DR

The paper segments vertebral metastases on CT without voxel-level labels by turning vertebra-level healthy/malignant annotations into lesion masks. A Diffusion Autoencoder (DAE) generates a classifier-guided healthy edit of each vertebra, and the Hide-and-Seek Attribution framework tests the malignant contribution of each candidate region in isolation, scoring it with a latent-space classifier. On held-out radiologist annotations, the method reaches F1 of 0.91/0.85 and Dice of 0.87/0.78 for blastic/lytic lesions, outperforming representative weakly supervised baselines and offering lesion-level interpretability. The results indicate that generative editing combined with selective occlusion can yield accurate, explainable segmentation in CT with minimal supervision, with potential clinical impact for lesion burden and stability assessment.

Abstract

Accurate segmentation of vertebral metastasis in CT is clinically important yet difficult to scale, as voxel-level annotations are scarce and both lytic and blastic lesions often resemble benign degenerative changes. We introduce a weakly supervised method trained solely on vertebra-level healthy/malignant labels, without any lesion masks. The method combines a Diffusion Autoencoder (DAE) that produces a classifier-guided healthy edit of each vertebra with pixel-wise difference maps that propose candidate lesion regions. To determine which regions truly reflect malignancy, we introduce Hide-and-Seek Attribution: each candidate is revealed in turn while all others are hidden, the edited image is projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that component. High-scoring regions form the final lytic or blastic segmentation. On held-out radiologist annotations, we achieve strong blastic/lytic performance despite no mask supervision (F1: 0.91/0.85; Dice: 0.87/0.78), exceeding baselines (F1: 0.79/0.67; Dice: 0.74/0.55). These results show that vertebra-level labels can be transformed into reliable lesion masks, demonstrating that generative editing combined with selective occlusion supports accurate weakly supervised segmentation in CT.
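The reveal-one, hide-the-rest loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: images are plain nested lists, `candidate_regions` and `hide_and_seek` are hypothetical helper names, and `score_fn` is a stand-in for the DAE projection followed by the latent-space classifier. The 4-connectivity, the baseline (all candidates hidden), the normalization by the largest positive score, and the `keep_thresh` value are all assumptions made for illustration.

```python
from collections import deque


def candidate_regions(original, healthy, diff_thresh):
    """Group pixels where |original - healthy| > diff_thresh into
    4-connected components; each component is a set of (row, col)."""
    rows, cols = len(original), len(original[0])

    def differs(r, c):
        return abs(original[r][c] - healthy[r][c]) > diff_thresh

    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not differs(r, c):
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and differs(ny, nx):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def hide_and_seek(original, healthy, regions, score_fn, keep_thresh=0.5):
    """Score each candidate in isolation and keep the high-scoring ones.

    score_fn is a placeholder for "project with the DAE, then classify in
    latent space"; here it is any callable mapping an image to a float.
    """
    all_pixels = set().union(*regions)

    def compose(revealed):
        # Revealed pixels keep the original intensity; every other
        # candidate pixel is hidden by copying the healthy edit.
        hidden = all_pixels - revealed
        return [
            [healthy[r][c] if (r, c) in hidden else original[r][c]
             for c in range(len(original[0]))]
            for r in range(len(original))
        ]

    baseline = score_fn(compose(set()))  # assumption: all candidates hidden
    deltas = [score_fn(compose(region)) - baseline for region in regions]
    top = max(deltas, default=0.0)
    # Assumed normalization: divide by the largest positive delta.
    normalized = [d / top if top > 0 else 0.0 for d in deltas]
    kept = [reg for reg, s in zip(regions, normalized) if s >= keep_thresh]
    return normalized, kept


# Toy usage: a strong two-pixel candidate and a weak one-pixel candidate.
original = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 0, 0, 0], [0, 0, 0, 2]]
healthy = [[0] * 4 for _ in range(4)]
toy_score = lambda img: max(max(row) for row in img) / 10  # stand-in classifier
regions = candidate_regions(original, healthy, diff_thresh=1)
scores, kept = hide_and_seek(original, healthy, regions, toy_score)
```

Passing the scorer in as a callable keeps the occlusion logic separate from the generative model, so a real DAE and classifier could replace the toy scorer without changing the loop.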

Paper Structure

This paper contains 42 sections, 3 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Weakly supervised vertebral lesion segmentation from image-level labels. (1) Classifier-guided healthy edit (yellow) is generated in DAE latent space from the original (green). Difference maps produce suspect lesions. (2) Hide-and-Seek isolates each candidate (green contours), occludes the others (yellow), and computes normalized $\Delta$-scores. Thresholding these scores yields the final masks.
  • Figure 2: Qualitative comparison on four vertebral CT slices (rows). Columns show the input image, ground truth, evaluated baselines and our method. Blastic lesions are shown in blue, lytic in red. CAM-based results are thresholded heatmaps. The examples: (1) a diffuse blastic lesion with a bright focus, (2) a large lytic lesion with cortical breakthrough, (3) a blastic lesion in a grainy scan with imaging artifacts, and (4) a mixed case with a small lytic focus. Arrows indicate features referenced in the text.
  • Figure 3: Distribution of healthy and malignant vertebrae across thoracic and lumbar levels.
  • Figure 4: Log-scale histogram of manual lesion sizes (in pixels) for lytic (red) and blastic (blue) lesions. The dashed vertical line marks the 5-pixel threshold used as the minimum predicted lesion size in evaluation.
  • Figure 5: Qualitative comparison on eight vertebral CT slices (rows). Columns show the input image, manual ground truth, Otsu, CAM-based baselines, MedSAM, anomaly detection (AD), and our method.
  • ...and 4 more figures
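The caption of Figure 4 mentions a 5-pixel minimum size for predicted lesions in evaluation, and the abstract reports Dice scores. A minimal sketch of how such a size filter and a pixel-wise Dice score could be combined is shown below. The helper names are hypothetical, lesions are represented as sets of (row, col) coordinates, and treating the threshold as inclusive (lesions of exactly 5 pixels are kept) is an assumption; the paper's exact evaluation protocol is not given in this summary.

```python
MIN_LESION_PIXELS = 5  # threshold from the Figure 4 caption


def drop_small_lesions(predicted_lesions, min_pixels=MIN_LESION_PIXELS):
    """Keep predicted lesions with at least min_pixels pixels.
    Assumption: the threshold is inclusive."""
    return [lesion for lesion in predicted_lesions if len(lesion) >= min_pixels]


def dice(pred_pixels, true_pixels):
    """Pixel-wise Dice: 2|A ∩ B| / (|A| + |B|).
    Two empty masks are treated as a perfect match (a common convention)."""
    if not pred_pixels and not true_pixels:
        return 1.0
    overlap = len(pred_pixels & true_pixels)
    return 2 * overlap / (len(pred_pixels) + len(true_pixels))


# Toy usage: one 6-pixel prediction survives, one 2-pixel prediction is dropped.
predicted = [{(0, c) for c in range(6)}, {(3, 0), (3, 1)}]
truth = {(0, c) for c in range(4)}
kept = drop_small_lesions(predicted)
pred_mask = set().union(*kept)
score = dice(pred_mask, truth)  # 2 * 4 / (6 + 4) = 0.8
```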