Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT
Matan Atad, Alexander W. Marka, Lisa Steinhelfer, Anna Curto-Vilalta, Yannik Leonhardt, Sarah C. Foreman, Anna-Sophia Walburga Dietrich, Robert Graf, Alexandra S. Gersing, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke, Hendrik Möller
TL;DR
The paper tackles segmentation of vertebral metastases on CT without voxel-level labels by turning vertebra-level healthy/malignant annotations into lesion masks. It uses a Diffusion Autoencoder (DAE) to generate classifier-guided healthy edits and introduces Hide-and-Seek Attribution, which tests the malignant contribution of each candidate region in isolation and scores it with a latent-space classifier. On held-out radiologist annotations, the method reaches F1 of 0.91/0.85 and Dice of 0.87/0.78 for blastic/lytic lesions, outperforming weakly supervised baselines (F1: 0.79/0.67; Dice: 0.74/0.55) and offering lesion-level interpretability. The results indicate that generative editing combined with selective occlusion can yield accurate, explainable segmentation in CT from vertebra-level labels alone, with potential clinical use in assessing lesion burden and stability.
Abstract
Accurate segmentation of vertebral metastases in CT is clinically important yet difficult to scale, as voxel-level annotations are scarce and both lytic and blastic lesions often resemble benign degenerative changes. We introduce a weakly supervised method trained solely on vertebra-level healthy/malignant labels, without any lesion masks. The method combines a Diffusion Autoencoder (DAE) that produces a classifier-guided healthy edit of each vertebra with pixel-wise difference maps that propose candidate lesion regions. To determine which regions truly reflect malignancy, we introduce Hide-and-Seek Attribution: each candidate is revealed in turn while all others are hidden, the edited image is projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that component. High-scoring regions form the final lytic or blastic segmentation. On held-out radiologist annotations, we achieve strong blastic/lytic performance despite no mask supervision (F1: 0.91/0.85; Dice: 0.87/0.78), exceeding baselines (F1: 0.79/0.67; Dice: 0.74/0.55). These results show that vertebra-level labels can be transformed into reliable lesion masks, demonstrating that generative editing combined with selective occlusion supports accurate weakly supervised segmentation in CT.
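The reveal-one, hide-the-rest loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes 2D slices stored as nested lists, 4-connected candidate regions, and arbitrary thresholds. The `score_fn` argument is a hypothetical stand-in for the DAE projection followed by the latent-space classifier, and the toy scorer used here simply returns the brightest pixel.

```python
from collections import deque

def connected_components(mask):
    """Return the 4-connected regions of a 2D boolean mask as sets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, comp = deque([(r, c)]), set()
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components

def hide_and_seek(original, healthy, score_fn, diff_thresh, score_thresh):
    """Keep candidate regions whose isolated reveal scores at least score_thresh.

    original, healthy: 2D lists of floats (the image and its healthy edit).
    score_fn: hypothetical scorer (stand-in for DAE projection + classifier).
    """
    h, w = len(original), len(original[0])
    # Candidate pixels: where the healthy edit differs noticeably from the original.
    candidate = [[abs(original[r][c] - healthy[r][c]) > diff_thresh
                  for c in range(w)] for r in range(h)]
    final = [[False] * w for _ in range(h)]
    for comp in connected_components(candidate):
        # Reveal this component only; all other candidates stay "hidden"
        # behind the healthy edit.
        revealed = [row[:] for row in healthy]
        for y, x in comp:
            revealed[y][x] = original[y][x]
        if score_fn(revealed) >= score_thresh:
            for y, x in comp:
                final[y][x] = True
    return final

# Toy data: a bright 2x2 blob and one faint isolated pixel.
original = [[1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.3]]
healthy = [[0.0] * 4 for _ in range(4)]

def toy_score(image):
    # Assumption for illustration only: "malignancy" = brightest pixel.
    return max(max(row) for row in image)

mask = hide_and_seek(original, healthy, toy_score, diff_thresh=0.1, score_thresh=0.5)
```

With this toy scorer, the bright blob is kept and the faint pixel is dropped, which mirrors how low-scoring difference regions would be excluded from the final mask. In the described method, the scoring step would instead project the partially revealed image back to the data manifold with the DAE and score it with the latent-space classifier.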
