Let Me DeCode You: Decoder Conditioning with Tabular Data
Tomasz Szczepański, Michal K. Grzeszczyk, Szymon Płotka, Arleta Adamowicz, Piotr Fudalej, Przemysław Korzeniowski, Tomasz Trzciński, Arkadiusz Sitek
TL;DR
This paper addresses the data scarcity challenge in 3D medical image segmentation by conditioning the decoder on tabular, shape-derived features learned from radiomics. The approach, DeCode, uses a 3D U-Net with decoder conditioning through FiLM-like affine transforms and a 512-feature shape embedding learned from ground-truth masks, with an embedding regression objective to enable test-time conditioning when labels are unavailable. Evaluations on a synthetic 3DeCode dataset and real dental CBCT data show improved generalization to unseen data compared to an unconditioned baseline, achieving higher Dice scores with modest training time and computational costs. The work pioneers decoder conditioning for 3D segmentation, provides open-source code and pretrained models, and highlights directions for end-to-end conditioning feature learning and broader clinical deployment.
Abstract
Training deep neural networks for 3D segmentation tasks can be challenging, often requiring efficient and effective strategies to improve model performance. In this study, we introduce a novel approach, DeCode, that uses label-derived features for model conditioning to dynamically support the decoder in the reconstruction process, aiming to make training more efficient. DeCode improves 3D segmentation performance by incorporating a conditioning embedding with a learned numerical representation of 3D-label shape features. Specifically, we develop an approach in which conditioning is applied during the training phase to guide the network toward robust segmentation. When labels are not available during inference, our model infers the necessary conditioning embedding directly from the input data, using a feed-forward network learned during the training phase. The approach is tested on synthetic data and cone-beam computed tomography (CBCT) images of teeth. For CBCT, three datasets are used: one publicly available and two in-house. Our results show that DeCode significantly outperforms traditional, unconditioned models in terms of generalization to unseen data, achieving higher accuracy at a reduced computational cost. This work is the first of its kind to explore conditioning strategies in 3D data segmentation, offering a novel and more efficient method for leveraging annotated data. Our code and pre-trained models are publicly available at https://github.com/SanoScience/DeCode .
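To make the conditioning idea concrete, the following is a minimal NumPy sketch of a FiLM-like affine transform applied to one decoder feature map. It is an illustration under stated assumptions, not the DeCode implementation: the function name `film_condition`, the weight names, the tensor shapes, and the use of a single linear projection per parameter are all hypothetical. Only the general idea (a per-channel scale and shift predicted from a 512-feature shape embedding) follows the description above.

```python
import numpy as np


def film_condition(features, embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Apply a FiLM-like per-channel affine transform to a 3D feature map.

    features:  (C, D, H, W) decoder feature map (hypothetical layout)
    embedding: (E,) conditioning vector, e.g. a shape-feature embedding
    w_gamma, w_beta: (C, E) linear projection weights (assumed form)
    b_gamma, b_beta: (C,) projection biases
    """
    # Predict one scale (gamma) and one shift (beta) per channel.
    gamma = w_gamma @ embedding + b_gamma  # shape (C,)
    beta = w_beta @ embedding + b_beta     # shape (C,)
    # Broadcast over the three spatial axes.
    return gamma[:, None, None, None] * features + beta[:, None, None, None]


rng = np.random.default_rng(0)
C, E = 4, 512  # 4 channels is arbitrary; 512 matches the embedding size above
features = rng.standard_normal((C, 2, 3, 3))
embedding = rng.standard_normal(E)

# Identity initialisation (gamma = 1, beta = 0) leaves the features unchanged.
identity_out = film_condition(
    features, embedding,
    np.zeros((C, E)), np.ones(C),
    np.zeros((C, E)), np.zeros(C),
)

# Random projections give an embedding-dependent scale and shift per channel.
w_gamma = rng.standard_normal((C, E)) * 0.01
w_beta = rng.standard_normal((C, E)) * 0.01
conditioned = film_condition(
    features, embedding, w_gamma, np.ones(C), w_beta, np.zeros(C)
)
```

Because the transform is a per-channel scale and shift, it adds only a small number of parameters per conditioned layer. With the identity initialisation shown, the conditioned decoder starts out identical to an unconditioned one.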
