Let Me DeCode You: Decoder Conditioning with Tabular Data

Tomasz Szczepański, Michal K. Grzeszczyk, Szymon Płotka, Arleta Adamowicz, Piotr Fudalej, Przemysław Korzeniowski, Tomasz Trzciński, Arkadiusz Sitek

TL;DR

This paper addresses the data scarcity challenge in 3D medical image segmentation by conditioning the decoder on tabular, shape-derived features learned from radiomics. The approach, DeCode, uses a 3D U-Net with decoder conditioning through FiLM-like affine transforms and a 512-feature shape embedding learned from ground-truth masks, with an embedding regression objective to enable test-time conditioning when labels are unavailable. Evaluations on a synthetic 3DeCode dataset and real dental CBCT data show improved generalization to unseen data compared to an unconditioned baseline, achieving higher Dice scores with modest training time and computational costs. The work pioneers decoder conditioning for 3D segmentation, provides open-source code and pretrained models, and highlights directions for end-to-end conditioning feature learning and broader clinical deployment.
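
As a rough illustration of what a FiLM-like affine transform does, the sketch below scales and shifts each channel of a feature map using values projected from a conditioning embedding. It is a dependency-free toy, not the DeCode implementation: the function names `linear` and `film`, the 3-value embedding, and all weights are assumptions made for this example, whereas the paper's embedding has 512 features and the feature maps are 3D tensors.

```python
from typing import List

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Dense layer: one output value per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features: List[Vector], gamma: Vector, beta: Vector) -> List[Vector]:
    """Scale and shift every channel; features[c] holds the flattened voxels of channel c."""
    return [[g * v + b for v in channel] for channel, g, b in zip(features, gamma, beta)]


# Hypothetical toy values (assumption): a 3-value embedding and a 2-channel feature map.
embedding = [1.0, 0.0, -1.0]
w_gamma, b_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 1.0]
w_beta, b_beta = [[0.0, 1.0, 0.0], [0.0, 0.0, -0.5]], [0.0, 0.0]
features = [[1.0, 2.0], [3.0, 4.0]]

gamma = linear(embedding, w_gamma, b_gamma)  # per-channel scale
beta = linear(embedding, w_beta, b_beta)     # per-channel shift
conditioned = film(features, gamma, beta)
```

With these toy weights the first channel is doubled and the second is replaced by a constant shift, which shows how the embedding can selectively amplify or suppress decoder channels.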

Abstract

Training deep neural networks for 3D segmentation tasks can be challenging, often requiring efficient and effective strategies to improve model performance. In this study, we introduce a novel approach, DeCode, that uses label-derived features for model conditioning to dynamically support the decoder in the reconstruction process, aiming to enhance the efficiency of the training process. DeCode focuses on improving 3D segmentation performance by incorporating a conditioning embedding with a learned numerical representation of 3D-label shape features. Specifically, we develop an approach where conditioning is applied during the training phase to guide the network toward robust segmentation. When labels are not available during inference, our model infers the necessary conditioning embedding directly from the input data, thanks to a feed-forward network learned during the training phase. This approach is tested using synthetic data and cone-beam computed tomography (CBCT) images of teeth. For CBCT, three datasets are used: one publicly available and two in-house. Our results show that DeCode significantly outperforms traditional, unconditioned models in terms of generalization to unseen data, achieving higher accuracy at a reduced computational cost. This work represents the first of its kind to explore conditioning strategies in 3D data segmentation, offering a novel and more efficient method for leveraging annotated data. Our code and pre-trained models are publicly available at https://github.com/SanoScience/DeCode .
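
The switch between label-derived and inferred conditioning described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `l1_loss` and `conditioning_embedding` and the 3-value vectors are hypothetical, and the assumption that training conditions on the label-derived embedding whenever it exists is inferred from the abstract rather than confirmed by it.

```python
from typing import List, Optional

Vector = List[float]


def l1_loss(predicted: Vector, target: Vector) -> float:
    """Mean absolute error between the predicted and label-derived embeddings."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def conditioning_embedding(predicted: Vector, label_derived: Optional[Vector]) -> Vector:
    """Use the label-derived embedding when labels exist, otherwise the predicted one."""
    return label_derived if label_derived is not None else predicted


# Hypothetical toy values (assumption); the paper's embedding has 512 features.
predicted = [0.5, 2.0, -1.0]     # output of the feed-forward network on the image
label_derived = [1.0, 2.0, 0.0]  # embedding computed from the ground-truth mask

training_embedding = conditioning_embedding(predicted, label_derived)
inference_embedding = conditioning_embedding(predicted, None)
regression_loss = l1_loss(predicted, label_derived)
```

Minimizing the regression loss during training pulls the predicted embedding toward the label-derived one, which is what makes the predicted embedding usable for conditioning once labels are gone.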

Paper Structure

This paper contains 6 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An overview of the proposed DeCode method for conditioning the segmentation decoder with a learned shape-feature embedding. During inference, when test labels are unavailable, we use the learned feature embedding optimized with the $L_1$ loss in Eq. 1. We perform conditioning after the skip connection from the encoder, allowing for a dynamic and selective decoding process. We also leverage feature regression as a helper task that boosts meaningful feature extraction. Skip connections and the flow of the shape-feature embedding are indicated with blue and purple arrows, respectively. $E_i$ and $D_i$ denote the encoder and decoder stages.
  • Figure 2: Normalized mean shape features calculated with PyRadiomics [van2017computational] on the CBCT Tooth dataset [cui2022fully]. Each shape feature is calculated for every tooth separately, revealing morphological differences between tooth types.
  • Figure 3: The 3DeCode data samples. The first column presents a 3D image, the basis for various configurations corresponding to the conditioning task, given along rows. Example labels are shown in the central column. The last column presents one of the cross-sections. The dataset can be generated using the provided source code and attached configuration files with seeds.
  • Figure 4: Normalized mean shape features calculated with PyRadiomics on the proprietary test datasets. Each shape feature is calculated for every tooth separately, revealing morphological differences between tooth types. The mean shape-feature values differ only slightly between the datasets.
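
The "normalized mean shape features" plotted in Figures 2 and 4 can be approximated with a sketch like the one below: average each feature per tooth type, then rescale it across tooth types. The captions do not state which normalization was used, so min-max scaling is an assumption here, as are the helper names, the tooth-type keys, and every numeric value. `MeshVolume` and `Sphericity` are real PyRadiomics shape-feature names, but the numbers are invented for illustration only.

```python
from statistics import mean
from typing import Dict, List

Features = Dict[str, float]


def mean_features(samples: List[Features]) -> Features:
    """Average each shape feature over all samples of one tooth type."""
    return {name: mean(s[name] for s in samples) for name in samples[0]}


def min_max_normalize(per_type: Dict[str, Features]) -> Dict[str, Features]:
    """Rescale each feature to [0, 1] across tooth types (assumed normalization)."""
    names = next(iter(per_type.values())).keys()
    out: Dict[str, Features] = {tooth: {} for tooth in per_type}
    for name in names:
        values = [feats[name] for feats in per_type.values()]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # guard against constant features
        for tooth, feats in per_type.items():
            out[tooth][name] = (feats[name] - low) / span
    return out


# Hypothetical per-tooth measurements (assumption, not data from the paper).
raw = {
    "incisor": [{"MeshVolume": 400.0, "Sphericity": 0.60}, {"MeshVolume": 600.0, "Sphericity": 0.70}],
    "premolar": [{"MeshVolume": 700.0, "Sphericity": 0.70}, {"MeshVolume": 800.0, "Sphericity": 0.72}],
    "molar": [{"MeshVolume": 900.0, "Sphericity": 0.75}, {"MeshVolume": 1100.0, "Sphericity": 0.85}],
}

means = {tooth: mean_features(samples) for tooth, samples in raw.items()}
normalized = min_max_normalize(means)
```

Putting every feature on the same [0, 1] scale is what lets features with very different units, such as volume and sphericity, be compared in a single plot.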