
Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge

Stefan M. Fischer, Johannes Kiechle, Daniel M. Lang, Jan C. Peeken, Julia A. Schnabel

TL;DR

The paper tackles mediastinal lymph node (LN) segmentation on the weakly annotated LNQ2023 data. It evaluates multiple strategies for handling missing annotations and several external datasets. It shows that combining loss masking, foreground coating, and especially TotalSegmentator pseudo labeling, together with the refined annotations of Bouget et al. and NSCLC data, substantially improves segmentation performance. Ablation analyses reveal that including non-pathological LNs and broader anatomical context improves the detection of small pathological LNs. This culminates in a third-place LNQ2023 submission with a Dice score of 0.628 and an ASSD of 5.8 mm. The work highlights the value of structure-aware priors and semi-supervised cues for medical image segmentation and provides open-source code to promote reproducibility.
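The label-handling strategies named above can be made concrete by building per-voxel training targets from a weak lymph node mask. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The following elements are all hypothetical:

- the ignore value 255 and the function names;
- the reading of foreground coating as a trusted background shell of `coat_voxels` voxels around each annotated node;
- the precomputed `structures` mask, which stands in for TotalSegmentator output (TotalSegmentator itself is not called).

```python
import numpy as np

IGNORE = 255  # hypothetical label for "unknown" voxels that loss masking excludes


def dilate6(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 6-connected (face-neighbour) element, NumPy only."""
    out = mask.astype(bool)
    for _ in range(iterations):
        grown = out.copy()
        for axis in range(out.ndim):
            for shift in (1, -1):
                shifted = np.roll(out, shift, axis=axis)
                # np.roll wraps around, so clear the slice that wrapped in from the far border
                edge = [slice(None)] * out.ndim
                edge[axis] = 0 if shift == 1 else -1
                shifted[tuple(edge)] = False
                grown |= shifted
        out = grown
    return out


def noisy_labels(ln_mask: np.ndarray) -> np.ndarray:
    """Noisy label training: every unannotated voxel, including missed nodes, is background."""
    return ln_mask.astype(np.uint8)


def coated_labels(ln_mask: np.ndarray, coat_voxels: int = 2) -> np.ndarray:
    """Assumed coating: a thin shell around annotated nodes is background, the rest is unknown."""
    ln = ln_mask.astype(bool)
    labels = np.full(ln.shape, IGNORE, dtype=np.uint8)
    labels[dilate6(ln, coat_voxels) & ~ln] = 0  # trusted background shell
    labels[ln] = 1  # annotated lymph node foreground
    return labels


def add_pseudo_labels(labels: np.ndarray, ln_mask: np.ndarray, structures: np.ndarray) -> np.ndarray:
    """Pseudo labeling: voxels of segmented non-LN structures become known background."""
    out = labels.copy()
    out[structures.astype(bool) & ~ln_mask.astype(bool)] = 0
    return out
```

Voxels labelled `IGNORE` would be dropped from the loss, so a missed lymph node is never taught as background. In noisy label training, the same node is simply labelled 0. Adding pseudo labels shrinks the unknown region by turning recognisable anatomy into known background.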

Abstract

Pathological lymph node delineation is crucial in cancer diagnosis, progression assessment, and treatment planning. The MICCAI 2023 Lymph Node Quantification Challenge published the first public dataset for pathological lymph node segmentation in the mediastinum. As lymph node annotations are expensive, the challenge was framed as a weakly supervised learning task in which only a subset of all lymph nodes in the training set were annotated. For the challenge submission, we explored multiple methods for training on these weakly annotated data. These included noisy label training, loss masking of unlabeled data, and an approach that integrates the TotalSegmentator toolbox as a form of pseudo labeling to reduce the number of unknown voxels. Furthermore, multiple public TCIA datasets were incorporated into training to improve the performance of the deep learning model. Our submitted model achieved a Dice score of 0.628 and an average symmetric surface distance (ASSD) of 5.8 mm on the challenge test set, placing third in the MICCAI2023 LNQ challenge. Our analysis found that integrating all visible lymph nodes, including non-pathological ones, improved segmentation performance on the pathological lymph nodes of the test set. Furthermore, segmentation models trained only on clinically enlarged lymph nodes, as provided in the challenge scenario, could not generalize to smaller pathological lymph nodes. The code and model for the challenge submission are available at https://gitlab.lrz.de/compai/MediastinalLymphNodeSegmentation.
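Loss masking means that voxels with unknown labels contribute nothing to the training objective. The minimal NumPy sketch below assumes two things: unknown voxels are marked with a hypothetical ignore value of 255, and the network outputs per-voxel foreground probabilities. The function names are illustrative, and the exact loss used in the paper is not specified here and may differ.

```python
import numpy as np


def masked_bce(probs: np.ndarray, labels: np.ndarray, ignore_value: int = 255, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over known voxels only (hypothetical helper)."""
    known = labels != ignore_value
    if not known.any():
        return 0.0  # nothing is supervised in this patch
    p = np.clip(probs[known], eps, 1 - eps)
    y = labels[known].astype(np.float64)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def masked_soft_dice_loss(probs: np.ndarray, labels: np.ndarray, ignore_value: int = 255, eps: float = 1e-6) -> float:
    """Soft Dice loss computed on known voxels only (hypothetical helper)."""
    known = labels != ignore_value
    p = probs[known].astype(np.float64)
    y = (labels[known] == 1).astype(np.float64)
    intersection = (p * y).sum()
    return float(1 - (2 * intersection + eps) / (p.sum() + y.sum() + eps))
```

Ignored voxels are removed before averaging. As a result, confident "foreground" predictions on an unannotated lymph node leave both losses unchanged, whereas noisy label training would penalise them as false positives.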

Paper Structure

This paper contains 21 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Sketch of different strategies to handle weakly annotated data in our analysis. The missed lymph node instance is incorrectly set to the background class in the noisy label training. For loss masking, foreground instance coating, and TotalSegmentator Pseudo Labeling, the missing instance is removed from the training process by loss masking.
  • Figure 2: Histograms of the shortest diameter of lymph node components in the TCIA CT Lymph Nodes dataset, the refined annotations by Bouget et al. (2019), and the LNQ2023 training and test sets.
  • Figure 3: Preprocessing steps performed on LNQ2023 training set example, shown in three orthogonal views. Top: raw input volume with weak lymph node annotation (green), Middle: volume after lung bounding box cropping, Bottom: TotalSegmentator Pseudo Labeling setting known structures to the background (blue).
  • Figure 4: Different cases from the LNQ2023 test set with ground truth annotations and model predictions. For intuitive visualization, the trachea is shown in blue, the model prediction in yellow, and the ground truth in green. Model 7 was used for inference. Left: worst case (Dice score 0.108, ASSD 19.2 mm), Center: average case (Dice score 0.626, ASSD 5.65 mm), Right: best case (Dice score 0.860, ASSD 2.38 mm).
  • Figure 5: Overlap of each ground truth lymph node component with the prediction, and overlap of each predicted lymph node component with all ground truth lymph node components, as a function of shortest diameter. Lymph node components were binned by shortest diameter in 2.5 mm steps. Predictions were generated by Model 5 (green) and Model 6 (blue).
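The Dice score and ASSD reported in the abstract and in Figure 4 can be illustrated with a small, brute-force NumPy sketch. It makes several assumptions:

- the inputs are binary 3D masks;
- surface voxels are mask voxels with at least one 6-neighbour outside the mask;
- distances are in millimetres via a voxel `spacing` tuple;
- an empty mask yields an infinite ASSD.

These conventions are choices made here, and the challenge's official evaluation code may differ.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice overlap of two binary masks (1.0 if both are empty, by convention here)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates (mm) of mask voxels with a 6-neighbour outside the mask."""
    m = np.pad(mask.astype(bool), 1)  # zero padding so rolls never wrap foreground around
    interior = m.copy()
    for axis in range(m.ndim):
        interior &= np.roll(m, 1, axis=axis) & np.roll(m, -1, axis=axis)
    border = m & ~interior
    coords = np.argwhere(border[tuple(slice(1, -1) for _ in range(m.ndim))])
    return coords * np.asarray(spacing, dtype=float)


def assd(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance via brute-force pairwise distances."""
    sp, sg = surface_points(pred, spacing), surface_points(gt, spacing)
    if len(sp) == 0 or len(sg) == 0:
        return float("inf")  # undefined when one mask is empty; convention assumed here
    d = np.linalg.norm(sp[:, None, :] - sg[None, :, :], axis=-1)
    # mean over all surface points of both masks of the distance to the other surface
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(sp) + len(sg)))
```

The pairwise distance matrix grows with the product of the two surface sizes. For full CT volumes, a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` would therefore be the practical choice.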