LNQ 2023 challenge: Benchmark of weakly-supervised techniques for mediastinal lymph node quantification

Reuben Dorent; Roya Khajavi; Tagwa Idris; Erik Ziegler; Bhanusupriya Somarouthu; Heather Jacene; Ann LaCasce; Jonathan Deissler; Jan Ehrhardt; Sofija Engelson; Stefan M. Fischer; Yun Gu; Heinz Handels; Satoshi Kasai; Satoshi Kondo; Klaus Maier-Hein; Julia A. Schnabel; Guotai Wang; Litingyu Wang; Tassilo Wald; Guang-Zhong Yang; Hanxiao Zhang; Minghui Zhang; Steve Pieper; Gordon Harris; Ron Kikinis; Tina Kapur

TL;DR

This paper presents LNQ 2023, the first international benchmark for weakly-supervised segmentation of mediastinal lymph nodes in 3D CT scans. It pairs a large, partially annotated training set with a fully annotated test set to enable fair comparison. The paper documents the challenge design, the data, the evaluation framework, the participating methods, and the results. Semi- and weakly-supervised approaches reach competitive accuracy (median DSC around 61% overall), and top teams exceed 70% by combining partial and full supervision, but fully supervised models still outperform purely weakly-supervised ones. A bootstrap analysis shows that the ranking is stable. The paper also discusses practical limitations: the distribution of cancer types, biases in the annotation protocol, and the need for instance-level segmentation. Overall, LNQ provides a resource for advancing weakly-supervised lymph node segmentation and underlines the continued importance of high-quality, fully annotated data for pushing performance higher.

Abstract

Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous lymph nodes in 3D CT scans. Weakly-supervised learning, which leverages incomplete or noisy annotations, has recently gained interest in the medical imaging community as a potential solution. Despite the variety of weakly-supervised techniques proposed, most have been validated only on private datasets or small publicly available datasets. To address this limitation, the Mediastinal Lymph Node Quantification (LNQ) challenge was organized in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to advance weakly-supervised segmentation methods by providing a new, partially annotated dataset and a robust evaluation framework. A total of 16 teams from 5 countries submitted predictions to the validation leaderboard, and 6 teams from 3 countries participated in the evaluation phase. The results highlighted both the potential and the current limitations of weakly-supervised approaches. On one hand, weakly-supervised approaches obtained relatively good performance with a median Dice score of $61.0\%$. On the other hand, top-ranked teams, with a median Dice score exceeding $70\%$, boosted their performance by leveraging smaller but fully annotated datasets to combine weak supervision and full supervision. This highlights both the promise of weakly-supervised methods and the ongoing need for high-quality, fully annotated data to achieve higher segmentation performance.
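The Dice score quoted above measures the overlap between a predicted and a reference segmentation mask. As a minimal sketch, assuming binary voxel masks stored as NumPy arrays (the function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative choices, not taken from the challenge's evaluation code), it can be computed as follows:

```python
import numpy as np


def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks.

    Hypothetical helper for illustration; not the challenge's evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / denom


# Toy 3D masks: the prediction covers 8 voxels, the reference 12,
# and all 8 predicted voxels lie inside the reference.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:4] = True

print(dice_score(pred, target))  # 2 * 8 / (8 + 12) = 0.8
```

In this toy case the prediction misses a third of the reference voxels yet still scores 0.8, which gives a rough sense of how much missed volume a median score near 61% implies.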

Paper Structure (28 sections, 2 equations, 6 figures, 3 tables)

This paper contains 28 sections, 2 equations, 6 figures, 3 tables.

Introduction
Related Works
Challenge description
Overview
Data description
Data overview
Image acquisition
Annotation protocol
Data curation
Challenge setup
Metrics and evaluation
Choice of the metrics
Ranking scheme
Participating methods
Skeleton Suns (1st place, Deissler et al.)
...and 13 more sections

Figures (6)

Figure 1: Overview of the challenge dataset. Only partial annotations (green) were made available for the training 3D CT scans. Missing training data nodes are shown in red. Full segmentations (blue) of all nodes were performed for evaluation on the validation and testing sets, and remained private.
Figure 2: Box plot of the participants' performance for lymph node segmentation in terms of (a) DSC and (b) ASSD.
Figure 3: Relationship between the scores for each case between the first, second and third teams. Challenging cases are similar for each team.
Figure 4: Box plot of the best performance per (a) patient sex and (b) primary condition.
Figure 5: Stability of the proposed ranking scheme for 1000 bootstrap samples.
...and 1 more figure
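Figure 5 refers to ranking stability over 1000 bootstrap samples. The challenge's actual ranking scheme is not described in this excerpt, so the sketch below substitutes a simple stand-in: teams are ranked by their mean per-case score, test cases are resampled with replacement, and the fraction of resamples in which each team lands at each rank is recorded. The function name `bootstrap_rank_frequencies` and the synthetic scores are hypothetical.

```python
import numpy as np


def bootstrap_rank_frequencies(scores, n_boot=1000, seed=0):
    """Estimate how often each team obtains each rank under case resampling.

    scores: array of shape (n_teams, n_cases), higher is better.
    Returns freq of shape (n_teams, n_teams), where freq[t, r] is the
    fraction of bootstrap samples in which team t got rank r (0 = best).
    Assumption: ranking by mean score, a stand-in for the real scheme.
    """
    scores = np.asarray(scores, dtype=float)
    n_teams, n_cases = scores.shape
    rng = np.random.default_rng(seed)
    counts = np.zeros((n_teams, n_teams))
    for _ in range(n_boot):
        # Resample test cases with replacement.
        idx = rng.integers(0, n_cases, size=n_cases)
        means = scores[:, idx].mean(axis=1)
        # Convert mean scores to ranks (0 = highest mean).
        order = np.argsort(-means)
        ranks = np.empty(n_teams, dtype=int)
        ranks[order] = np.arange(n_teams)
        counts[np.arange(n_teams), ranks] += 1
    return counts / n_boot


# Synthetic per-case scores for three clearly separated teams.
toy_scores = np.array([
    [0.9] * 10,
    [0.5] * 10,
    [0.1] * 10,
])
freq = bootstrap_rank_frequencies(toy_scores, n_boot=200)
print(freq)
```

A ranking is stable in this sense when most of the mass in each row of `freq` sits on a single rank; rows spread across several ranks indicate teams whose order depends on which cases happen to be in the test set.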
