The Liver Tumor Segmentation Benchmark (LiTS)

Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold, Suprosanna Shit, Xiaobin Hu, Jana Lipková, Markus Rempfler, Marie Piraud, Jan Kirschke, Benedikt Wiestler, Zhiheng Zhang, Christian Hülsemeyer, Marcel Beetz, Florian Ettlinger, Michela Antonelli, Woong Bae, Míriam Bellver, Lei Bi, Hao Chen, Grzegorz Chlebus, Erik B. Dam, Qi Dou, Chi-Wing Fu, Bogdan Georgescu, Xavier Giró-i-Nieto, Felix Gruen, Xu Han, Pheng-Ann Heng, Jürgen Hesser, Jan Hendrik Moltz, Christian Igel, Fabian Isensee, Paul Jäger, Fucang Jia, Krishna Chaitanya Kaluva, Mahendra Khened, Ildoo Kim, Jae-Hun Kim, Sungwoong Kim, Simon Kohl, Tomasz Konopczynski, Avinash Kori, Ganapathy Krishnamurthi, Fan Li, Hongchao Li, Junbo Li, Xiaomeng Li, John Lowengrub, Jun Ma, Klaus Maier-Hein, Kevis-Kokitsi Maninis, Hans Meine, Dorit Merhof, Akshay Pai, Mathias Perslev, Jens Petersen, Jordi Pont-Tuset, Jin Qi, Xiaojuan Qi, Oliver Rippel, Karsten Roth, Ignacio Sarasua, Andrea Schenk, Zengming Shen, Jordi Torres, Christian Wachinger, Chunliang Wang, Leon Weninger, Jianrong Wu, Daguang Xu, Xiaoping Yang, Simon Chun-Ho Yu, Yading Yuan, Miao Yu, Liping Zhang, Jorge Cardoso, Spyridon Bakas, Rickmer Braren, Volker Heinemann, Christopher Pal, An Tang, Samuel Kadoury, Luc Soler, Bram van Ginneken, Hayit Greenspan, Leo Joskowicz, Bjoern Menze

TL;DR

The LiTS paper introduces a multi-center liver and liver tumor segmentation benchmark run across ISBI 2017, MICCAI 2017, and MICCAI 2018 (MSD), providing a public dataset and online evaluation to compare state-of-the-art methods. It shows that liver segmentation is generally robust while tumor segmentation remains challenging, especially for small lesions, and demonstrates steady performance gains driven by cascaded and 3D deep learning architectures. The work analyzes dataset diversity, annotation protocols, and evaluation metrics, revealing how different metrics influence rankings and highlighting remaining challenges such as lesion size and image contrast. LiTS remains an active resource, spurring methodological advances, data-sharing considerations, and future directions like uncertainty modeling and federated learning for multi-institutional collaboration.

Abstract

In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances and with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that no single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dice scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via www.lits-challenge.com.
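
For readers unfamiliar with the Dice score quoted above, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks A and B. It is not the benchmark's evaluation code. Representing masks as sets of foreground voxel coordinates, the function name `dice_score`, and scoring two empty masks as 1.0 are all assumptions made for this illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a mask is the set of its foreground voxels.
Voxel = Tuple[int, int, int]


def dice_score(pred: Iterable[Voxel], ref: Iterable[Voxel]) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred_set: Set[Voxel] = set(pred)
    ref_set: Set[Voxel] = set(ref)
    denom = len(pred_set) + len(ref_set)
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * len(pred_set & ref_set) / denom


# Toy example: 4 reference voxels, 3 predicted voxels, 2 shared.
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
prediction = {(0, 0, 0), (0, 0, 1), (1, 1, 1)}
print(dice_score(prediction, reference))  # 2*2 / (3+4) = 4/7
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels. Because small structures contribute few voxels, a handful of misclassified voxels lowers Dice far more for a small lesion than for the whole liver.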

Paper Structure

This paper contains 69 sections, 14 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Example from the LiTS dataset depicting a variety of lesion shapes on contrast-enhanced abdominal CT scans. While most exams in the dataset contain only one lesion, a large group of patients have some (2-7) or many (10-12) lesions, as shown in the histogram calculated over the whole dataset.
  • Figure 2: Scatter plots of methods' performances considering: a) both segmentation and detection, b) both distance- and overlap-based metrics for the three challenge events. We observe that not all the top-performing methods in the three LiTS challenges achieved good scores on tumor detection. The behavior of distance- and overlap-based metrics is similar.
  • Figure 3: Inter-rater agreement between the existing annotation and new annotation sets. R1 denotes the rater of the existing consensus annotation of the LiTS dataset. R2 re-annotated 15 CT scans from scratch. R3 and R4 are board-certified radiologists: R3 reviewed and corrected the existing annotations, and R4 re-evaluated and corrected R3's final annotations. The inter-rater agreement was calculated as the per-case Dice score for each pair of raters.
  • Figure 4: Dice and ASD scores of three top-performing teams over the three events.
  • Figure 5: Distribution of mean Dice and ASD scores of all submissions on the CodaLab platform from 2017 to 2022.
  • ...and 8 more figures