Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

TL;DR

The paper presents the RibFrac Challenge, the first large-scale, publicly available benchmark for 3D rib fracture detection and diagnosis from CT, with over 5,000 fractures in 660 scans and voxel-level masks across four fracture classes. It also introduces FracNet+, an internal baseline that fuses point-based rib segmentation with voxel-based fracture segmentation and builds on large-scale pretrained models; FracNet+ achieves competitive detection performance and illustrates the value of rib segmentation. Across the detection and classification tracks, the study shows that AI methods can approach or surpass human performance in detection, whereas classification remains clinically challenging because of diagnostic ambiguity, class imbalance, and geometric complexity. The authors provide extensive analyses, post-challenge follow-ups, and internal experiments that highlight the roles of segmentation, segmentation-to-detection coupling, and pretrained networks in advancing AI-assisted rib fracture diagnosis. The work lays a foundation for future unified, multicenter approaches that integrate rib labeling, centerline extraction, and fracture classification, moving toward clinically actionable AI tools.
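The idea of combining a point-based rib branch with a voxel-based fracture branch can be illustrated with a deliberately simplified late-fusion sketch: rib probabilities predicted on a point cloud are scattered back into the CT voxel grid, dilated, and used to suppress fracture probabilities far from any rib. FracNet+ itself fuses voxelized point features inside the network; the hard gating below, the threshold, the dilation radius, the assumption that points are already in voxel index coordinates, and all function names are hypothetical and for illustration only.

```python
import numpy as np


def voxelize_points(points, values, shape):
    """Scatter per-point rib probabilities into a dense voxel grid.

    Assumes `points` is an (N, 3) array already expressed in the voxel index
    space of the CT volume; the maximum value is kept when points share a voxel.
    """
    grid = np.zeros(shape, dtype=np.float32)
    idx = np.clip(np.rint(points).astype(int), 0, np.array(shape) - 1)
    np.maximum.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]),
                  np.asarray(values, dtype=np.float32))
    return grid


def dilate(mask, iterations=1):
    """6-connected binary dilation with zero padding (no wrap-around at edges)."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = out.copy()
        for axis in range(3):
            for shift in (-1, 1):
                # Shift the padded volume by one voxel and crop back to the original size.
                grown |= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
        out = grown
    return out


def fuse_rib_prior(frac_prob, points, rib_prob, rib_thresh=0.5, iterations=2):
    """Hypothetical late fusion: zero out fracture probabilities far from ribs."""
    rib_grid = voxelize_points(points, rib_prob, frac_prob.shape)
    rib_mask = dilate(rib_grid >= rib_thresh, iterations)
    return frac_prob * rib_mask
```

A hard gate like this can trade some sensitivity for fewer off-rib false positives; a learned, feature-level fusion can instead weigh the rib context softly.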

Abstract

Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. Although prior work has addressed this problem, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to, or even better than, that of human experts. Nevertheless, current rib fracture classification solutions are still far from clinically applicable, making classification an interesting direction for future research. The RibFrac Challenge remains an active benchmark and research resource, and its data and online evaluation are available on the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advances in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection and lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
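The FROC-style detection metric can be sketched as follows, using the hit rule given in the Figure 3 caption (a proposal is a hit when its IoU with any annotated fracture is at least 0.2). Averaging sensitivity at 0.5, 1, 2, 4, and 8 false positives per scan is an assumption borrowed from common FROC conventions, as are the input format (integer label volumes with 0 as background plus a score per predicted instance) and every function name; this is not the official RibFrac evaluation script.

```python
import numpy as np


def instance_iou(pred_mask, gt_mask):
    """Voxel IoU between two boolean instance masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union else 0.0


def match_scan(pred_labels, scores, gt_labels, iou_thresh=0.2):
    """For each predicted instance, return (score, set of GT ids it hits)."""
    gt_ids = [int(g) for g in np.unique(gt_labels) if g != 0]
    matches = []
    for pid, score in scores.items():
        pred_mask = pred_labels == pid
        hits = {g for g in gt_ids
                if instance_iou(pred_mask, gt_labels == g) >= iou_thresh}
        matches.append((score, hits))
    return matches, len(gt_ids)


def froc(scans, fp_levels=(0.5, 1, 2, 4, 8), iou_thresh=0.2):
    """scans: list of (pred_labels, {pred_id: score}, gt_labels), one per CT scan."""
    events, total_gt = [], 0
    for s, (pred_labels, scores, gt_labels) in enumerate(scans):
        matches, n_gt = match_scan(pred_labels, scores, gt_labels, iou_thresh)
        total_gt += n_gt
        events += [(score, s, hits) for score, hits in matches]
    events.sort(key=lambda e: -e[0])  # sweep the confidence threshold downward

    detected, fp, curve = set(), 0, [(0.0, 0.0)]
    for i, (score, s, hits) in enumerate(events):
        if hits:
            detected.update((s, g) for g in hits)  # each fracture counts once
        else:
            fp += 1  # proposal overlaps no annotation
        # Record an operating point only after all proposals tied at this score.
        if i == len(events) - 1 or events[i + 1][0] != score:
            curve.append((fp / len(scans), len(detected) / max(total_gt, 1)))

    # Best sensitivity reachable without exceeding each FP-per-scan budget.
    sens = [max(se for f, se in curve if f <= level) for level in fp_levels]
    return sum(sens) / len(sens), sens
```

In this sketch, a proposal that hits an already-detected fracture counts neither as a new true positive nor as a false positive, which follows the "hit" definition rather than a one-to-one matching.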

Paper Structure

This paper contains 46 sections, 2 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Illustration of the two tracks in the RibFrac Challenge. In the rib fracture detection track, participants submit a 3D instance segmentation mask for each fracture, which localizes it and delineates its extent. In the rib fracture classification track, participants submit a label for each fracture that identifies its type (buckle, nondisplaced, displaced, or segmental). The red regions are the segmentation masks of rib fracture instances.
  • Figure 2: Statistics and sample visualization of the RibFrac Challenge dataset. A) Histograms showing the distribution of the number of slices per scan (left), the number of rib fractures per scan (middle), and the volumetric size of each individual rib fracture (right). B) Visualization of four rib fracture samples. Each sample is shown as a cropped 3D CT patch rendered volumetrically (upper left), together with 2D axial (upper right), coronal (lower left), and sagittal (lower right) views through the fracture centroid, overlaid with the human-annotated voxel-level rib fracture segmentation.
  • Figure 3: Illustration of a detection hit in the FROC metric. Left: A detection proposal is considered a hit (i.e., a true positive, irrespective of the confidence threshold) when it overlaps any annotation with an IoU ≥ 0.2 (depicted in blue). The orange and yellow instances are false negatives: annotated fractures that were not detected. The green instance is a false positive: a predicted fracture with no matching annotation. Right: Two illustrative examples on CT slices.
  • Figure 4: Illustration of the classification confusion matrix and the three F1 scores. The overall F1 score (blue) evaluates end-to-end classification performance (from detection to classification); the target-aware F1 score (orange) evaluates performance on the classification annotations (excluding FP); and the prediction-aware F1 score (green) evaluates performance on the classification predictions (excluding FP and FN). FN and FP refer to false negative and false positive predictions in detection.
  • Figure 5: A schematic overview of the FracNet+ framework, illustrating its dual-branch architecture for rib fracture segmentation. The top branch, the point-based rib segmentation network (RibSeg; Yang et al., 2021; Jin et al., 2022), processes binarized and downsampled point clouds from the whole 3D CT volume to predict ribs. The bottom branch, a voxel-based rib fracture segmentation network (a pretrained STUNet; Huang et al., 2023), operates on cropped 3D CT volumes from rib areas to predict fractures. The voxelized point features from RibSeg are fused with the voxel-based fracture predictions, improving the final fracture detection by integrating global context with local anatomical detail.
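The Figure 4 caption distinguishes the three F1 scores by which cells of the confusion matrix they keep. One way to compute them is sketched below, under several assumptions that are not taken from the challenge's evaluation code: index 0 stands for "no fracture" (its row holds detection FPs and its column detection FNs), per-class F1 is 2TP / (2TP + FP + FN), scores are macro-averaged over the four fracture categories, and classes with no support are skipped. The `(target, prediction)` pair encoding and the function names are hypothetical.

```python
import numpy as np


def confusion(pairs, n_classes=4):
    """Rows = target class, columns = predicted class.

    Index 0 means "no fracture"; 1..4 are the fracture categories.
    A pair (t, 0) is a detection FN, and (0, p) is a detection FP.
    """
    cm = np.zeros((n_classes + 1, n_classes + 1), dtype=int)
    for target, pred in pairs:
        cm[target, pred] += 1
    return cm


def macro_f1(cm):
    """Macro F1 over fracture classes 1..K; classes with no support are skipped."""
    scores = []
    for k in range(1, cm.shape[0]):  # index 0 is background, not a class
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp
        fn = cm[k, :].sum() - tp
        denom = 2 * tp + fp + fn
        if denom > 0:
            scores.append(2 * tp / denom)
    return float(np.mean(scores)) if scores else 0.0


def three_f1(pairs, n_classes=4):
    cm = confusion(pairs, n_classes)
    target_cm = cm.copy()
    target_cm[0, :] = 0  # target-aware: drop detection FPs
    pred_cm = target_cm.copy()
    pred_cm[:, 0] = 0    # prediction-aware: also drop detection FNs
    return {
        "overall": macro_f1(cm),
        "target_aware": macro_f1(target_cm),
        "prediction_aware": macro_f1(pred_cm),
    }
```

For example, with pairs `[(1, 1), (1, 0), (2, 2), (2, 3), (3, 0), (0, 4), (4, 4)]` (two missed fractures, one spurious detection, and one misclassified match), the overall, target-aware, and prediction-aware scores are 0.5, 7/12, and 2/3. Each score rises as detection errors are excluded, which is why the three variants separate detection quality from classification quality.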