OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation

Tian Lan, Lei Xu, Zimu Yuan, Shanggui Liu, Jiajun Liu, Jiaxin Liu, Weilai Xiang, Hongyu Yang, Dong Jiang, Jianxin Yin, Dingyu Wang

TL;DR

Results from OrthoDiffusion suggest that diffusion-based foundation models can serve as a unified platform for multi-disease diagnosis and anatomical segmentation, potentially improving the efficiency and accuracy of musculoskeletal MRI interpretation in real-world clinical workflows.

Abstract

Musculoskeletal disorders represent a significant global health burden and are a leading cause of disability worldwide. While MRI is essential for accurate diagnosis, its interpretation remains exceptionally challenging. Radiologists must identify multiple potential abnormalities within complex anatomical structures across different imaging planes, a process that requires significant expertise and is prone to variability. We developed OrthoDiffusion, a unified diffusion-based foundation model designed for multi-task musculoskeletal MRI interpretation. The framework utilizes three orientation-specific 3D diffusion models, pre-trained in a self-supervised manner on 15,948 unlabeled knee MRI scans, to learn robust anatomical features from sagittal, coronal, and axial views. These view-specific representations are integrated to support diverse clinical tasks, including anatomical segmentation and multi-label diagnosis. Our evaluation demonstrates that OrthoDiffusion achieves excellent performance in the segmentation of 11 knee structures and the detection of 8 knee abnormalities. The model exhibited remarkable robustness across different clinical centers and MRI field strengths, consistently outperforming traditional supervised models. Notably, in settings where labeled data was scarce, OrthoDiffusion maintained high diagnostic precision using only 10% of training labels. Furthermore, the anatomical representations learned from knee imaging proved highly transferable to other joints, achieving strong diagnostic performance across 11 diseases of the ankle and shoulder. These findings suggest that diffusion-based foundation models can serve as a unified platform for multi-disease diagnosis and anatomical segmentation, potentially improving the efficiency and accuracy of musculoskeletal MRI interpretation in real-world clinical workflows.

Paper Structure (24 sections, 18 equations, 5 figures, 21 tables)


Figures (5)

  • Figure 1: Overview of the OrthoDiffusion framework. a, Multi-planar musculoskeletal MRI acquisition, including sagittal, coronal, and axial views, with anisotropic resolution across slices. Dataset construction and task composition across joints are illustrated. b, Unconditional 3D diffusion pretraining using a 3D U-Net noise predictor on large-scale knee MRI data. c, Feature extraction from intermediate diffusion representations at selected timesteps and bottleneck blocks, followed by pooling and multi-label classification. Feature-level and label-level fusion strategies are applied to integrate sagittal, coronal, and axial representations. d, Anatomical segmentation pipeline using diffusion representations coupled with a lightweight segmentation head. e, Multimodal fusion strategy integrating MRI diffusion representations with structured electronic health record (EHR) data for diagnosis. f, Diagnostic performance (AUROC) across 19 musculoskeletal abnormalities involving the knee, ankle, and shoulder joints, compared with baseline CNN models.
  • Figure 1: Macro-averaged AUROC of the eight-label knee injury prediction task under linear probing on the Center A+B+C test set, based on SAP-processed features extracted from 3D-UNet blocks at different diffusion timesteps across MRI planes.
  • Figure 2: Qualitative and quantitative evaluation of OrthoDiffusion for knee anatomical segmentation. a, Label efficiency of OrthoDiffusion for knee anatomical segmentation on the sagittal plane, illustrated by representative Dice Similarity Coefficient curves for selected anatomical structures under progressively reduced supervision. b, Qualitative multi-planar segmentation results. Representative sagittal and coronal knee MRI slices showing the original input images, corresponding model-predicted anatomical segmentations (e.g., femur, tibia, and cartilage), and manual ground-truth annotations, illustrating accurate delineation of key joint structures across imaging planes.
  • Figure 3: Label efficiency, cross-anatomy transferability, and robustness of OrthoDiffusion. a, Diagnostic performance (AUROC) of OrthoDiffusion on eight knee abnormalities as a function of the proportion of labeled training data, demonstrating label-efficient learning under limited supervision. b, Cross-anatomy generalization performance of OrthoDiffusion on ankle and shoulder abnormality diagnosis, transferred from a diffusion backbone pretrained exclusively on knee MRI. c, Robustness of OrthoDiffusion to variations in MRI magnetic field strength (1.5 T and 3.0 T), evaluated on knee MRI acquired at different field strengths.
  • Figure 4: Orientation-specific fusion expert-weights predicted by MPAE reflect clinical imaging practice. Sankey diagrams showing label-specific fusion weights assigned by the MPAE module across axial, sagittal, and coronal planes for representative (a) knee, (b) shoulder, and (c) ankle abnormalities in a single patient. The relative thickness of each flow indicates the contribution of each imaging orientation to the final prediction, revealing plane preferences that align with established clinical reading protocols for different orientations and pathologies. Representative MRI slices from each orientation are shown for reference.
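
The label-level fusion and label-specific plane weights mentioned in the captions above can be illustrated with a small sketch. This is not the authors' MPAE implementation, whose architecture is not described in this section. It assumes, purely for illustration, that each abnormality label has one gating score per plane, that a softmax turns those scores into weights summing to one, and that the fused probability is the weighted average of the per-plane probabilities. The names `fuse_label_level`, `gate_logits`, and all numbers are hypothetical.

```python
import math

# Plane order used throughout this sketch (an assumption, not from the paper).
PLANES = ("axial", "sagittal", "coronal")


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_label_level(plane_probs, gate_logits):
    """Fuse per-plane label probabilities with label-specific plane weights.

    plane_probs: dict mapping plane name -> list of per-label probabilities.
    gate_logits: one list per label holding a hypothetical gating score for
                 each plane, in the order given by PLANES.
    Returns (fused probabilities, per-label weight dicts).
    """
    fused, weights = [], []
    for k, logits in enumerate(gate_logits):
        w = softmax(logits)  # weights over planes, sum to 1 for this label
        weights.append(dict(zip(PLANES, w)))
        fused.append(sum(w_p * plane_probs[p][k] for p, w_p in zip(PLANES, w)))
    return fused, weights


# Made-up inputs for two labels; these are not results from the paper.
example_probs = {
    "axial": [0.2, 0.9],
    "sagittal": [0.8, 0.4],
    "coronal": [0.5, 0.5],
}
example_logits = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]

fused, weights = fuse_label_level(example_probs, example_logits)
for k, (p, w) in enumerate(zip(fused, weights)):
    dominant = max(w, key=w.get)
    print(f"label {k}: fused probability {p:.3f}, dominant plane {dominant}")
```

Under these assumptions, the per-label weight dictionaries are the quantities a Sankey diagram like the one in Figure 4 would visualize: a larger weight means that plane contributes more to the final prediction for that label.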