A CNN-Transformer for Classification of Longitudinal 3D MRI Images -- A Case Study on Hepatocellular Carcinoma Prediction

Jakob Nolte, Maureen M. J. Guichelaar, Donald E. Bouman, Stephanie M. van den Berg, Maryam Amir Haeri

TL;DR

HCCNet couples a 3D ConvNeXt CNN backbone with a Transformer encoder to predict HCC development from longitudinal 3D MRI data, explicitly handling irregular visit timings via time-aware positional encodings. A two-stage self-supervised pre-training regime (3D CNN pre-training with an extended DINO framework, followed by a sequence-order-prediction task for the Transformer) enables effective learning from limited labeled data. Of the three modalities evaluated (DW-MRI, T1-DCE, and T1-IOP/T2), HCCNet performs best on DW-MRI, where a Pico-scale variant reaches AUROC ≈ 0.936 and AUPRC ≈ 0.744; pre-training generally improves robustness and calibration. The approach offers a versatile framework for longitudinal medical imaging that can be extended to multi-modal data and other chronic disease surveillance tasks, pending external validation.
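
To make the idea of a time-aware positional encoding concrete, here is a minimal sketch. It assumes the standard sinusoidal encoding, evaluated at the elapsed time of each visit instead of its integer position in the sequence. The function name, the choice of days as the time unit, `d_model`, and `max_period` are illustrative assumptions; the paper's exact formulation is not given in this summary and may differ.

```python
import math


def time_aware_encoding(days_since_first_visit, d_model=8, max_period=10000.0):
    """Sinusoidal encoding evaluated at elapsed time rather than visit index.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    encodings = []
    for t in days_since_first_visit:
        row = []
        for i in range(d_model // 2):
            # Same frequency schedule as the classic Transformer encoding.
            freq = 1.0 / (max_period ** (2 * i / d_model))
            row.append(math.sin(t * freq))
            row.append(math.cos(t * freq))
        encodings.append(row)
    return encodings


# Hypothetical patient with irregularly spaced screenings (days since first scan).
visit_days = [0, 170, 395, 540]
pe = time_aware_encoding(visit_days)
```

Because the encoding depends on elapsed time, two visits 170 days apart and two visits 225 days apart receive different relative encodings, which an index-based encoding (0, 1, 2, ...) would not distinguish.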

Abstract

Longitudinal MRI analysis is crucial for predicting disease outcomes, particularly in chronic conditions like hepatocellular carcinoma (HCC), where early detection can significantly influence treatment strategies and patient prognosis. Yet, due to challenges like limited data availability, subtle parenchymal changes, and the irregular timing of medical screenings, current approaches have so far focused on cross-sectional imaging data. To address this, we propose HCCNet, a novel model architecture that integrates a 3D adaptation of the ConvNeXt CNN architecture with a Transformer encoder, capturing both the intricate spatial features of 3D MRIs and the complex temporal dependencies across different time points. HCCNet utilizes a two-stage pre-training process tailored for longitudinal MRI data. The CNN backbone is pre-trained using a self-supervised learning framework adapted for 3D MRIs, while the Transformer encoder is pre-trained with a sequence-order-prediction task to enhance its understanding of disease progression over time. We demonstrate the effectiveness of HCCNet by applying it to a cohort of liver cirrhosis patients undergoing regular MRI screenings for HCC surveillance. Our results show that HCCNet significantly improves predictive accuracy and reliability over baseline models, providing a robust tool for personalized HCC surveillance. The methodological approach presented in this paper is versatile and can be adapted to various longitudinal MRI screening applications. Its ability to handle varying patient record lengths and irregular screening intervals establishes it as an invaluable framework for monitoring chronic diseases, where timely and accurate disease prognosis is critical for effective treatment planning.
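
As a rough illustration of a sequence-order-prediction pretext task, the sketch below builds one training example from a patient's chronologically ordered visits. It assumes a binary formulation (label 1 when the chronological order is kept, label 0 when the visits are permuted); the function name, `shuffle_prob`, and the labeling scheme are assumptions for illustration, and the paper's actual task definition may differ.

```python
import random


def make_order_prediction_example(visits, rng, shuffle_prob=0.5):
    """Return (sequence, label) for a hypothetical order-prediction task.

    label is 1 if the chronological order is preserved and 0 if permuted.
    """
    sequence = list(visits)
    # Sequences that cannot be visibly reordered always keep label 1.
    cannot_permute = len(sequence) < 2 or all(v == sequence[0] for v in sequence)
    if cannot_permute or rng.random() >= shuffle_prob:
        return sequence, 1
    permuted = sequence[:]
    # Reshuffle until the order actually differs from the original.
    while permuted == sequence:
        rng.shuffle(permuted)
    return permuted, 0


# Placeholder identifiers standing in for per-visit CNN embeddings.
rng = random.Random(0)
visits = ["visit_0", "visit_1", "visit_2", "visit_3"]
example, label = make_order_prediction_example(visits, rng)
```

In such a setup the Transformer encoder would receive the (possibly permuted) visit embeddings and be trained to predict the label, which requires no HCC annotations.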

Paper Structure (24 sections, 12 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Proposed Modeling Approach
  • Figure 2: Cumulative Gain Across Different Model Runs
  • Figure 3: Baseline vs. Fine-Tuned Models' Reliability Diagrams
  • Figure 4: Original and 3D ConvNeXt Block Designs