A baseline for machine-learning-based hepatocellular carcinoma diagnosis using multi-modal clinical data

Binwu Wang, Isaac Rodriguez, Leon Breitinger, Fabian Tollens, Timo Itzel, Dennis Grimm, Andrei Sirazitdinov, Matthias Frölich, Stefan Schönberg, Andreas Teufel, Jürgen Hesser, Wenzhao Zhao

TL;DR

TNM staging of hepatocellular carcinoma is a good example of a task where multi-modal classification is mandatory for accurate results: the classifier reaches its high prediction accuracy only when image data and clinical laboratory data are combined.

Abstract

This paper provides a baseline for multi-modal data classification on a novel open multi-modal dataset of hepatocellular carcinoma (HCC), which includes both image data (contrast-enhanced CT and MRI images) and tabular data (clinical laboratory test data as well as case report forms). The classification task is TNM staging. Features are extracted from the vectorized, preprocessed tabular data, and radiomics features are extracted from the contrast-enhanced CT and MRI images. Feature selection is based on mutual information. An XGBoost classifier predicts the TNM stage with an accuracy of $0.89 \pm 0.05$ and an AUC of $0.93 \pm 0.03$. This level of accuracy is obtained only when image and clinical laboratory data are combined, which makes TNM staging a good example of a task where multi-modal classification is mandatory for accurate results.
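To make the feature selection step concrete, the following is a minimal sketch of ranking features by mutual information with the class label. It is not the authors' implementation: the abstract does not say which mutual information estimator was used, so the sketch assumes features that have already been discretized (binned) and uses a simple plug-in estimate from counts. The helper names (`mutual_information`, `select_top_k`), the feature names, and the toy data are all hypothetical and synthetic; no values from the HCC dataset are used. The selected columns would then be passed to the classifier (XGBoost in the paper), which is not shown here.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Assumption: both inputs are already discrete (e.g. binned features
    and integer class labels). Continuous features would need binning
    or a different estimator.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features, labels, k):
    """Rank named feature columns by mutual information with the labels.

    `features` maps a (hypothetical) feature name to its column of values.
    Returns the k highest-scoring names and the full score dictionary.
    """
    scores = {name: mutual_information(col, labels)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic toy data: three classes stand in for staging labels.
rng = random.Random(0)
labels = [rng.randrange(3) for _ in range(300)]
features = {
    # copies the label 90% of the time, otherwise a random class
    "informative": [y if rng.random() < 0.9 else rng.randrange(3) for y in labels],
    # copies the label 30% of the time, otherwise a random class
    "weak": [y if rng.random() < 0.3 else rng.randrange(3) for y in labels],
    # independent of the label
    "noise": [rng.randrange(3) for _ in labels],
}

selected, scores = select_top_k(features, labels, k=2)
print("selected:", selected)
for name, score in scores.items():
    print(f"{name}: {score:.3f} nats")
```

The plug-in estimate is slightly biased upward on small samples, so a feature that is independent of the label will typically still score a little above zero. In practice a threshold or a fixed `k` chosen by cross-validation is one way to deal with that.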

Paper Structure (18 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: The importance of selected tabular features for TNM staging.
  • Figure 2: The importance of selected radiomics features for TNM staging.
  • Figure 3: The probability distributions and the ROC curves (One vs Rest, OvR). The first row shows the probability distributions; the second row shows the corresponding ROC curves.