MERIT: Multi-view evidential learning for reliable and interpretable liver fibrosis staging

Yuanye Liu; Zheyao Gao; Nannan Shi; Fuping Wu; Yuxin Shi; Qingchao Chen; Xiahai Zhuang

TL;DR

MERIT addresses reliable and interpretable liver fibrosis staging from multi-view MRI by modeling per-view predictions as opinions under subjective logic and fusing them with operators based on Dempster-Shafer (DS) evidence theory. It introduces a class-distribution-aware base rate to handle class distribution shifts and a feature-specific fusion rule to explicitly relate local and global view features. The approach yields distributional uncertainty quantification, improved calibration, and post-hoc interpretability by revealing view-specific contributions to decisions. Experiments on a multi-center dataset demonstrate reliability under feature and class distribution shifts and show advantages over existing uncertainty-aware and multi-view methods, with potential clinical impact for more trustworthy fibrosis assessment.

Abstract

Accurate staging of liver fibrosis from magnetic resonance imaging (MRI) is crucial in clinical practice. While conventional methods often focus on a specific sub-region, multi-view learning captures more information by analyzing multiple patches simultaneously. However, previous multi-view approaches typically cannot quantify uncertainty by design, and they generally integrate features from different views in a black-box fashion, compromising both the reliability and the interpretability of the resulting models. In this work, we propose a new multi-view method based on evidential learning, referred to as MERIT, which tackles the two challenges in a unified framework. MERIT enables uncertainty quantification of the predictions to enhance reliability, and employs a logic-based combination rule to improve interpretability. Specifically, MERIT models the prediction from each sub-view as an opinion with quantified uncertainty under the guidance of subjective logic theory. Furthermore, a distribution-aware base rate is introduced to enhance performance, particularly in scenarios involving class distribution shifts. Finally, MERIT adopts a feature-specific combination rule to explicitly fuse multi-view predictions, thereby enhancing interpretability. Results demonstrate the effectiveness of MERIT, highlighting its reliability and offering both ad-hoc and post-hoc interpretability. They also illustrate that MERIT can elucidate the significance of each view in the decision-making process for liver fibrosis staging. Our code has been released at https://github.com/HenryLau7/MERIT.
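The opinion representation described in the abstract can be illustrated with a minimal sketch. It assumes the usual subjective-logic mapping, in which the Dirichlet parameters are the evidence plus the base rate scaled by a prior weight W, and it assumes W equals the number of classes; the function name, the value of W, and the example numbers are illustrative and are not taken from the released code.

```python
import numpy as np

def opinion_from_evidence(evidence, base_rate, prior_weight=None):
    """Map non-negative evidence to a Dirichlet and a subjective-logic opinion.

    Assumed mapping (not confirmed by the source): alpha = e + a * W.
    """
    e = np.asarray(evidence, dtype=float)
    a = np.asarray(base_rate, dtype=float)
    # Assumption: prior weight W defaults to the number of classes K.
    W = float(len(e)) if prior_weight is None else float(prior_weight)
    alpha = e + a * W                # Dirichlet concentration parameters
    strength = alpha.sum()           # Dirichlet strength S = sum(e) + W
    belief = e / strength            # belief mass per class
    uncertainty = W / strength       # mass left unassigned
    prob = belief + a * uncertainty  # projected probability, equals alpha / S
    return alpha, belief, uncertainty, prob

uniform = np.full(4, 0.25)

# No evidence: the Dirichlet is uniform and the uncertainty is maximal.
print(opinion_from_evidence([0, 0, 0, 0], uniform))

# Tied evidence: a non-uniform base rate (hypothetical test class proportions)
# moves the projected probability toward the more frequent class.
shifted = np.array([0.1, 0.6, 0.2, 0.1])
print(opinion_from_evidence([2, 2, 0, 0], shifted)[3])
```

Under this mapping, beliefs and uncertainty always sum to one, and the base rate only matters in proportion to the remaining uncertainty, which is why it has the largest effect on samples with little evidence.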

Paper Structure (37 sections, 4 theorems, 26 equations, 10 figures, 7 tables)

Introduction
Related works
Deep Learning Methods for Liver Fibrosis Staging
Multi-view Fusion
Uncertainty Quantification
Evidence Theory
Method
Problem setup and overview
Opinion representation based on subjective logic
Evidence representation of Dirichlet distribution
Mapping between Dirichlet distribution and opinion
Interpretable combination rule based on DS evidence theory
Interpretation of the combination rule
Modeling multi-view features
Feature extraction from multi-views
...and 22 more sections

Key Result

Proposition 1

If both opinions are equally confident, i.e., $u^m=u^n< \dfrac{b^m_{\tilde{k}}b^n_{\tilde{k}}- b^m_j b^n_j}{\vert b^m_j+b^n_j-(b^m_{\tilde{k}}+b^n_{\tilde{k}})\vert}$ for $j\neq \tilde{k}$, the combined opinion believes in the class that both opinions agree on, i.e., $\hat{k}=\tilde{k}$.
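A small numerical check can make the proposition concrete. The sketch below assumes the fusion operator takes the reduced Dempster-style form for singleton classes, $b_k \propto b^m_k b^n_k + b^m_k u^n + b^n_k u^m$ and $u \propto u^m u^n$; the page does not state the operator, so this form, the function names, and the example opinions are assumptions.

```python
import numpy as np

def belief_constraint_fusion(b_m, u_m, b_n, u_n):
    """Fuse two opinions over K singleton classes (assumed Dempster-style form)."""
    b_m = np.asarray(b_m, dtype=float)
    b_n = np.asarray(b_n, dtype=float)
    # Conflict: belief mass that the two opinions place on different classes.
    conflict = b_m.sum() * b_n.sum() - np.dot(b_m, b_n)
    norm = 1.0 - conflict
    belief = (b_m * b_n + b_m * u_n + b_n * u_m) / norm
    uncertainty = u_m * u_n / norm
    return belief, uncertainty

def proposition_bound(b_m, b_n, k, j):
    """Right-hand side of the inequality in Proposition 1 for classes k and j."""
    numerator = b_m[k] * b_n[k] - b_m[j] * b_n[j]
    denominator = abs(b_m[j] + b_n[j] - (b_m[k] + b_n[k]))
    return numerator / denominator

# Hypothetical opinions with equal uncertainty that both favor class 0.
u = 0.2
b_m = np.array([0.50, 0.20, 0.10])
b_n = np.array([0.45, 0.30, 0.05])

belief, uncertainty = belief_constraint_fusion(b_m, u, b_n, u)
bounds = [proposition_bound(b_m, b_n, 0, j) for j in (1, 2)]
print("fused belief:", belief, "fused uncertainty:", uncertainty)
print("bounds on u:", bounds, "predicted class:", int(np.argmax(belief)))
```

Under the assumed form with $u^m=u^n=u$, the difference between the fused beliefs of $\tilde{k}$ and $j$ is proportional to $u\,[(b^m_{\tilde{k}}+b^n_{\tilde{k}})-(b^m_j+b^n_j)] + (b^m_{\tilde{k}}b^n_{\tilde{k}}-b^m_j b^n_j)$, which is consistent with the bound in the proposition.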

Figures (10)

Figure 1: (a) Distribution shift: The feature distribution shift in our problem is mainly caused by low-quality images with liver mass occupation (left) or artifacts (right). The class distribution shift denotes the difference in class proportion between training and test data. (b) Roadmap: Our MERIT framework improves the reliability and interpretability of multi-view learning through opinion representation and belief fusion operators.
Figure 2: The MERIT framework consists of multi-view feature extraction, opinion representation, and a combination rule. In multi-view feature extraction, the whole liver MRI is converted to multiple local views and a global view image, which are encoded as evidence vectors $\{\bm{e}^v\}_{v=1}^V$ by convolutional neural networks (CNNs) and a vision transformer (ViT), respectively. In opinion representation, each evidence vector $\bm{e}^v$ is combined with the class-distribution-aware base rate $\bm{a}$ to generate a Dirichlet distribution $Dir(\bm{\mu}^v\mid\bm{\alpha}^v)$, which can be further represented by an opinion $\bm{D}^v$ composed of the beliefs $b_k^v$ in each class and the uncertainty $u^v$. In the combination rule, the opinions are combined via Cumulative Belief Fusion (CBF) and Belief Constraint Fusion (BCF) to derive the overall opinion $\bm{D}$, which can be converted to the per-data prior distribution $Dir(\bm{\mu}\mid\bm{\alpha})$ to obtain the final prediction $\hat{y}$.
Figure 3: The predicted Dirichlet distribution under different distribution shift scenarios. (a) The predicted Dirichlet distribution can be adapted by modifying the base rate with the estimated test class proportion. (b) An out-of-distribution sample (left), which provides no evidence for the decision, results in a uniform Dirichlet distribution and yields high uncertainty in our model, while a sample without feature shift (right) produces low uncertainty.
Figure 4: (a) The pipeline to extract sub-views of the liver. First, the foreground is extracted using intensity-based segmentation. Based on the segmentation, a square ROI centered at the centroid of the liver is cropped. Then overlapping sliding windows are applied to the ROI to obtain nine sub-views of the liver. (b) Locality self-attention (LSA) and shifted patch tokenization (SPT) modules applied in the data-efficient transformer. LSA replaces the original self-attention in ViT: it introduces a learnable parameter $\tau$ to scale the unnormalized attention map (i.e., $(\bm{qk}^T)/\tau$), whose diagonal elements are then replaced with constants before softmax. SPT modifies the tokenization strategy of ViT by concatenating spatially transformed images with the original one before patch partition.
Figure 5: MRI scans of liver tissue illustrating ID and OOD data. The top row (ID) displays liver slices with an effective area greater than $90\%$, denoting scans considered normal for the training dataset. The bottom row (OOD) depicts liver slices with an effective area less than $90\%$, representing anomalous cases used as the out-of-distribution test set. Specifically, the blue area indicates liver mass, the orange area indicates a local artifact, and the green area indicates previous liver surgery.
...and 5 more figures
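
The sub-view extraction step in Figure 4(a) can be sketched as a 3x3 grid of overlapping windows over the square ROI. The ROI and window sizes below are hypothetical, since the caption does not give them, and the function name is illustrative only.

```python
import numpy as np

def extract_sub_views(roi, window, grid=3):
    """Crop grid x grid overlapping square windows from a square ROI."""
    height, width = roi.shape[:2]
    if height != width:
        raise ValueError("ROI must be square")
    side = height
    # Windows overlap only if grid windows are wider than the ROI in total.
    if not (side / grid < window <= side):
        raise ValueError("window must satisfy side / grid < window <= side")
    # Evenly spaced top-left corners from 0 to side - window.
    starts = np.linspace(0, side - window, grid).round().astype(int)
    return [roi[r:r + window, c:c + window] for r in starts for c in starts]

# Hypothetical sizes: a 256 x 256 ROI and 128 x 128 windows (stride 64).
roi = np.zeros((256, 256), dtype=np.float32)
views = extract_sub_views(roi, window=128)
print(len(views), views[0].shape)
```

With these sizes, neighboring windows share half of their area; a smaller window reduces the overlap until it vanishes at one third of the ROI side.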

Theorems & Definitions (6)

Definition 1: Cumulative Belief Fusion Operator
Definition 2: Belief Constraint Fusion Operator
Proposition 1
Proposition 2
Proposition 1
Proposition 2
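
The definitions above are listed without their statements. As a hedged illustration, the sketch below uses the standard cumulative belief fusion form from subjective logic for two opinions with a shared base rate, and checks that fusing two opinions matches pooling their evidence. The function names, the prior weight, and the evidence values are assumptions, and the form used in the paper may differ.

```python
import numpy as np

def to_opinion(evidence, prior_weight):
    """Convert evidence to (belief, uncertainty) with prior weight W."""
    e = np.asarray(evidence, dtype=float)
    total = e.sum() + prior_weight
    return e / total, prior_weight / total

def cumulative_belief_fusion(b_a, u_a, b_b, u_b):
    """Cumulative fusion of two opinions (standard form, non-dogmatic case)."""
    denom = u_a + u_b - u_a * u_b
    if denom == 0:
        # Both opinions have u = 0; the limit form is not covered here.
        raise ValueError("at least one opinion must have non-zero uncertainty")
    belief = (b_a * u_b + b_b * u_a) / denom
    uncertainty = u_a * u_b / denom
    return belief, uncertainty

# Hypothetical evidence from two views over three classes, with W = 3.
W = 3.0
e_a = np.array([4.0, 1.0, 0.0])
e_b = np.array([2.0, 0.0, 3.0])

fused_b, fused_u = cumulative_belief_fusion(*to_opinion(e_a, W), *to_opinion(e_b, W))
pooled_b, pooled_u = to_opinion(e_a + e_b, W)
print(np.allclose(fused_b, pooled_b), np.isclose(fused_u, pooled_u))
```

In this form, cumulative fusion behaves like accumulating evidence from independent sources, so the fused uncertainty is never larger than that of either input.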
