Positive Semi-definite Latent Factor Grouping-Boosted Cluster-reasoning Instance Disentangled Learning for WSI Representation
Chentao Li, Behzad Bozorgtabar, Yifang Ping, Pan Huang, Jing Qin
TL;DR
This paper tackles the entanglement challenges in MIL-based whole-slide image (WSI) analysis by introducing PG-CIDL, a three-phase disentangled framework. It first uses positive semi-definite latent factor grouping (PSD-LFG) to map spatially entangled patches into a latent subspace, then applies cluster-reasoning instance disentangling (CID) via counterfactual probability inference to separate semantic factors (tumor, microenvironment, background), and finally uses instance effect re-weighting to curb decision entanglement. The approach is grounded in two decoupled structural causal models and an information-theoretic weighting scheme, enabling end-to-end optimization. Empirical results on multicenter datasets show state-of-the-art accuracy and AUC, with visualizations confirming pathologist-aligned interpretability. Overall, PG-CIDL advances interpretable, causally informed WSI representations and shows potential for clinical deployment.
Abstract
Multiple instance learning (MIL) has been widely used for representing whole-slide pathology images. However, spatial, semantic, and decision entanglements among instances limit its representation and interpretability. To address these challenges, we propose a latent factor grouping-boosted cluster-reasoning instance disentangled learning framework for interpretable whole-slide image (WSI) representation, organized in three phases. First, we introduce a novel positive semi-definite latent factor grouping that maps instances into a latent subspace, effectively mitigating spatial entanglement in MIL. Second, to alleviate semantic entanglement, we employ instance probability counterfactual inference and optimization via cluster-reasoning instance disentangling. Finally, we employ a generalized linear weighted decision via instance effect re-weighting to address decision entanglement. Extensive experiments on multicenter datasets demonstrate that our model outperforms all state-of-the-art models. Moreover, it attains pathologist-aligned interpretability through disentangled representations and a transparent decision-making process.
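To make the three phases concrete, the following is a minimal illustrative sketch in NumPy, not the authors' implementation. The abstract gives no equations, so every concrete choice here is an assumption: a centered Gram matrix stands in for the positive semi-definite latent factor step, plain k-means stands in for grouping, a leave-one-out difference stands in for the counterfactual instance effect, and a softmax-weighted sum passed through a sigmoid stands in for the generalized linear weighted decision. All function names are hypothetical.

```python
import numpy as np


def psd_latent_factors(instances, n_factors):
    """Embed instances using the top eigenpairs of a PSD Gram matrix (assumed stand-in)."""
    centered = instances - instances.mean(axis=0, keepdims=True)
    gram = centered @ centered.T  # X X^T is positive semi-definite by construction
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Clip tiny negative eigenvalues caused by floating-point error.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def group_instances(latent, n_groups, n_iters=20, seed=0):
    """Group latent instances with plain k-means (assumed stand-in for grouping)."""
    rng = np.random.default_rng(seed)
    centers = latent[rng.choice(len(latent), size=n_groups, replace=False)]
    labels = np.zeros(len(latent), dtype=int)
    for _ in range(n_iters):
        dists = ((latent[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_groups):
            if np.any(labels == k):
                centers[k] = latent[labels == k].mean(axis=0)
    return labels


def leave_one_out_effects(instance_scores):
    """Effect of each instance: mean score with it minus mean score without it."""
    n = len(instance_scores)
    without = (instance_scores.sum() - instance_scores) / (n - 1)
    return instance_scores.mean() - without


def reweighted_bag_probability(instance_scores, effects):
    """Sigmoid of a softmax-weighted sum of instance scores."""
    weights = np.exp(effects - effects.max())  # subtract max for numerical stability
    weights = weights / weights.sum()
    logit = weights @ instance_scores
    return 1.0 / (1.0 + np.exp(-logit)), weights


# Toy bag: 12 patch embeddings of dimension 5 (synthetic data, for illustration only).
rng = np.random.default_rng(0)
bag = rng.normal(size=(12, 5))
latent = psd_latent_factors(bag, n_factors=3)
groups = group_instances(latent, n_groups=3)
scores = latent @ rng.normal(size=3)  # hypothetical per-instance logits
effects = leave_one_out_effects(scores)
bag_prob, weights = reweighted_bag_probability(scores, effects)
```

Because the weights are explicit and sum to one, each instance's contribution to the bag-level decision can be read off directly, which is the kind of transparency the abstract attributes to the re-weighting phase.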
