Knowledge-Guided Brain Tumor Segmentation via Synchronized Visual-Semantic-Topological Prior Fusion

Mingda Zhang, Kaiwen Pan

TL;DR

This work tackles brain tumor segmentation from multi-sequence MRI by addressing the limitations of purely visual learning in boundary regions. It introduces STPF, a knowledge-guided framework that synchronously fuses three priors (pathology-driven differential features, unsupervised semantic descriptions mapped to voxel space, and topological constraints from persistent homology) via a dual-level fusion mechanism and nested output heads. On BraTS 2020, STPF achieves a mean Dice coefficient of 0.868 with stable performance across folds, and each prior contributes a measurable gain, especially in the challenging ET region. The results indicate that explicitly integrating anatomical semantics and geometric topology can improve segmentation accuracy and reliability, with promising implications for clinical deployment and future multi-modal extensions.

Abstract

Background: Brain tumor segmentation requires precise delineation of hierarchical structures from multi-sequence MRI. However, existing deep learning methods primarily rely on visual features, showing insufficient discriminative power in ambiguous boundary regions. Moreover, they lack explicit integration of medical domain knowledge such as anatomical semantics and geometric topology. Methods: We propose a knowledge-guided framework, Synchronized Tri-modal Prior Fusion (STPF), that explicitly integrates three heterogeneous knowledge priors: pathology-driven differential features (T1ce-T1, T2-FLAIR, T1/T2) encoding contrast patterns; unsupervised semantic descriptions transformed into voxel-level guidance via spatialization operators; and geometric constraints extracted through persistent homology analysis. A dual-level fusion architecture dynamically allocates prior weights at the voxel level based on confidence and at the sample level through hypernetwork-generated conditional vectors. Furthermore, nested output heads structurally ensure the hierarchical constraint $\mathrm{ET} \subseteq \mathrm{TC} \subseteq \mathrm{WT}$. Results: STPF achieves a mean Dice coefficient of 0.868 on the BraTS 2020 dataset, surpassing the best baseline by 2.6 percentage points (3.09% relative improvement). Notably, five-fold cross-validation yields coefficients of variation between 0.23% and 0.33%, demonstrating stable performance. Additionally, ablation experiments show that removing topological and semantic priors leads to performance degradation of 2.8% and 3.5%, respectively. Conclusions: By explicitly integrating medical knowledge priors (anatomical semantics and geometric constraints), STPF improves segmentation accuracy in ambiguous boundary regions while demonstrating generalization capability and clinical deployment potential.
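
The abstract states that nested output heads structurally enforce the hierarchy ET ⊆ TC ⊆ WT but does not spell out the construction. One common way to obtain such a guarantee, shown here as a minimal sketch and not as the authors' confirmed design, is to treat each inner region as conditional on the outer one and multiply sigmoid outputs. The function names, the threshold, and the example logits below are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def nested_probabilities(logit_wt, logit_tc, logit_et):
    """Map three unconstrained logits to probabilities with p_et <= p_tc <= p_wt.

    Assumption: each inner head predicts a conditional probability, so the
    ordering holds by construction rather than through a loss penalty.
    """
    p_wt = sigmoid(logit_wt)
    p_tc = p_wt * sigmoid(logit_tc)  # TC is conditioned on WT
    p_et = p_tc * sigmoid(logit_et)  # ET is conditioned on TC
    return p_wt, p_tc, p_et


def nested_masks(voxel_logits, threshold=0.5):
    """Threshold the nested probabilities into (WT, TC, ET) boolean triples."""
    masks = []
    for logit_wt, logit_tc, logit_et in voxel_logits:
        p_wt, p_tc, p_et = nested_probabilities(logit_wt, logit_tc, logit_et)
        masks.append((p_wt >= threshold, p_tc >= threshold, p_et >= threshold))
    return masks


# Hypothetical logits for three voxels, ordered as (WT, TC, ET).
voxel_logits = [(3.0, 2.0, 1.5), (2.0, -1.0, 4.0), (-2.0, 5.0, 5.0)]
masks = nested_masks(voxel_logits)
print(masks)  # [(True, True, True), (True, False, False), (False, False, False)]
```

In the second and third voxels the ET logit is large, yet ET is never predicted outside TC or WT, because a product of factors in (0, 1) cannot exceed any of its leading partial products. A single shared threshold therefore always yields nested masks.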

Paper Structure

This paper contains 29 sections, 15 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: STPF framework diagram. The encoder on the left extracts visual features from multi-sequence MRI, yielding z_vis at the bottleneck. The purple topological path generates topological tokens $T$ via a GAT and obtains z_topo through spatialization. The pink semantic path performs candidate region generation and attribute extraction, producing semantic tokens $S$ through an Encoder and z_sem through spatialization. The three knowledge priors are fused at the bottleneck through a Gate and passed to global (sample-level) modulation. The yellow-green section applies Attention over $T$, $S$, and $V$, then completes voxel-level fusion through spatialization and voxel-level adaptive fusion (estimating weights via $\phi_v,\ \phi_s,\ \phi_t$). The decoder on the right progressively restores resolution and outputs the segmentation result.
  • Figure 2: Dice coefficient percentile curves for the three tumor sub-regions (WT, TC, ET). Steeper curves indicate that performance is concentrated across most cases, whereas flatter curves indicate a broader performance distribution. The WT curve is the steepest, indicating stable edema segmentation, while the ET curve is the flattest, reflecting the greater difficulty of detecting small lesions.
  • Figure 3: Five-fold cross-validation Dice coefficient heatmap comparing the complete STPF model (Exp-1) with the MRI-only visual baseline (Exp-7). Darker colors indicate greater performance differences. The complete model outperforms the baseline across all folds and sub-regions and shows higher cross-fold stability.
  • Figure 4: Boxplots of Dice coefficient change in the ablation experiments (n=369). Each configuration shows the performance degradation relative to the complete model ($\Delta$Dice) and its 95% confidence interval. Removing the semantic prior (w/o SemPrior) or using the pure visual baseline (Only MRI) causes substantial performance loss, indicating the important role of knowledge-guided multi-modal prior fusion.
  • Figure 5: Dumbbell plot of the robustness analysis under different semantic perturbation scenarios. Each line segment connects the mean Dice coefficient of the complete model with that of the corresponding perturbation scenario, and the segment length indicates the magnitude of performance degradation. The model tolerates attribute loss and noise injection well, maintaining stable performance even under 70% attribute loss or completely random semantics.
  • ...and 1 more figure
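
The two metrics that recur in the abstract and in the captions above are the Dice coefficient and the cross-fold coefficient of variation. As a minimal sketch of how they are conventionally computed: the masks and fold scores below are made-up illustrative values, not results from the paper, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
import statistics


def dice(pred, target, eps=1e-8):
    """Dice = 2|P ∩ G| / (|P| + |G|) for flat binary masks of equal length."""
    intersection = sum(p and g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the score defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Toy flattened masks: 3 overlapping voxels, 4 predicted, 4 in the ground truth.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 0, 1, 1]
score = dice(pred, target)  # 2 * 3 / (4 + 4) = 0.75

# Hypothetical per-fold mean Dice values, for illustration only.
fold_scores = [0.866, 0.870, 0.868, 0.865, 0.871]
cv_percent = coefficient_of_variation(fold_scores)

print(f"Dice: {score:.3f}")
print(f"CV across folds: {cv_percent:.2f}%")
```

A low coefficient of variation means the per-fold scores cluster tightly around their mean, which is how the abstract uses the figure when it describes stable performance.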