Shape-Adapting Gated Experts: Dynamic Expert Routing for Colonoscopic Lesion Segmentation

Gia Huy Thai, Hoang-Nguyen Vu, Anh-Minh Phan, Quang-Thinh Ly, Tram Dinh, Thi-Ngoc-Truc Nguyen, Nhat Ho

TL;DR

Shape-Adapting Gated Experts (SAGE) addresses heterogeneity in histopathology segmentation by turning static CNN-Transformer backbones into dynamic dual-path networks. It introduces hierarchical Top-K expert routing and the Shape-Adapting Hub, which bridges CNN and Transformer representations, enabling input-dependent computation and adaptive local-global refinement. Evaluations on EBHI, DigestPath, and GlaS show state-of-the-art Dice scores and strong cross-domain generalization, supporting the effectiveness of dynamic routing and cross-architecture fusion. These contributions offer a scalable framework for adaptive visual reasoning in medical imaging and beyond.

Abstract

The substantial diversity in cell scale and form, which stems from cellular heterogeneity, remains a primary challenge in computer-aided cancer detection on gigapixel Whole Slide Images (WSIs). Existing CNN-Transformer hybrids rely on static computation graphs with fixed routing, which causes redundant computation and limits their adaptability to input variability. We propose Shape-Adapting Gated Experts (SAGE), an input-adaptive framework that enables dynamic expert routing in heterogeneous visual networks. SAGE reconfigures static backbones into dynamically routed expert architectures. Its dual-path design features a backbone stream that preserves representation and an expert path that is selectively activated through hierarchical gating. The gating mechanism performs a two-level selection between shared and specialized experts, modulating the model logits for Top-K activation. Our Shape-Adapting Hub (SA-Hub) harmonizes structural and semantic representations across the CNN and Transformer modules, bridging the two architectures. Embodied as SAGE-UNet, our model achieves superior segmentation on three medical benchmarks, EBHI, DigestPath, and GlaS, yielding state-of-the-art Dice scores of 95.57%, 95.16%, and 94.17%, respectively, and generalizes robustly across domains by adaptively balancing local refinement and global context. SAGE provides a scalable foundation for dynamic expert routing, enabling flexible visual reasoning.
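The two-level gating described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sigmoid group gate, the log-space modulation of expert logits, and the identity main path are all assumptions chosen for illustration. The first level decides how strongly to favor shared versus specialized experts, the second level picks the Top-K experts from the modulated logits, and the dual path adds the weighted expert outputs to the preserved backbone signal.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Numerically stable softmax over a list of floats.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_top_k(expert_logits, group_logit, num_shared, k):
    """Return {expert_index: weight} for the k selected experts.

    Assumption: experts 0..num_shared-1 are shared, the rest are specialized.
    Level 1: a scalar gate g in (0, 1) favors shared experts when high.
    Level 2: Top-K selection over the gate-modulated expert logits.
    """
    g = sigmoid(group_logit)
    eps = 1e-9  # avoids log(0) when the gate saturates
    modulated = []
    for i, logit in enumerate(expert_logits):
        bias = math.log(g + eps) if i < num_shared else math.log(1.0 - g + eps)
        modulated.append(logit + bias)
    # Keep the k largest modulated logits and renormalize over them only.
    top = sorted(range(len(modulated)), key=lambda i: modulated[i], reverse=True)[:k]
    weights = softmax([modulated[i] for i in top])
    return dict(zip(top, weights))


def dual_path(x, experts, routing):
    """Backbone signal (identity here, as an assumption) plus routed experts."""
    return x + sum(weight * experts[i](x) for i, weight in routing.items())
```

With a high group gate the shared experts dominate the selection, and with a low gate the specialized ones do, which matches the behavior the Figure 3 caption describes. Only the selected experts are evaluated in `dual_path`, which is where the input-dependent saving in computation would come from.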

Paper Structure

This paper contains 20 sections, 16 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Explainability visualization of dynamic expert routing in SAGE on the EBHI dataset. Grad-CAMs highlight contributions from the CNN and Transformer main paths and their expert blocks. SAGE adaptively redistributes attention across heterogeneous modules, revealing interpretable expert collaboration during inference.
  • Figure 2: The proposed architecture integrates a ConvNeXt encoder and a ViT (Dosovitskiy et al., 2021) encoder through hierarchical expert routing and shape-adaptive interaction. The main path (black arrows) represents the backbone forward flow, and the expert path (brown arrows) dynamically routes information between distant modules. The router selects the best connections to experts (two experts are activated in this illustration), and the Shape-Adapting Hub ensures structural compatibility between convolutional and transformer features. The skip connections transfer multi-scale features to the decoder, facilitating higher-resolution reconstruction. The two-path architecture enables SAGE-UNet to perform adaptive, complexity-aware segmentation.
  • Figure 3: The router dynamically divides computation between shared experts (green), which capture domain-invariant and transferable features, and specialized experts (blue), which focus on input-specific reasoning. High gating values favor shared experts for generalization, whereas low values favor specialized experts for context-dependent adaptation. This hierarchical routing allows SAGE to balance generalization and specialization across several visual domains.
  • Figure 4: Qualitative comparison on GlaS test samples. Each column shows (a) the input image with ground-truth annotation, (b) TransUNet, (c) EViT-UNet, and (d) our proposed SAGE-UNet. The top row presents a typical gland structure (GlaS Test A), while the bottom row depicts a challenging case with irregular morphology (GlaS Test B). Green areas denote correct predictions, and red areas denote errors.
  • Figure 5: Normalized affinity score heatmap. The visualization shows the normalized affinity scores (gating probabilities per expert per layer), with a color scale ranging from red (low affinity) to dark green (high affinity). Each row corresponds to a layer in the model (CNN layers 1 to 4 followed by Transformer layers 1 to 16), and each column represents one of the 20 experts ($E0$ to $E19$).
  • ...and 5 more figures
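The layer-by-expert matrix behind the Figure 5 heatmap could be assembled roughly as follows. This is a sketch under stated assumptions: the caption does not say how the scores are normalized, so the code simply accumulates gating probabilities over samples and divides each layer row by its sum. The helper names `layer_labels` and `normalized_affinity` are hypothetical.

```python
def layer_labels(num_cnn=4, num_transformer=16):
    """Row labels in the order given by the Figure 5 caption."""
    cnn = [f"CNN {i}" for i in range(1, num_cnn + 1)]
    vit = [f"Transformer {i}" for i in range(1, num_transformer + 1)]
    return cnn + vit


def normalized_affinity(samples):
    """Aggregate gating probabilities into a [layers][experts] matrix.

    samples: list over inputs, each a [layers][experts] list of gating
    probabilities. Assumption: normalization is a per-layer row sum.
    """
    num_layers = len(samples[0])
    num_experts = len(samples[0][0])
    totals = [[0.0] * num_experts for _ in range(num_layers)]
    for sample in samples:
        for layer, row in enumerate(sample):
            for expert, prob in enumerate(row):
                totals[layer][expert] += prob
    matrix = []
    for row in totals:
        row_sum = sum(row)
        # A layer that never routed to any expert stays all zeros.
        matrix.append([v / row_sum if row_sum > 0 else 0.0 for v in row])
    return matrix
```

Each row of the result sums to one, so colors are comparable across experts within a layer but not across layers. A different normalization (for example, global min-max) would change how the heatmap reads.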