CAFlow: Adaptive-Depth Single-Step Flow Matching for Efficient Histopathology Super-Resolution

Elad Yoshai, Ariel D. Yoshai, Natan T. Shaked

Abstract

In digital pathology, whole-slide images routinely exceed gigapixel resolution, making computationally intensive generative super-resolution (SR) impractical for routine deployment. We introduce CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that preserves reconstruction quality. CAFlow performs flow matching in pixel-unshuffled rearranged space, reducing spatial computation by 16x while enabling direct inference. We show that dedicating half of training to exact t=0 samples is essential for single-step quality (-1.5 dB without it). The backbone, FlowResNet (1.90M parameters), mixes convolution and window self-attention blocks across four early exits spanning 3.1 to 13.3 GFLOPs. A lightweight exit classifier (~6K parameters) achieves 33% compute savings at only 0.12 dB cost. On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR versus 31.84 dB at full depth, while the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB), and at x8 upscaling it outperforms all comparable-compute baselines while remaining competitive with the much larger SwinIR-Medium model. Downstream nuclei segmentation confirms preservation of clinically relevant structure. The model trains in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
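
The 16x figure follows from the rearrangement itself: a factor-4 pixel-unshuffle moves each 4x4 spatial neighbourhood into 16 channels, so every layer runs on a grid with 16 times fewer positions. The sketch below illustrates this in plain Python on a toy single-channel tile. It is not the authors' implementation. The function names `pixel_unshuffle` and `pixel_shuffle`, the 8x8 tile, and the channel ordering (offset `i * r + j`, a common convention) are assumptions made for this example.

```python
def pixel_unshuffle(image, r=4):
    """Rearrange an H x W single-channel image into r*r channels of size (H/r) x (W/r)."""
    h, w = len(image), len(image[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    channels = []
    for i in range(r):          # row offset inside each r x r neighbourhood
        for j in range(r):      # column offset inside each r x r neighbourhood
            channels.append(
                [[image[y * r + i][x * r + j] for x in range(w // r)]
                 for y in range(h // r)]
            )
    return channels


def pixel_shuffle(channels, r=4):
    """Inverse rearrangement: r*r channels of size h x w back to one (h*r) x (w*r) image."""
    hs, ws = len(channels[0]), len(channels[0][0])
    image = [[0] * (ws * r) for _ in range(hs * r)]
    for k, channel in enumerate(channels):
        i, j = divmod(k, r)     # recover the offsets used in pixel_unshuffle
        for y in range(hs):
            for x in range(ws):
                image[y * r + i][x * r + j] = channel[y][x]
    return image


# Toy 8 x 8 tile (hypothetical data) whose pixel values encode their position.
tile = [[row * 8 + col for col in range(8)] for row in range(8)]
packed = pixel_unshuffle(tile, r=4)

# Number of spatial positions before versus after the rearrangement.
spatial_reduction = (len(tile) * len(tile[0])) // (len(packed[0]) * len(packed[0][0]))
print(len(packed), "channels,", spatial_reduction, "x fewer spatial positions")
```

The rearrangement is lossless, so `pixel_shuffle(packed, r=4)` recovers the original tile exactly; only the layout of the data changes, not its content.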

Paper Structure (31 sections, 15 equations, 6 figures, 8 tables, 2 algorithms)

Figures (6)

  • Figure 1: CAFlow architecture overview. (a) The pipeline operates entirely in pixel-unshuffled rearranged space at $\frac{H}{4}{\times}\frac{W}{4}$ resolution, reducing spatial compute by $16{\times}$. (b) FlowResNet backbone (1.90M parameters): 16 blocks grouped into 4 exit segments with increasing hybrid (attention) block density, creating a non-linear cost gradient ($4.3{\times}$ ratio). The ExitClassifier (${\sim}$6K parameters) predicts the optimal exit from E0 features. (c) Internal structure of FiLMResBlock (Eq. \ref{eq:filmresblock}) and HybridFiLMBlock (Eq. \ref{eq:hybridblock}). Dashed arrows denote residual connections; HybridFiLMBlock augments the convolutional path with W-MSA/SW-MSA ($w{=}8$, 8 heads) and a feed-forward MLP.
  • Figure 2: Compute-quality Pareto frontier. CAFlow at different operating points (Exit 0 through Exit 3 and Adaptive) compared against fixed baselines including SwinIR-Medium. The adaptive strategy achieves near-full-depth quality at reduced compute. SwinIR-Medium reaches similar PSNR to CAFlow Exit 3 only at substantially higher compute, while the CAFlow frontier remains stronger in the practical 3.1 to 13.3 GFLOPs regime.
  • Figure 3: Exit distribution across the 343 validation images under adaptive routing. The classifier routes a mix of images to each exit, with many easy images handled at E0/E1 and harder images pushed to E2/E3.
  • Figure 4: Spatial exit assignment on a held-out TCGA-BRCA frozen tissue slide (TCGA-A8-A08C, not used during training) tiled at 40$\times$ into 1024$\times$1024 patches. After tissue detection (Otsu thresholding), 301 tissue tiles are retained and colored by their router-assigned exit: blue (E1, 6.07 GFLOPs), orange (E2, 9.39 GFLOPs), or red (E3, 13.34 GFLOPs). The router assigns 33% to E1 and 61% to E2, with only 6% routed to E3, averaging 8.55 GFLOPs (a 36% reduction versus full-depth processing).
  • Figure 5: Qualitative comparison on $\times$4 histopathology SR across breast, kidney, and lung tissue. All examples are taken from the deterministic validation split, using organ-specific patches with highlighted ROIs. Colored boxes mark structures where CAFlow preserves nuclear separation, lumen definition, or alveolar/stromal continuity at least as clearly as competing methods. Across the highlighted ROIs, CAFlow is within 0.03 dB of SwinIR-Medium in mean PSNR (25.61 vs. 25.63 dB) while achieving higher mean SSIM (0.760 vs. 0.751). Although SwinIR-Medium and SR3 can be visually competitive, they require roughly one to two orders of magnitude more compute and memory.
  • ...and 1 more figure
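
The average compute quoted in the Figure 4 caption can be checked as a weighted mean of the per-exit costs. The sketch below uses only the numbers given in that caption; the function name `expected_gflops` is hypothetical. Because the caption reports exit shares rounded to whole percentages, the result (about 8.53 GFLOPs) differs slightly from the reported 8.55 GFLOPs, a gap presumably due to that rounding.

```python
# Per-exit cost in GFLOPs and share of routed tiles, as stated in the Figure 4 caption.
exit_gflops = {"E1": 6.07, "E2": 9.39, "E3": 13.34}
exit_share = {"E1": 0.33, "E2": 0.61, "E3": 0.06}


def expected_gflops(share, cost):
    """Weighted mean cost per tile under a given routing distribution."""
    if abs(sum(share.values()) - 1.0) > 1e-6:
        raise ValueError("exit shares must sum to 1")
    return sum(share[name] * cost[name] for name in share)


mean_gflops = expected_gflops(exit_share, exit_gflops)

# Saving relative to sending every tile through the deepest exit (E3).
reduction = 1.0 - mean_gflops / exit_gflops["E3"]

print(f"{mean_gflops:.2f} GFLOPs per tile, {reduction:.0%} reduction")
```

With the rounded shares this gives roughly a 36% reduction relative to full-depth processing, consistent with the caption.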