Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection
Bharadwaj Dogga, Kaaustaaub Shankar, Gibin Raju, Wilhelm Louw, Kelly Cohen
TL;DR
The paper tackles the explainability gap in edge detection by proposing the sMoE U-Net, a hybrid architecture that integrates Spatially-Adaptive Mixture-of-Experts blocks with a differentiable First-Order Takagi-Sugeno-Kang (TSK) fuzzy head. This design enables per-pixel gating between context-aware smoothing and boundary-preserving sharpening, and it expresses each decision as explicit IF-THEN rules, visualized via Strategy Maps and Rule Firing Maps. On BSDS500, the model achieves an ODS F-score of 0.7628, competitive with HED (0.7688) and above the standard U-Net (0.7437), so it gains interpretability at little cost in accuracy. This Glass-Box approach holds promise for safety-critical applications such as medical imaging and aerospace, where verifiable, auditable decisions are essential.
Abstract
Deep learning models such as U-Net and its variants have established state-of-the-art performance in edge detection tasks and are used by generative AI services worldwide for their image generation models. However, their decision-making processes remain opaque, operating as "black boxes" that obscure the rationale behind specific boundary predictions. This lack of transparency is a critical barrier in safety-critical applications where verification is mandatory. To bridge the gap between high-performance deep learning and interpretable logic, we propose the Rule-Based Spatial Mixture-of-Experts U-Net (sMoE U-Net). Our architecture introduces two key innovations: (1) Spatially-Adaptive Mixture-of-Experts (sMoE) blocks integrated into the decoder skip connections, which dynamically gate between "Context" (smooth) and "Boundary" (sharp) experts based on local feature statistics; and (2) a Takagi-Sugeno-Kang (TSK) Fuzzy Head that replaces the standard classification layer. This fuzzy head fuses deep semantic features with heuristic edge signals using explicit IF-THEN rules. We evaluate our method on the BSDS500 benchmark, achieving an Optimal Dataset Scale (ODS) F-score of 0.7628, closely matching purely deep baselines such as HED (0.7688) while outperforming the standard U-Net (0.7437). Crucially, our model provides pixel-level explainability through "Rule Firing Maps" and "Strategy Maps," allowing users to visualize whether an edge was detected due to strong gradients, high semantic confidence, or specific logical rule combinations.
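
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`spatial_moe_blend`, `tsk_head`), the sigmoid gate, the Gaussian membership functions, the product rule for combining memberships, and the final sigmoid are all assumptions chosen for illustration. The abstract specifies only per-pixel gating between two experts and a first-order TSK head with IF-THEN rules.

```python
import numpy as np

def spatial_moe_blend(features, context_out, boundary_out, w, b):
    """Per-pixel soft gate between a 'Context' and a 'Boundary' expert.

    features, context_out, boundary_out: arrays of shape (C, H, W).
    w (shape (C,)) and b (scalar) are hypothetical gate parameters.
    Returns the blended features and the gate map (a 'Strategy Map' analogue).
    """
    logits = np.tensordot(w, features, axes=([0], [0])) + b   # (H, W)
    gate = 1.0 / (1.0 + np.exp(-logits))                      # 1 -> boundary expert
    blended = gate * boundary_out + (1.0 - gate) * context_out
    return blended, gate

def tsk_head(x, centers, sigmas, coeffs, biases):
    """First-order TSK inference applied independently at every pixel.

    x: (D, H, W) rule inputs, e.g. semantic confidence and gradient magnitude.
    centers, sigmas: (R, D) Gaussian membership parameters (assumed form).
    coeffs: (R, D), biases: (R,) linear consequent of each rule.
    Returns edge probability (H, W) and normalized rule firings (R, H, W).
    """
    diff = x[None] - centers[:, :, None, None]                 # (R, D, H, W)
    membership = np.exp(-0.5 * (diff / sigmas[:, :, None, None]) ** 2)
    firing = membership.prod(axis=1)                           # IF-part: product over inputs
    norm_firing = firing / (firing.sum(axis=0, keepdims=True) + 1e-8)
    # THEN-part: each rule outputs a linear function of the inputs.
    consequents = np.einsum("rd,dhw->rhw", coeffs, x) + biases[:, None, None]
    y = (norm_firing * consequents).sum(axis=0)                # firing-weighted average
    return 1.0 / (1.0 + np.exp(-y)), norm_firing

# Toy usage with two rules over two inputs on a 4x4 image.
rng = np.random.default_rng(0)
x = rng.random((2, 4, 4))
centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # rule 0: "both low", rule 1: "both high"
sigmas = np.full((2, 2), 0.5)
coeffs = np.array([[-1.0, -1.0], [2.0, 2.0]])
biases = np.array([-1.0, 0.5])
edge_prob, rule_maps = tsk_head(x, centers, sigmas, coeffs, biases)
```

In this sketch, `rule_maps[r]` plays the role of a Rule Firing Map for rule `r`: it shows how strongly that rule contributed at each pixel. The `gate` returned by `spatial_moe_blend` shows which expert dominated at each location.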
