Logistic-Gated Operators Enable Auditable Unit-Aware Thresholds in Symbolic Regression
Ou Deng, Ruichen Cong, Jianting Xu, Shoji Nishimura, Atsushi Ogihara, Qun Jin
TL;DR
The paper addresses the gap between symbolic regression and the need for auditable, unit-aware decision boundaries in health contexts. It proposes logistic-gated operators (LGO): differentiable gates with a learnable location $b$ and steepness $a$, embedded as typed primitives and inverted back to physical units for audit. The two gate forms are $LGO_{soft}(x;a,b)=x\,\sigma(a(x-b))$ and $LGO_{hard}(x;a,b)=\sigma(a(x-b))$, where $\sigma(\cdot)$ is the logistic function. A threshold $\hat{b}$ learned on a standardized feature is mapped to natural units by $\hat{b}_{raw}=\mu_x+\sigma_x\hat{b}$, where $\mu_x$ and $\sigma_x$ are that feature's mean and standard deviation. Across ICU and NHANES data, hard gates yield clinically plausible, sparse thresholds (e.g., lactate, MAP, SBP, HDL, waist) while maintaining competitive predictive accuracy; on predominantly smooth tasks the gates are pruned, which keeps the models parsimonious. Because thresholds are expressed in natural units, they can be audited directly against guidelines, and the models can be deployed as standalone or safety-overlay components for clinical decision support. Thresholds thereby become first-class, auditable model levers, giving symbolic regression a practical calculus for regime switching and governance in real-world health AI systems, with extensions to other scientific domains.
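To make the gate forms and the unit mapping concrete, here is a minimal Python sketch. The function names (`lgo_soft`, `lgo_hard`, `threshold_to_raw`) and all numeric values are illustrative assumptions, not taken from the paper's code. The sketch also assumes that features are z-score standardized before fitting, which is what the inversion formula suggests.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lgo_soft(x: float, a: float, b: float) -> float:
    """Soft gate: the input is scaled by the gate, x * sigma(a * (x - b))."""
    return x * sigmoid(a * (x - b))


def lgo_hard(x: float, a: float, b: float) -> float:
    """Hard gate: only the gate value is returned, sigma(a * (x - b))."""
    return sigmoid(a * (x - b))


def threshold_to_raw(b_hat: float, mu_x: float, sigma_x: float) -> float:
    """Map a threshold learned on a standardized feature back to natural units."""
    return mu_x + sigma_x * b_hat


# Hypothetical values, chosen only for illustration (not results from the paper).
a, b_hat = 8.0, 0.5          # steepness and location in standardized units
mu_x, sigma_x = 2.0, 1.0     # assumed feature mean and standard deviation

gate_at_threshold = lgo_hard(b_hat, a, b_hat)      # 0.5: the gate is half open at x = b
b_raw = threshold_to_raw(b_hat, mu_x, sigma_x)     # 2.5 in the feature's natural units
```

At $x=b$ both forms sit at their midpoint: the hard gate returns 0.5 and the soft gate returns $b/2$. Larger $a$ makes the transition around $b$ sharper, which is why $b$ reads as a threshold once it is mapped back to natural units.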
Abstract
Symbolic regression promises readable equations but struggles to encode unit-aware thresholds and conditional logic. We propose logistic-gated operators (LGO) -- differentiable gates with learnable location and steepness -- embedded as typed primitives and mapped back to physical units for audit. Across two primary health datasets (ICU, NHANES), the hard-gate variant recovers clinically plausible cut-points: 71% (5/7) of assessed thresholds fall within 10% of guideline anchors and 100% within 20%, while using far fewer gates than the soft variant (ICU median 4.0 vs 10.0; NHANES 5.0 vs 12.5), and remaining within the competitive accuracy envelope of strong SR baselines. On predominantly smooth tasks, gates are pruned, preserving parsimony. The result is compact symbolic equations with explicit, unit-aware thresholds that can be audited against clinical anchors -- turning interpretability from a post-hoc explanation into a modeling constraint and equipping symbolic regression with a practical calculus for regime switching and governance-ready deployment.
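The audit against guideline anchors described above can be sketched as a simple relative-deviation check. This is an illustration under stated assumptions. The abstract does not define "within 10%", so the sketch assumes it means $|\hat{b}_{raw}-\text{anchor}|/|\text{anchor}|\le 0.10$. The helper names, feature names, and numbers are all hypothetical and are not the paper's data.

```python
def relative_deviation(threshold: float, anchor: float) -> float:
    """Absolute deviation of a learned threshold from its anchor, relative to the anchor."""
    return abs(threshold - anchor) / abs(anchor)


def audit(thresholds: dict, anchors: dict, tolerances=(0.10, 0.20)) -> dict:
    """Return, for each tolerance, the fraction of thresholds within that relative deviation."""
    shared = [name for name in thresholds if name in anchors]
    if not shared:
        raise ValueError("no thresholds have a matching anchor")
    result = {}
    for tol in tolerances:
        hits = sum(
            relative_deviation(thresholds[name], anchors[name]) <= tol
            for name in shared
        )
        result[tol] = hits / len(shared)
    return result


# Made-up thresholds and anchors in natural units, for illustration only.
learned = {"feature_a": 105.0, "feature_b": 85.0}
guideline = {"feature_a": 100.0, "feature_b": 100.0}

summary = audit(learned, guideline)
# feature_a deviates by 5%, feature_b by 15%, so:
# summary == {0.1: 0.5, 0.2: 1.0}
```

Reporting the fraction at several tolerances mirrors how the abstract states agreement (a share within 10% and a share within 20%). A real audit would also record which anchor source each feature was compared against.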
