Rational ANOVA Networks
Jusheng Zhang, Ningyuan Liu, Qinhan Lyu, Jing Yang, Keze Wang
TL;DR
Rational-ANOVA Networks (RAN) combine a functional-ANOVA decomposition with Padé-style rational units to model a target function as a sum of main effects and sparse pairwise interactions. By enforcing strictly positive denominators and employing residual gating, RAN achieves stable deep optimization and improved extrapolation compared with fixed activations and spline-based alternatives. The architecture serves as a drop-in replacement for feed-forward networks (FFNs) in models such as Vision Transformers, yielding a better accuracy-efficiency trade-off under matched budgets and enabling explicit control over interaction topology. Across visual benchmarks, large-scale ViT integrations, and real-world denoising, RAN shows consistent gains and robustness, while ablations and theory explain its stability and the benefits of sparse connectivity. The work also highlights RAN’s potential for automated symbolic discovery and interpretable rational dynamics in scientific modeling tasks.
Abstract
Deep neural networks typically treat nonlinearities as fixed primitives (e.g., ReLU), limiting both interpretability and the granularity of control over the induced function class. While recent additive models (e.g., KANs) attempt to address this using splines, they often suffer from computational inefficiency and boundary instability. We propose the Rational-ANOVA Network (RAN), a foundational architecture grounded in functional ANOVA decomposition and Padé-style rational approximation. RAN models f(x) as a sum of main effects and sparse pairwise interactions, where each component is parameterized by a stable, learnable rational unit. Crucially, we enforce a strictly positive denominator, which avoids poles and numerical instability while capturing sharp transitions and near-singular behaviors more efficiently than polynomial bases. This ANOVA structure provides an explicit low-order interaction bias for data efficiency and interpretability, while the rational parameterization significantly improves extrapolation. Across controlled function benchmarks and vision classification tasks (e.g., CIFAR-10) under matched parameter and compute budgets, RAN matches or surpasses parameter-matched MLPs and learnable-activation baselines, with better stability and throughput. Code is available at https://github.com/jushengzhang/Rational-ANOVA-Networks.git.
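To make the structure described above concrete, the following is a minimal sketch of a rational unit with a strictly positive denominator, summed into an ANOVA-style model of main effects plus sparse pairwise interactions. It is an illustration under stated assumptions, not the authors' implementation: the names `rational_unit` and `ran_forward` are hypothetical, the denominator form Q(x) = 1 + |b_1 x + ... + b_n x^n| is one common way to rule out poles and may differ from what RAN actually uses, and feeding the product x_i * x_j into a univariate unit is only a simple stand-in for a pairwise interaction term.

```python
def polyval(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def rational_unit(x, p, q):
    """Rational unit P(x) / Q(x) with a pole-free denominator.

    p = [a_0, a_1, ..., a_m] are numerator coefficients.
    q = [b_1, ..., b_n] are denominator coefficients (no constant term).
    Assumption: Q(x) = 1 + |b_1 x + ... + b_n x^n|, so Q(x) >= 1 > 0.
    """
    numerator = polyval(p, x)
    denominator = 1.0 + abs(x * polyval(q, x))
    return numerator / denominator


def ran_forward(x, bias, main, pairs):
    """ANOVA-style sum: bias + main effects + sparse pairwise interactions.

    main:  {i: (p, q)} holds one rational unit per input coordinate i.
    pairs: {(i, j): (p, q)} holds one unit per selected pair; the unit is
           applied to x[i] * x[j] (an illustrative choice, not from the paper).
    """
    out = bias
    for i, (p, q) in main.items():
        out += rational_unit(x[i], p, q)
    for (i, j), (p, q) in pairs.items():
        out += rational_unit(x[i] * x[j], p, q)
    return out


if __name__ == "__main__":
    # A plain ratio 1 / (1 - x) has a pole at x = 1; the guarded form does not.
    print(rational_unit(1.0, [1.0], [-1.0]))
    # Two main effects and one interaction on a 2-D input.
    identity = ([0.0, 1.0], [])
    print(ran_forward([1.0, 2.0], 0.5, {0: identity, 1: identity}, {(0, 1): identity}))
```

In this sketch the `pairs` dictionary plays the role of the interaction topology: only the listed index pairs contribute, so sparsity is controlled by which keys are present.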
