Simplex-FEM Networks (SiFEN): Learning A Triangulated Function Approximator

Chaymae Yahyati, Ismail Lamaakal, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh

TL;DR

SiFEN is a learned simplex-based predictor that represents $f:\,\mathbb{R}^d \to \mathbb{R}^k$ as a globally $C^r$ finite-element field on a learned simplicial mesh in an optionally warped input space. At inference, only one simplex is active and at most $d+1$ Bernstein–Bézier basis functions are touched, yielding explicit locality, smoothness control, and cache-friendly computation. The mesh, warp, and polynomial coefficients are trained end-to-end with shape regularization, semi-discrete OT coverage, and differentiable topology updates, and under standard assumptions the method attains the FEM-like error rate $M^{-m/d}$. Empirically, SiFEN matches or surpasses MLPs and KANs on synthetic, tabular, and CNN-head benchmarks, improves calibration (lower NLL, Brier score, and ECE), and reduces inference latency due to locality. The framework combines geometry, approximation theory, and practical training techniques into a compact, interpretable alternative to dense predictors.

Abstract

We introduce Simplex-FEM Networks (SiFEN), a learned piecewise-polynomial predictor that represents f: R^d -> R^k as a globally C^r finite-element field on a learned simplicial mesh in an optionally warped input space. Each query activates exactly one simplex and at most d+1 basis functions via barycentric coordinates, yielding explicit locality, controllable smoothness, and cache-friendly sparsity. SiFEN pairs degree-m Bernstein-Bezier polynomials with a light invertible warp and trains end-to-end with shape regularization, semi-discrete OT coverage, and differentiable edge flips. Under standard shape-regularity and bi-Lipschitz warp assumptions, SiFEN achieves the classic FEM approximation rate M^(-m/d) with M mesh vertices. Empirically, on synthetic approximation tasks, tabular regression/classification, and as a drop-in head on compact CNNs, SiFEN matches or surpasses MLPs and KANs at matched parameter budgets, improves calibration (lower ECE/Brier), and reduces inference latency due to geometric locality. These properties make SiFEN a compact, interpretable, and theoretically grounded alternative to dense MLPs and edge-spline networks.
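
The locality claim in the abstract (one active simplex, at most d+1 basis functions per query) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an identity warp, a fixed hand-written mesh, degree m = 1 (where the Bernstein–Bézier basis functions coincide with the barycentric coordinates), and a naive linear scan for point location. The names `barycentric`, `locate`, and `evaluate` are hypothetical.

```python
def barycentric(vertices, x):
    """Barycentric coordinates of x w.r.t. a d-simplex given by d+1 vertices."""
    d = len(x)
    v0 = vertices[0]
    # Augmented system [T | x - v0], where column j of T is v_{j+1} - v0.
    A = [[vertices[j + 1][i] - v0[i] for j in range(d)] + [x[i] - v0[i]]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("degenerate simplex")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    lam_rest = [A[i][d] / A[i][i] for i in range(d)]
    # Coordinates sum to one, so the first follows from the others.
    return [1.0 - sum(lam_rest)] + lam_rest


def locate(points, simplices, x, tol=1e-9):
    """Return (index, coordinates) of the first simplex containing x.

    A linear scan is an assumption made for brevity; a practical system
    would use a faster point-location structure.
    """
    for idx, simplex in enumerate(simplices):
        lam = barycentric([points[i] for i in simplex], x)
        if all(l >= -tol for l in lam):
            return idx, lam
    return None, None


def evaluate(points, simplices, coeffs, x):
    """Degree-1 evaluation: only the d+1 coefficients of one simplex are read."""
    idx, lam = locate(points, simplices, x)
    if idx is None:
        raise ValueError("query lies outside the mesh")
    return sum(l * coeffs[i] for l, i in zip(lam, simplices[idx]))


# Toy mesh: the unit square split into two triangles (d = 2).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
simplices = [(0, 1, 2), (1, 3, 2)]
# Vertex coefficients sampled from the affine target f(x, y) = 2x + 3y + 1.
coeffs = [2 * px + 3 * py + 1 for px, py in points]

value_a = evaluate(points, simplices, coeffs, (0.25, 0.25))
value_b = evaluate(points, simplices, coeffs, (0.75, 0.75))
```

Because the toy target is affine, the degree-1 field reproduces it exactly on both triangles, and each query reads only three of the four stored coefficients. Higher degrees and the learned warp described in the abstract are outside the scope of this sketch.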

Paper Structure

This paper contains 123 sections, 55 equations, 15 figures, 24 tables, 1 algorithm.

Figures (15)

  • Figure 1: Representative results. SiFEN achieves lower error and better calibration at comparable or lower inference time than MLP/KAN/DLattice/MASN across tasks.
  • Figure 2: Scaling on piecewise-smooth target ($d{=}2$). SiFEN’s slope approaches $M^{-m/d}$ as predicted.
  • Figure 3: Side-by-side comparison. MLP: fixed activations at nodes, learnable weights on edges. KAN: learnable 1D splines on edges with summation at nodes. SiFEN: the warped input lies in a single active simplex; evaluation uses the $d{+}1$ barycentric basis functions of a local Bernstein–Bézier polynomial.
  • Figure 4: Risk–coverage behavior induced by thresholding barycentric energy $E(x)$. We use this figure when interpreting selective prediction alongside \ref{app:eval:shift}.
  • Figure 5: Warp-adapted rates. We observe a lower intercept (smaller $\mathcal{K}_{3}$) after warping at the same slope $m{+}1$, consistent with equation \ref{eq:bh-warp}.
  • ...and 10 more figures
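
The scaling check described for Figure 2 (an empirical slope approaching $M^{-m/d}$) amounts to fitting a line in log-log space. The sketch below shows one way to do that; it is illustrative only. The error values are synthetic, generated directly from the predicted law with an arbitrary constant, and are not results from the paper. The helper names `predicted_exponent` and `loglog_slope` are hypothetical.

```python
import math


def predicted_exponent(m, d):
    """Exponent of the predicted rate error ~ M^(-m/d)."""
    return -m / d


def loglog_slope(mesh_sizes, errors):
    """Least-squares slope of log(error) against log(M)."""
    xs = [math.log(M) for M in mesh_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic errors for d = 2, m = 2 with an arbitrary constant 0.5.
# These are NOT measurements; they only exercise the fitting code.
mesh_sizes = [64, 256, 1024, 4096]
errors = [0.5 * M ** predicted_exponent(2, 2) for M in mesh_sizes]

slope = loglog_slope(mesh_sizes, errors)
# With d = 2 and m = 2 the predicted exponent is -1, so quadrupling the
# number of vertices divides the predicted error by four.
ratio = errors[0] / errors[1]
```

On real measurements the fitted slope would only approach the predicted exponent for sufficiently fine meshes, so comparing the fit over the largest few mesh sizes is the more informative check.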