Implicit Hypersurface Approximation Capacity in Deep ReLU Networks
Jonatan Vallin, Karl Larsson, Mats G. Larson
TL;DR
This work develops a constructive geometric theory showing that deep ReLU networks of width $d{+}1$ can implicitly approximate a $d$-dimensional hypersurface in $\mathbb{R}^{d{+}1}$ as their zero contour, with explicit bounds in terms of the discretization parameter $\delta$, the domain radius $R$, and the dimension $d$. The authors exploit a geometric interpretation of ReLU layers as projections onto polyhedral cones and introduce a modified architecture that applies a sequence of projections onto shrinking polytopes, mapping the graph of a $C^2$ function $\phi$ into an $\varepsilon$-band of the graph and eventually onto a single hyperplane, which yields the boundary. The main contributions are an explicit depth bound $N \lesssim Cd\left(\frac{32R}{\delta}\right)^{\frac{d+1}{2}}$, an explicit tolerance $\varepsilon \lesssim (d-1)R^{3/2}\delta^{1/2}$, and an explicit construction of a continuous piecewise-linear boundary $\hat{\phi}$ that closely approximates $\phi$ on $B_R^d$, with applications to binary classification via embedding in $B_R^d \times \mathbb{R}$. The results give theoretical insight into the capacity of fixed-width deep ReLU networks to represent complex hypersurfaces and decision boundaries. They also show how layer-wise projections shape the boundary geometry, which suggests a principled path toward boundary-aware network design.
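To get a feel for how the two bounds quoted above trade off, the following minimal sketch evaluates only their leading-order expressions. The function names `depth_scale` and `tolerance_scale` are hypothetical, and the unspecified constant $C$ and any lower-order terms are omitted, so the numbers indicate scaling and are not actual layer counts or error values.

```python
def depth_scale(d, R, delta):
    """Leading-order depth expression d * (32 R / delta)^((d + 1) / 2).

    The constant C from the bound is omitted (assumption: set to 1).
    """
    return d * (32.0 * R / delta) ** ((d + 1) / 2)


def tolerance_scale(d, R, delta):
    """Leading-order tolerance expression (d - 1) * R^(3/2) * delta^(1/2)."""
    return (d - 1) * R ** 1.5 * delta ** 0.5


if __name__ == "__main__":
    d, R = 3, 1.0
    for delta in (1.0, 0.5, 0.25):
        print(delta, depth_scale(d, R, delta), tolerance_scale(d, R, delta))
```

Halving $\delta$ multiplies the depth expression by $2^{(d+1)/2}$ and shrinks the tolerance expression only by a factor $\sqrt{2}$, which shows how quickly the depth grows with the dimension.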
Abstract
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations. Given a $d$-dimensional hypersurface in $\mathbb{R}^{d+1}$ represented as the graph of a $C^2$-function $\phi$, we show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour with a precision bound depending on the number of layers. This result is directly applicable to the binary classification setting where the sign of the network is trained as a classifier, with the network's zero contour as a decision boundary. Our proof is constructive and relies on the geometrical structure of ReLU layers described in [doi:10.48550/arXiv.2310.03482]. Inspired by this geometrical description, we define a new equivalent network architecture that is easier to interpret geometrically, where the action of each hidden layer is a projection onto a polyhedral cone derived from the layer's parameters. By repeatedly adding such layers, with parameters chosen such that we project small parts of the graph of $\phi$ from the outside in, we construct, in a controlled way, a network that implicitly approximates the graph over a ball of radius $R$. The accuracy of this construction is controlled by a discretization parameter $\delta$, and we show that the tolerance in the resulting error bound scales as $(d-1)R^{3/2}\delta^{1/2}$ and that the required number of layers is of order $d\big(\frac{32R}{\delta}\big)^{\frac{d+1}{2}}$.
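The idea that a hidden layer acts as a projection onto a polyhedral cone can be illustrated with a small sketch. It assumes a bias-free layer with an invertible weight matrix $W$ and uses the map $x \mapsto W^{-1}\,\mathrm{ReLU}(Wx)$, which sends every point into the cone $\{z : Wz \ge 0\}$ and leaves points already in the cone unchanged. This is an illustration only and is not claimed to be the exact architecture defined in the paper. The helper names and the $2 \times 2$ weights (the case $d = 1$, width $2$) are hypothetical.

```python
def relu(v):
    """Componentwise ReLU, i.e. projection onto the nonnegative orthant."""
    return [max(t, 0.0) for t in v]


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def cone_map(W, x):
    """Return W^{-1} ReLU(W x), a point of the cone {z : W z >= 0}."""
    return matvec(inverse_2x2(W), relu(matvec(W, x)))


W = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical invertible weights
x = [1.0, -1.0]               # W x = [1, -2], so the second constraint is violated
z = cone_map(W, x)            # lies on the face of the cone where the second constraint is active
```

With $W$ equal to the identity, `cone_map` reduces to the plain ReLU, which is the Euclidean projection onto the nonnegative orthant. For a general $W$ the map is idempotent but typically an oblique rather than orthogonal projection.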
