The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network

Abhigyan Dutta; Itay Safran; Paul Valiant

TL;DR

We present a constant-depth, linear-width ReLU network that approximates the median of $d$ inputs with exponentially small error with respect to the uniform distribution over the unit hypercube.

Abstract

We study the approximation of the median of $d$ inputs using ReLU neural networks. We present depth-width tradeoffs under several settings, culminating in a constant-depth, linear-width construction that achieves exponentially small approximation error with respect to the uniform distribution over the unit hypercube. By further establishing a general reduction from the maximum to the median, our results break a barrier suggested by prior work on the maximum function, which indicated that linear width should require depth growing at least as $\log\log d$ to achieve comparable accuracy. Our construction relies on a multi-stage procedure that iteratively eliminates non-central elements while preserving a candidate set around the median. We overcome obstacles that do not arise for the maximum to yield approximation results that are strictly stronger than those previously known for the maximum itself.
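To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' construction. It assumes inputs in $[0,1]$, an odd number of them, and a minimum pairwise gap `delta` between distinct values; under those assumptions a rank-counting network built only from ReLUs returns the median exactly. The function names (`relu_median`, `max_via_median`) and the parameter `delta` are hypothetical. The sketch uses $d^2$ pairwise comparisons, so it illustrates a quadratic-width regime rather than the linear-width result, and its weights scale as $1/\delta$. The second function shows one standard way to reduce the maximum to the median, by padding with $d-1$ copies of the upper bound; the paper's own reduction may differ.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_median(x, delta=1e-3):
    """Median of an odd number of values in [0, 1].

    Illustrative assumption: distinct values differ by at least `delta`.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    k = (d + 1) // 2  # 1-indexed rank of the median

    # Layer 1: steep clipped step built from two ReLUs per pair (i, j).
    # It equals 1 when x_i >= x_j and 0 when x_i <= x_j - delta.
    t = (x[:, None] - x[None, :]) / delta
    step = relu(t + 1.0) - relu(t)
    rank = step.sum(axis=1)  # rank_i = #{j : x_j <= x_i}

    # Layer 2: triangular bump, 1 when rank_i == k and 0 at other integers.
    sel = relu(rank - k + 1) - 2.0 * relu(rank - k) + relu(rank - k - 1)

    # Layer 3: gate x_i by sel_i in {0, 1}; on [0, 1] this equals x_i * sel_i.
    return float(relu(x + sel - 1.0).sum())

def max_via_median(x, delta=1e-3):
    """Maximum of d values in [0, 1 - delta] as the median of 2d - 1 values.

    The d - 1 padding entries equal 1.0; ties among them are harmless here
    because their rank (2d - 1) never matches the median rank d when d > 1.
    """
    x = np.asarray(x, dtype=float)
    pad = np.ones(x.size - 1)
    return relu_median(np.concatenate([x, pad]), delta)
```

When two distinct inputs are closer than `delta`, the rank counts are no longer integers and the output is only approximate, which is one way to see why an accuracy parameter enters the weight bounds.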

Paper Structure (50 sections, 35 theorems, 85 equations, 8 algorithms)

Introduction
$L_2$ approximation of the maximum.
Exact computation of CPWL functions.
Preliminaries and notation
Notations.
Neural networks.
Approximation error.
Neural network approximation for the median function
Depth $3$ and width $\mathcal{O}(d^2)$ median computation
Depth $5$ and width roughly $\mathcal{O}(d^{5/3})$ median computation
Depth $\mathcal{O}(1)$ and width $\mathcal{O}(d)$ median computation
Proof sketch of Theorem \ref{thm:linear_width_med_computation_distribution_zeroone}, specialized for the maximum
Sparsification step.
Hashing step.
Lower bounds for computing the median
...and 35 more sections

Key Result

Theorem 3.1

For any dimension $d$ and any target accuracy $\epsilon>0$, there exists a ReLU neural network $\mathcal{N}$ of depth $3$ and width $\mathcal{O} \left(d^{2}\right)$, and magnitude of weights bounded by $\frac{12d^4}{\epsilon}$, such that

Theorems & Definitions (83)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Remark 3.4: Random selection within ReLU neural networks
Theorem 4.1: safran2026depth
Theorem 4.2
Theorem 4.3
Theorem 4.5: safran2024max
Corollary 4.6
Theorem 4.7: safran2024max
...and 73 more
