Logarithmic Width Suffices for Robust Memorization
Amitsour Egosi, Gilad Yehudai, Ohad Shamir
TL;DR
"Logarithmic Width Suffices for Robust Memorization" analyzes how wide a feedforward ReLU network must be to robustly memorize N labeled samples under adversarial perturbations of radius σ. By introducing δ-separated data and developing a robust variant of the Johnson–Lindenstrauss lemma, the paper proves near-tight bounds that relate σ/δ to the network width k, showing that when k < d a logarithmic-in-N width is necessary and sufficient for σ-independence of N, while larger widths allow robustness up to a threshold set by p-norm geometry. A key concept is the preserving linear map that reduces dimension while maintaining neighborhood separation; the results characterize when such maps exist and how they enable robust memorization, including explicit constants a_{p,d} and b_{p,d}. Overall, the work reveals a fundamental trade-off between width and robustness and shows that robust memorization cannot be achieved with constant width as N grows unless σ shrinks with N, contrasting with classic non-robust memorization results."
Abstract
The memorization capacity of neural networks with a given architecture has been thoroughly studied in many works. Specifically, it is well-known that memorizing $N$ samples can be done using a network of constant width, independent of $N$. However, the required constructions are often quite delicate. In this paper, we consider the natural question of how well feedforward ReLU neural networks can memorize robustly, namely while being able to withstand adversarial perturbations of a given radius. We establish both upper and lower bounds on the possible radius for general $l_p$ norms, implying (among other things) that width logarithmic in the number of input samples is necessary and sufficient to achieve robust memorization (with robustness radius independent of $N$).
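One common way to formalize the notion of robust memorization described above is the following sketch. The notation (a network $f$, samples $(x_i, y_i)$, radius $\sigma$, separation $\delta$) is assumed here for illustration and may differ from the paper's own definitions.

$$
f(x) = y_i \quad \text{for all } i \in \{1, \dots, N\} \text{ and all } x \text{ with } \|x - x_i\|_p \le \sigma .
$$

Under this reading, and assuming $\delta$ denotes the minimum $l_p$ distance between differently labeled samples, the radius must satisfy $\sigma < \delta/2$: otherwise the balls around two closest differently labeled samples share a point, and no function can assign it both labels.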
