Logarithmic Width Suffices for Robust Memorization

Amitsour Egosi, Gilad Yehudai, Ohad Shamir

TL;DR

"Logarithmic Width Suffices for Robust Memorization" analyzes how wide a feedforward ReLU network must be to robustly memorize N labeled samples under adversarial perturbations of radius σ. By introducing δ-separated data and developing a robust variant of the Johnson–Lindenstrauss lemma, the paper proves near-tight bounds that relate σ/δ to the network width k. When k < d, width logarithmic in N is necessary and sufficient for the robustness radius σ to be independent of N, while larger widths allow robustness up to a threshold set by p-norm geometry. A key concept is the preserving linear map, which reduces dimension while keeping neighborhoods separated; the results characterize when such maps exist and how they enable robust memorization, including explicit constants a_{p,d} and b_{p,d}. Overall, the work reveals a fundamental trade-off between width and robustness: robust memorization cannot be achieved with constant width as N grows unless σ shrinks with N, in contrast with classic non-robust memorization results.
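To make the idea of a map that keeps neighborhoods separated concrete, here is a minimal sketch. It is not the paper's construction. It assumes l2 balls and reads "preserving" as: the images of the closed σ-balls around two points stay disjoint. The function names `row_space_basis` and `images_disjoint` are hypothetical. Under that assumption, a linear map T merges the two neighborhoods exactly when the projection of x - x' onto the row space of T has l2 norm at most 2σ.

```python
import math
import random


def row_space_basis(T, tol=1e-12):
    """Orthonormal basis of the row space of T (a list of rows), via Gram-Schmidt."""
    basis = []
    for row in T:
        v = list(row)
        for b in basis:
            coeff = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip rows that are (numerically) linearly dependent
            basis.append([vi / norm for vi in v])
    return basis


def images_disjoint(T, x, x_prime, sigma):
    """True if T maps the closed l2 balls of radius sigma around x and x_prime to disjoint sets."""
    diff = [a - b for a, b in zip(x, x_prime)]
    # The images intersect iff T(x - x') = T(u) for some u with ||u||_2 <= 2*sigma.
    # The shortest such u is the projection of x - x' onto the row space of T,
    # whose norm is the norm of its coefficients in an orthonormal basis.
    coeffs = [sum(d * b for d, b in zip(diff, vec)) for vec in row_space_basis(T)]
    return math.sqrt(sum(c * c for c in coeffs)) > 2 * sigma


# Two points at l2 distance 3 in R^2 with radius 1: the balls are disjoint before mapping.
x, x_prime, sigma = [0.0, 0.0], [0.0, 3.0], 1.0
print(images_disjoint([[1.0, 0.0]], x, x_prime, sigma))  # False: this map collapses the gap
print(images_disjoint([[0.0, 1.0]], x, x_prime, sigma))  # True: this map keeps the gap

# A random Gaussian map from R^50 to R^10, in the spirit of Johnson-Lindenstrauss.
random.seed(0)
d, k = 50, 10
T = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
u = [random.gauss(0.0, 1.0) for _ in range(d)]
w = [random.gauss(0.0, 1.0) for _ in range(d)]
print(images_disjoint(T, u, w, sigma=0.5))
```

The check depends only on the row space of T, so rescaling T does not change the answer. That is one way to see why preserving neighborhoods is a stronger requirement than preserving pairwise distances.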

Abstract

The memorization capacity of neural networks with a given architecture has been thoroughly studied in many works. Specifically, it is well-known that memorizing $N$ samples can be done using a network of constant width, independent of $N$. However, the required constructions are often quite delicate. In this paper, we consider the natural question of how well feedforward ReLU neural networks can memorize robustly, namely while being able to withstand adversarial perturbations of a given radius. We establish both upper and lower bounds on the possible radius for general $l_p$ norms, implying (among other things) that width logarithmic in the number of input samples is necessary and sufficient to achieve robust memorization (with robustness radius independent of $N$).
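The following sketch illustrates the two quantities the abstract relates: how far apart differently labeled samples are, and how large a perturbation radius they can tolerate. It rests on assumptions not taken from the paper: separation and perturbations are measured in the same $l_p$ norm, and the helper names `lp_distance`, `separation`, and `radius_is_feasible` are hypothetical. It only tests the elementary necessary condition that closed balls of radius σ around differently labeled points must not overlap; the paper's sharper norm-dependent constants are not reproduced here.

```python
import itertools


def lp_distance(u, v, p):
    """l_p distance between two points; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def separation(points, labels, p=2):
    """Smallest l_p distance between two points that carry different labels."""
    return min(
        lp_distance(points[i], points[j], p)
        for i, j in itertools.combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    )


def radius_is_feasible(points, labels, sigma, p=2):
    """Necessary condition only: sigma-balls around differently labeled points are disjoint."""
    return 2 * sigma < separation(points, labels, p)


points = [(0, 0), (3, 4), (0, 1)]
labels = [0, 1, 0]
print(separation(points, labels, p=float("inf")))        # 3
print(radius_is_feasible(points, labels, sigma=2.0))     # True
print(radius_is_feasible(points, labels, sigma=2.5))     # False
```

Passing this check does not guarantee that a narrow network can memorize the data robustly; it only rules out radii for which no function of any kind could.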

Paper Structure

This paper contains 43 sections, 62 theorems, 113 equations, 11 figures.

Key Result

Theorem 4.2

If $d+6 \leq k$ and $\frac{\sigma}{\delta} < \frac{1}{2c^{+}_{p}(d)}$, then for every $\delta$-separated dataset $\mathcal{D}\in\mathcal{D}_{d,N,C}(\delta)$, there exists a neural network $f:\mathbb{R}^d\rightarrow \mathbb{R}$ with width $k$ and depth $O\left(Nd\log_{2}\left(\frac{d}{1-\frac{2c^{+}_

Figures (11)

  • Figure 1: Illustration of main results describing regions where robust memorization is possible (green), not possible (red) and unknown (gray stripes). $k$ is the width, $\sigma$ the radius of robustness and $\delta$ the separation distance of the dataset of dimension $d$. The remark on the valid range for the radius and the upper- and lower-bound memorization theorems are indicated in the regions that they discuss.
  • Figure 2: A dataset that cannot be $(\sigma, k)$-preserved.
  • Figure 3: The dataset $\mathcal{D}$ before (left) and after (right) applying $T$. Distance between the data points $x, x^{\prime}$ is preserved but the images of their $\sigma$-neighborhoods intersect.
  • Figure 4: The architecture of $A_{i}$
  • Figure 5: The architecture of $F_{k,\tau,r}$
  • ...and 6 more figures

Theorems & Definitions (125)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Remark 4.1: Robustness parameter $\frac{\sigma}{\delta}$ cannot exceed $\frac{1}{2c_p^+(d)}$
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Corollary 4.6
  • Remark 4.7: Fixed ratio $k/d$
  • ...and 115 more